On Bias, Variance, 0/1—Loss, and the Curse-of-Dimensionality. Friedman, J. H. Data Mining and Knowledge Discovery, 1(1):55–77, 1997.
The classification problem is considered in which an output variable y assumes discrete values with respective probabilities that depend upon the simultaneous values of a set of input variables x = {x_1, ..., x_n}. At issue is how error in the estimates of these probabilities affects classification error when the estimates are used in a classification rule. These effects are seen to be somewhat counterintuitive in both their strength and nature. In particular, the bias and variance components of the estimation error combine to influence classification in a very different way than with squared error on the probabilities themselves. Certain types of (very high) bias can be canceled by low variance to produce accurate classification. This can dramatically mitigate the effect of the bias associated with some simple estimators like "naive" Bayes, and the bias induced by the curse-of-dimensionality on nearest-neighbor procedures. This helps explain why such simple methods are often competitive with, and sometimes superior to, more sophisticated ones for classification, and why "bagging/aggregating" classifiers can often improve accuracy. These results also suggest simple modifications to these procedures that can (sometimes dramatically) further improve their classification performance.
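As a toy illustration of the abstract's central claim (a sketch, not from the paper itself): under 0/1 loss, all that matters is whether the probability estimate lands on the correct side of the 1/2 decision boundary, so a heavily biased but low-variance estimator can classify better than an unbiased, high-variance one. The distributions and parameters below are invented for illustration.

# Toy simulation (illustrative only; parameters are invented, not from
# the paper): the true P(y=1 | x) is 0.8, so the Bayes rule predicts
# class 1, and an estimate misclassifies only when it falls below 1/2.
import numpy as np

rng = np.random.default_rng(0)
p_true = 0.8
n_trials = 100_000

# Estimator A: unbiased in p, but high variance.
p_a = rng.normal(loc=p_true, scale=0.15, size=n_trials).clip(0, 1)
# Estimator B: strongly biased (centered at 0.6), but low variance.
p_b = rng.normal(loc=0.6, scale=0.02, size=n_trials).clip(0, 1)

for name, p_hat in (("unbiased, high variance", p_a),
                    ("biased, low variance", p_b)):
    mse = np.mean((p_hat - p_true) ** 2)   # squared error on p itself
    flipped = np.mean(p_hat < 0.5)         # decision disagrees with Bayes rule
    print(f"{name}: MSE on p = {mse:.4f}, "
          f"fraction of flipped decisions = {flipped:.5f}")

With these (assumed) parameters, estimator B has roughly double the squared error on the probability yet essentially never crosses the 1/2 boundary, so its classification error stays at the Bayes rate; this is the sense in which high bias is "canceled" by low variance.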
@Article{friedman97bias,
  author    = {Friedman, Jerome H.},
  title     = {On Bias, Variance, 0/1---Loss, and the Curse-of-Dimensionality},
  journal   = {Data Mining and Knowledge Discovery},
  year      = {1997},
  volume    = {1},
  number    = {1},
  pages     = {55--77},
  issn      = {1573-756X},
abstract  = {The classification problem is considered in which an output variable y assumes discrete values with respective probabilities that depend upon the simultaneous values of a set of input variables x = {\{}x{\_}1,...,x{\_}n{\}}. At issue is how error in the estimates of these probabilities affects classification error when the estimates are used in a classification rule. These effects are seen to be somewhat counterintuitive in both their strength and nature. In particular the bias and variance components of the estimation error combine to influence classification in a very different way than with squared error on the probabilities themselves. Certain types of (very high) bias can be canceled by low variance to produce accurate classification. This can dramatically mitigate the effect of the bias associated with some simple estimators like ``naive'' Bayes, and the bias induced by the curse-of-dimensionality on nearest-neighbor procedures. This helps explain why such simple methods are often competitive with and sometimes superior to more sophisticated ones for classification, and why ``bagging/aggregating'' classifiers can often improve accuracy. These results also suggest simple modifications to these procedures that can (sometimes dramatically) further improve their classification performance.},
  doi       = {10.1023/A:1009778005914},
  url       = {http://dx.doi.org/10.1023/A:1009778005914},
}
