Performance of Several Variable-Selection Methods Applied to Real Ecological Data. Murtaugh, P. A. Ecology Letters, 12(10):1061–1068, 2009.
I evaluated the predictive ability of statistical models obtained by applying seven methods of variable selection to 12 ecological and environmental data sets. Cross-validation, involving repeated splits of each data set into training and validation subsets, was used to obtain honest estimates of predictive ability that could be fairly compared among methods. There was surprisingly little difference in predictive ability among five methods based on multiple linear regression. Stepwise methods performed similarly to exhaustive algorithms for subset selection, and the choice of criterion for comparing models (Akaike’s information criterion, Schwarz’s Bayesian information criterion or F statistics) had little effect on predictive ability. For most of the data sets, two methods based on regression trees yielded models with substantially lower predictive ability. I argue that there is no ‘best’ method of variable selection and that any of the regression-based approaches discussed here is capable of yielding useful predictive models.
@article{murtaughPerformanceSeveralVariableselection2009,
  title = {Performance of Several Variable-Selection Methods Applied to Real Ecological Data},
  author = {Murtaugh, Paul A.},
  date = {2009},
  journaltitle = {Ecology Letters},
  volume = {12},
  pages = {1061--1068},
  issn = {1461-0248},
  doi = {10.1111/j.1461-0248.2009.01361.x},
  url = {https://doi.org/10.1111/j.1461-0248.2009.01361.x},
  urldate = {2019-09-19},
  abstract = {I evaluated the predictive ability of statistical models obtained by applying seven methods of variable selection to 12 ecological and environmental data sets. Cross-validation, involving repeated splits of each data set into training and validation subsets, was used to obtain honest estimates of predictive ability that could be fairly compared among methods. There was surprisingly little difference in predictive ability among five methods based on multiple linear regression. Stepwise methods performed similarly to exhaustive algorithms for subset selection, and the choice of criterion for comparing models (Akaike’s information criterion, Schwarz’s Bayesian information criterion or F statistics) had little effect on predictive ability. For most of the data sets, two methods based on regression trees yielded models with substantially lower predictive ability. I argue that there is no ‘best’ method of variable selection and that any of the regression-based approaches discussed here is capable of yielding useful predictive models.},
  keywords = {~INRMM-MiD:z-6KL8KDFQ,comparison,data-transformation-modelling,regression,variable-selection},
  langid = {english},
  number = {10}
}