Feature selection for classification tasks: Expert knowledge or traditional methods?. Corrales, D. C., Lasso, E., Ledezma, A., & Corrales, J. C. Journal of Intelligent & Fuzzy Systems, Preprint:1-11, 2018. JCR (2016) 88/133 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Recently, available data has increased explosively in both number of samples and dimensionality. The large volume of high-dimensional data introduces noisy, redundant and irrelevant dimensions. Such dimensions can increase the time and computational cost of the learning process and even degrade the performance of learning tasks. One way to reduce dimensionality is Feature Selection (FS). The aim of this paper is to study feature selection based on expert knowledge and traditional methods (filter, wrapper and embedded) and to analyze their performance in classification tasks. Three datasets related to the cancer domain in humans were used for feature selection: Breast Cancer (BC), Primary Tumor (PT) and Central Nervous System (CNS). C4.5, K-Nearest Neighbors, Support Vector Machine and Multi Layer Perceptron were trained with the best subset of features for each cancer dataset. The subset of features selected by the wrapper method presents the best average accuracy on the BC and PT datasets, while the subset of features selected by the embedded method reaches the highest average accuracy on the CNS dataset.
@Article{Corrales2018,
  author   = {David Camilo Corrales and Emmanuel Lasso and Agapito Ledezma and Juan Carlos Corrales},
  title    = {Feature selection for classification tasks: Expert knowledge or traditional methods?},
  journal  = {Journal of Intelligent \& Fuzzy Systems},
  year     = {2018},
  volume   = {Preprint},
  pages    = {1--11},
  issn     = {1064-1246},
  note     = {JCR (2016) 88/133 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE},
  abstract = {Recently, available data has increased explosively in both number of samples and dimensionality. The large volume of high-dimensional data introduces noisy, redundant and irrelevant dimensions. Such dimensions can increase the time and computational cost of the learning process and even degrade the performance of learning tasks. One way to reduce dimensionality is Feature Selection (FS). The aim of this paper is to study feature selection based on expert knowledge and traditional methods (filter, wrapper and embedded) and to analyze their performance in classification tasks. Three datasets related to the cancer domain in humans were used for feature selection: Breast Cancer (BC), Primary Tumor (PT) and Central Nervous System (CNS). C4.5, K-Nearest Neighbors, Support Vector Machine and Multi Layer Perceptron were trained with the best subset of features for each cancer dataset. The subset of features selected by the wrapper method presents the best average accuracy on the BC and PT datasets, while the subset of features selected by the embedded method reaches the highest average accuracy on the CNS dataset.},
  doi      = {10.3233/JIFS-169470},
  file     = {:journals/2018_Feature selection for classification tasks- Expert knowledge or traditional methods.pdf:PDF},
  groups   = {JCR},
}
