Optimization of dependency and pruning usage in text classification. Özgür, L. & Güngör, T. Pattern Analysis and Applications, 15(1):45-58, February 2012.
In this study, a comprehensive analysis of the lexical dependency and pruning concepts for the text classification problem is presented. Dependencies are included in the feature vector as an extension to the standard bag-of-words approach. The pruning process filters features with low frequencies so that fewer but more informative features remain in the solution vector. The pruning levels for words, dependencies, and dependency combinations for different datasets are analyzed in detail. The main motivation in this work is to make use of dependencies and pruning efficiently in text classification and to achieve more successful results using much smaller feature vector sizes. Three different datasets were used in the experiments and statistically significant improvements for most of the proposed approaches were obtained.
@article{
 title = {Optimization of dependency and pruning usage in text classification},
 type = {article},
 year = {2012},
 keywords = {Lexical dependency,Pruning analysis,Stanford parser,Text classification},
 pages = {45--58},
 volume = {15},
 month = {2},
 id = {08766672-4f83-3144-9dab-398686722674},
 created = {2019-10-12T10:47:45.895Z},
 accessed = {2019-10-12},
 file_attached = {false},
 profile_id = {1971c810-6732-3a00-9f6b-d217e1a53071},
 group_id = {cbcfbfec-195f-3b99-b6a1-d26e1dd80ff5},
 last_modified = {2019-10-12T10:47:45.972Z},
 read = {false},
 starred = {false},
 authored = {false},
 confirmed = {false},
 hidden = {false},
 private_publication = {false},
 abstract = {In this study, a comprehensive analysis of the lexical dependency and pruning concepts for the text classification problem is presented. Dependencies are included in the feature vector as an extension to the standard bag-of-words approach. The pruning process filters features with low frequencies so that fewer but more informative features remain in the solution vector. The pruning levels for words, dependencies, and dependency combinations for different datasets are analyzed in detail. The main motivation in this work is to make use of dependencies and pruning efficiently in text classification and to achieve more successful results using much smaller feature vector sizes. Three different datasets were used in the experiments and statistically significant improvements for most of the proposed approaches were obtained.},
 bibtype = {article},
 author = {Özgür, Levent and Güngör, Tunga},
 journal = {Pattern Analysis and Applications},
 number = {1}
}
