Examining the Impact of Feature Selection on Classification of User Reviews in Web Pages. Uzun, E. & Özhan, E. In 2018 International Conference on Artificial Intelligence and Data Processing, IDAP 2018, pages 430-437, September 2018. IEEE.
User reviews in web pages can provide useful information about the content of the page for text processing applications. Automatically extracting data from a web page is a crucial step for these applications. One method used in this process is to construct a learning model with an appropriate classification method using features derived from the data. However, some features can be either redundant or irrelevant for this model. In this study, an imbalanced dataset including 47 shallow text features obtained from web pages is utilized for extracting the user reviews. Then, various well-known feature selection techniques are applied to reduce the number of these features. The effects of this reduction on the classification methods are also examined. The experimental results indicate that approximately half of the features are sufficient for the classification task. Additionally, the AdaBoost classifier gives the best results, with a precision of about 0.930 for the review layout prediction.
@inproceedings{Uzun2018,
 title = {Examining the Impact of Feature Selection on Classification of User Reviews in Web Pages},
 type = {inproceedings},
 year = {2018},
 keywords = {classification methods,feature selection,imbalanced dataset,review layout detection,web data extraction},
 pages = {430--437},
 websites = {https://ieeexplore.ieee.org/document/8620774/},
 month = {9},
 publisher = {IEEE},
 city = {Malatya, Turkey},
 id = {122b4e2c-e9ef-326e-9cee-7684cbb746b8},
 created = {2019-01-17T06:52:15.704Z},
 file_attached = {false},
 profile_id = {37fa15c3-e5d0-3212-8e18-e4c72814fd47},
 last_modified = {2022-04-08T18:45:10.149Z},
 read = {false},
 starred = {false},
 authored = {true},
 confirmed = {true},
 hidden = {false},
 citation_key = {Uzun2018},
 private_publication = {false},
 abstract = {User reviews in web pages can provide useful information about the content of the page for text processing applications. Automatically extracting data from a web page is a crucial step for these applications. One method used in this process is to construct a learning model with an appropriate classification method using features derived from the data. However, some features can be either redundant or irrelevant for this model. In this study, an imbalanced dataset including 47 shallow text features obtained from web pages is utilized for extracting the user reviews. Then, various well-known feature selection techniques are applied to reduce the number of these features. The effects of this reduction on the classification methods are also examined. The experimental results indicate that approximately half of the features are sufficient for the classification task. Additionally, the AdaBoost classifier gives the best results, with a precision of about 0.930 for the review layout prediction.},
 bibtype = {inproceedings},
 author = {Uzun, Erdinç and Özhan, Erkan},
 doi = {10.1109/IDAP.2018.8620774},
 booktitle = {2018 International Conference on Artificial Intelligence and Data Processing, IDAP 2018}
}
