A survey on data preprocessing for data stream mining: Current status and future directions. Ramírez-Gallego, S., Krawczyk, B., García, S., Woźniak, M., & Herrera, F. Neurocomputing, 239:39–57, May, 2017.
Data preprocessing and reduction have become essential techniques in current knowledge discovery scenarios, dominated by increasingly large datasets. These methods aim at reducing the complexity inherent to real-world datasets, so that they can be easily processed by current data mining solutions. Advantages of such approaches include, among others, a faster and more precise learning process, and a more understandable structure of raw data. However, data preprocessing techniques for data streams still have a long road ahead of them, even though online learning is growing in importance thanks to the development of the Internet and technologies for massive data collection. Throughout this survey, we summarize, categorize and analyze those contributions on data preprocessing that cope with streaming data. This work also takes into account the existing relationships between the different families of methods (feature and instance selection, and discretization). To enrich our study, we conduct thorough experiments using the most relevant contributions and present an analysis of their predictive performance, reduction rates, computational time, and memory usage. Finally, we offer general advice about existing data stream preprocessing algorithms, and discuss emerging future challenges to be faced in the domain of data stream preprocessing.
@article{ramirez-gallego_survey_2017,
	title = {A survey on data preprocessing for data stream mining: {Current} status and future directions},
	volume = {239},
	issn = {0925-2312},
	shorttitle = {A survey on data preprocessing for data stream mining},
	url = {http://www.sciencedirect.com/science/article/pii/S0925231217302631},
	doi = {10.1016/j.neucom.2017.01.078},
	abstract = {Data preprocessing and reduction have become essential techniques in current knowledge discovery scenarios, dominated by increasingly large datasets. These methods aim at reducing the complexity inherent to real-world datasets, so that they can be easily processed by current data mining solutions. Advantages of such approaches include, among others, a faster and more precise learning process, and a more understandable structure of raw data. However, data preprocessing techniques for data streams still have a long road ahead of them, even though online learning is growing in importance thanks to the development of the Internet and technologies for massive data collection. Throughout this survey, we summarize, categorize and analyze those contributions on data preprocessing that cope with streaming data. This work also takes into account the existing relationships between the different families of methods (feature and instance selection, and discretization). To enrich our study, we conduct thorough experiments using the most relevant contributions and present an analysis of their predictive performance, reduction rates, computational time, and memory usage. Finally, we offer general advice about existing data stream preprocessing algorithms, and discuss emerging future challenges to be faced in the domain of data stream preprocessing.},
	language = {en},
	urldate = {2020-12-12},
	journal = {Neurocomputing},
	author = {Ramírez-Gallego, Sergio and Krawczyk, Bartosz and García, Salvador and Woźniak, Michał and Herrera, Francisco},
	month = may,
	year = {2017},
	keywords = {Concept drift, Data discretization, Data mining, Data preprocessing, Data reduction, Data stream, Feature selection, Instance selection, Online learning},
	pages = {39--57},
}
