On-line active learning: A new paradigm to improve practical useability of data stream modeling methods. Lughofer, E. Information Sciences, 415-416:356–376, November, 2017.
The central purpose of this survey is to give readers insight into recent advances and challenges in on-line active learning. Active learning has attracted the data mining and machine learning community for around 20 years, because it serves important purposes in increasing the practical applicability of machine learning techniques: (i) reducing annotation and measurement costs for operators and measurement equipment, (ii) reducing manual labeling effort for experts, and (iii) reducing computation time for model training. Almost all current techniques focus on the classical pool-based approach, which is off-line by nature because it iterates over a pool of (unlabeled) reference samples multiple times to choose the most promising ones for improving classifier performance. This is achieved by (time-intensive) re-training cycles on all labeled samples available so far. For the on-line, stream mining case, the challenge is that the sample selection strategy has to operate in a fast, ideally single-pass manner. Some first approaches were proposed during the last decade (starting around 2005) using machine learning (ML) oriented incremental classifiers, which are able to update their parameters based on selected samples, but not their structures. Since 2012, on-line active learning concepts have been proposed in connection with the paradigm of evolving models, which are able to expand their knowledge into feature space regions so far unexplored. This opened the possibility of addressing a particular type of uncertainty, namely the one that stems from significant novelty content in streams, caused for example by drifts, new operation modes, changing system behaviors or non-stationary environments.
We provide an overview of the concepts and techniques for sample selection and active learning within these two principal research lines (incremental ML models versus evolving systems), a comparison of their essential characteristics and properties (raising some advantages and disadvantages), and a study of possible evaluation techniques for them. We conclude with an overview of real-world application examples where various on-line AL approaches have already been successfully applied to significantly reduce users’ interaction efforts and costs for model updates.
@article{lughofer_-line_2017,
	title = {On-line active learning: {A} new paradigm to improve practical useability of data stream modeling methods},
	volume = {415-416},
	issn = {0020-0255},
	shorttitle = {On-line active learning},
	url = {https://www.sciencedirect.com/science/article/pii/S0020025517308083},
	doi = {10.1016/j.ins.2017.06.038},
	abstract = {The central purpose of this survey is to provide readers an insight into the recent advances and challenges in on-line active learning. Active learning has attracted the data mining and machine learning community since around 20 years. This is because it served for important purposes to increase practical applicability of machine learning techniques, such as (i) to reduce annotation and measurement costs for operators and measurement equipments, (ii) to reduce manual labeling effort for experts and (iii) to reduce computation time for model training. Almost all of the current techniques focus on the classical pool-based approach, which is off-line by nature as iterating over a pool of (unlabeled) reference samples a multiple times to choose the most promising ones for improving the performance of the classifiers. This is achieved by (time-intensive) re-training cycles on all labeled samples available so far. For the on-line, stream mining case, the challenge is that the sample selection strategy has to operate in a fast, ideally single-pass manner. Some first approaches have been proposed during the last decade (starting from around 2005) with the usage of machine learning (ML) oriented incremental classifiers, which are able to update their parameters based on selected samples, but not their structures. Since 2012, on-line active learning concepts have been proposed in connection with the paradigm of evolving models, which are able to expand their knowledge into feature space regions so far unexplored. This opened the possibility to address a particular type of uncertainty, namely that one which stems from a significant novelty content in streams, as, e.g., caused by drifts, new operation modes, changing system behaviors or non-stationary environments. 
We will provide an overview about the concepts and techniques for sample selection and active learning within these two principal major research lines (incremental ML models versus evolving systems), a comparison of their essential characteristics and properties (raising some advantages and disadvantages), and a study on possible evaluation techniques for them. We conclude with an overview of real-world application examples where various on-line AL approaches have been already successfully applied in order to significantly reduce user’s interaction efforts and costs for model updates.},
	language = {en},
	urldate = {2021-10-18},
	journal = {Information Sciences},
	author = {Lughofer, Edwin},
	month = nov,
	year = {2017},
	keywords = {Data stream mining, Evolving models, Incremental ML and DM methods, Interaction effort and cost reduction, On-line active learning, Single-pass sample selection, Uncertainty and novelty in streams},
	pages = {356--376},
}
