Combining active learning with concept drift detection for data stream mining. Krawczyk, B., Pfahringer, B., & Woźniak, M. In 2018 IEEE International Conference on Big Data (Big Data), pages 2239–2244, December, 2018.
Most data stream classifier learning methods assume that the true class of an incoming object is available right after the instance has been processed, and that the newly labeled instance may be used to update the classifier's model, detect drift, or capture novel concepts. However, the assumption that we have unlimited access to class labels is naive and would usually entail a very high labeling cost. Therefore, the applicability of many supervised techniques is limited in real-life stream analytics scenarios. Active learning emerges as a potential solution to this problem, concentrating on selecting only the most valuable instances and learning an accurate predictive model with as few labeling queries as possible. However, learning from data streams differs from online learning, as the distribution of examples may change over time. Therefore, an active learning strategy must be able to handle concept drift and quickly adapt to the evolving nature of the data. In this paper we present novel active learning strategies that are designed to tackle such changes effectively. We assume that the most labeling effort is required when concept drift occurs, as we need a representative sample of the new concept to properly retrain the predictive model. Therefore, we propose active learning strategies that are guided by a drift detection module to save budget for difficult and evolving instances. The three proposed strategies are based on learner uncertainty, dynamic allocation of budget over time, and search space randomization. Experimental evaluation of the proposed methods demonstrates their usefulness for reducing labeling effort in learning from drifting data streams.
@inproceedings{krawczyk_combining_2018,
	title = {Combining active learning with concept drift detection for data stream mining},
	doi = {10.1109/BigData.2018.8622549},
	abstract = {Most data stream classifier learning methods assume that the true class of an incoming object is available right after the instance has been processed, and that the newly labeled instance may be used to update the classifier's model, detect drift, or capture novel concepts. However, the assumption that we have unlimited access to class labels is naive and would usually entail a very high labeling cost. Therefore, the applicability of many supervised techniques is limited in real-life stream analytics scenarios. Active learning emerges as a potential solution to this problem, concentrating on selecting only the most valuable instances and learning an accurate predictive model with as few labeling queries as possible. However, learning from data streams differs from online learning, as the distribution of examples may change over time. Therefore, an active learning strategy must be able to handle concept drift and quickly adapt to the evolving nature of the data. In this paper we present novel active learning strategies that are designed to tackle such changes effectively. We assume that the most labeling effort is required when concept drift occurs, as we need a representative sample of the new concept to properly retrain the predictive model. Therefore, we propose active learning strategies that are guided by a drift detection module to save budget for difficult and evolving instances. The three proposed strategies are based on learner uncertainty, dynamic allocation of budget over time, and search space randomization. Experimental evaluation of the proposed methods demonstrates their usefulness for reducing labeling effort in learning from drifting data streams.},
	booktitle = {2018 {IEEE} {International} {Conference} on {Big} {Data} ({Big} {Data})},
	author = {Krawczyk, B. and Pfahringer, B. and Woźniak, M.},
	month = dec,
	year = {2018},
	keywords = {Big Data, Data mining, Detectors, Dynamic scheduling, Labeling, Predictive models, Resource management, accurate predictive model, active learning, active learning strategy, class labels, concept drift, concept drift detection, data handling, data mining, data stream classifier learning methods, data stream mining, data streams, drift detection, drift detection module, evolving instances, high labeling cost, labeled instance, labeling effort, labeling queries, learning (artificial intelligence), machine learning, online learning, pattern classification, query processing, valuable instances},
	pages = {2239--2244},
}
