Active Learning with Abstaining Classifiers for Imbalanced Drifting Data Streams. Korycki, Ł., Cano, A., & Krawczyk, B. In 2019 IEEE International Conference on Big Data (Big Data), pages 2334–2343, December 2019.
@inproceedings{korycki_active_2019,
title = {Active {Learning} with {Abstaining} {Classifiers} for {Imbalanced} {Drifting} {Data} {Streams}},
doi = {10.1109/BigData47090.2019.9006453},
abstract = {Learning from data streams is one of the most promising and challenging domains in modern machine learning. Proliferating online data sources provide us access to real-time knowledge we have never had before. At the same time, new obstacles emerge and we have to overcome them in order to fully and effectively utilize the potential of the data. Prohibitive time and memory constraints or non-stationary distributions are only some of the problems. When dealing with classification tasks, one has to remember that effective adaptation has to be achieved on weak foundations of partially labeled and often imbalanced data. In our work, we propose an online framework for binary classification, that aims to handle the complex problem of working with dynamic, sparsely labeled and imbalanced streams. The main part of it is a novel active learning strategy (MD-OAL) that is able to prioritize labeling of minority instances and, as a result, improve the balance of the learning process. We combine the strategy with a dynamic ensemble of base learners that can abstain from making decisions, if they are very uncertain. We adjust the abstaining mechanism in favor of minority instances, providing an effective method for handling remaining imbalance and a concept drift simultaneously. The conducted evaluation shows that in the challenging and realistic scenarios our framework outperforms state-of-the-art algorithms, providing higher resilience to the combined effect of limited labeling and imbalance.},
booktitle = {2019 {IEEE} {International} {Conference} on {Big} {Data} ({Big} {Data})},
author = {Korycki, Ł and Cano, A. and Krawczyk, B.},
month = dec,
year = {2019},
keywords = {Computer science, Data mining, Data models, Heuristic algorithms, Labeling, Machine learning, Uncertainty, abstaining classifiers, active learning, active learning strategy, binary classification, classification tasks, data stream mining, ensemble learning, imbalanced data, imbalanced drifting data streams, learning (artificial intelligence), machine learning, online data sources, pattern classification, real-time knowledge},
pages = {2334--2343},
}