Efficient handling of concept drift and concept evolution over Stream Data. Haque, A., Khan, L., Baron, M., Thuraisingham, B., & Aggarwal, C. In 2016 IEEE 32nd International Conference on Data Engineering (ICDE), pages 481–492, May, 2016.
To decide if an update to a data stream classifier is necessary, existing sliding-window-based techniques monitor classifier performance on recent instances. If there is a significant change in classifier performance, these approaches determine a chunk boundary and update the classifier. However, monitoring classifier performance is costly due to the scarcity of labeled data. In our previous work, we presented a semi-supervised framework, SAND, which applies change detection to classifier confidence scores to detect a concept drift. Unlike most approaches, it requires only a limited amount of labeled data to detect chunk boundaries and to update the classifier. However, SAND is expensive in terms of execution time due to exhaustive invocation of the change detection module. In this paper, we present an efficient framework that is based on the same principle as SAND but exploits dynamic programming and executes the change detection module selectively. Moreover, we provide theoretical justification of the confidence calculation and show the effect of a concept drift on subsequent confidence scores. Experimental results demonstrate the efficiency of the proposed framework in terms of both accuracy and execution time.
@inproceedings{haque_efficient_2016,
	title = {Efficient handling of concept drift and concept evolution over {Stream} {Data}},
	doi = {10.1109/ICDE.2016.7498264},
	abstract = {To decide if an update to a data stream classifier is necessary, existing sliding-window-based techniques monitor classifier performance on recent instances. If there is a significant change in classifier performance, these approaches determine a chunk boundary and update the classifier. However, monitoring classifier performance is costly due to the scarcity of labeled data. In our previous work, we presented a semi-supervised framework, SAND, which applies change detection to classifier confidence scores to detect a concept drift. Unlike most approaches, it requires only a limited amount of labeled data to detect chunk boundaries and to update the classifier. However, SAND is expensive in terms of execution time due to exhaustive invocation of the change detection module. In this paper, we present an efficient framework that is based on the same principle as SAND but exploits dynamic programming and executes the change detection module selectively. Moreover, we provide theoretical justification of the confidence calculation and show the effect of a concept drift on subsequent confidence scores. Experimental results demonstrate the efficiency of the proposed framework in terms of both accuracy and execution time.},
	booktitle = {2016 {IEEE} 32nd {International} {Conference} on {Data} {Engineering} ({ICDE})},
	author = {Haque, Ahsanul and Khan, Latifur and Baron, Michael and Thuraisingham, Bhavani and Aggarwal, Charu},
	month = may,
	year = {2016},
	keywords = {Classifier Confidence, Concept Drift, Data mining, Data models, Dynamic Chunk, Dynamic programming, Electronic mail, Error analysis, Labeling, Training data},
	pages = {481--492},
}
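
The abstract describes a sliding-window scheme that monitors classifier confidence rather than labeled accuracy and marks a chunk boundary when confidence changes significantly. The Python snippet below is a minimal illustrative sketch of that general idea, not the authors' SAND-based algorithm: the detector class, window size, threshold, and the simple two-half mean comparison are assumptions made for the example.

# Minimal illustrative sketch (assumed names and thresholds), not the authors'
# implementation: flag a chunk boundary when the mean classifier confidence in
# the recent half of a sliding window drops noticeably below the older half.
from collections import deque
from statistics import mean

class ConfidenceDriftDetector:
    def __init__(self, window_size=100, threshold=0.15):
        self.window = deque(maxlen=window_size)  # sliding window of confidence scores
        self.threshold = threshold               # drop in mean confidence treated as drift

    def add(self, confidence):
        """Record the classifier's confidence on the newest instance;
        return True if a concept drift (chunk boundary) is suspected."""
        self.window.append(confidence)
        if len(self.window) < self.window.maxlen:
            return False  # window not full yet, keep collecting
        scores = list(self.window)
        half = len(scores) // 2
        older, recent = scores[:half], scores[half:]
        # Drift is suspected when recent confidence falls well below older confidence.
        return mean(older) - mean(recent) > self.threshold

# Usage with synthetic confidence scores that drop partway through the stream.
detector = ConfidenceDriftDetector()
for i, conf in enumerate([0.9] * 80 + [0.55] * 40):
    if detector.add(conf):
        print(f"Drift suspected at instance {i}: mark chunk boundary, update classifier")
        break

Per the abstract, the paper's framework additionally invokes the change detection module selectively (with the help of dynamic programming) rather than exhaustively as in SAND; that optimization is omitted from this sketch.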
