ESA-Stream: Efficient Self-Adaptive Online Data Stream Clustering. Li, Y., Li, H., Wang, Z., Liu, B., Cui, J., & Fei, H. IEEE Transactions on Knowledge and Data Engineering, 2020.
Many big data applications produce a massive amount of high-dimensional, real-time, and evolving streaming data. Clustering such data streams with both effectiveness and efficiency is critical for these applications. Although there are well-known data stream clustering algorithms that are based on the popular online-offline framework, these algorithms still face some major challenges. Several critical questions are still not answered satisfactorily: How to perform dimensionality reduction effectively and efficiently in the online dynamic environment? How to enable the clustering algorithm to achieve complete real-time online processing? How to make algorithm parameters learn in a self-supervised or self-adaptive manner to cope with high-speed evolving streams? In this paper, we focus on tackling these challenges by proposing a fully online stream clustering algorithm (called ESA-Stream) that can learn parameters online dynamically in a self-adaptive manner, speed up dimensionality reduction, and cluster data streams effectively and efficiently in an online and dynamic environment. Experiments on a wide range of synthetic and real-world data streams show that ESA-Stream outperforms state-of-the-art baselines considerably in both effectiveness and efficiency.
@article{li_esa-stream_2020,
	title = {{ESA}-{Stream}: {Efficient} {Self}-{Adaptive} {Online} {Data} {Stream} {Clustering}},
	issn = {1558-2191},
	shorttitle = {{ESA}-{Stream}},
	doi = {10.1109/TKDE.2020.2990196},
	abstract = {Many big data applications produce a massive amount of high-dimensional, real-time, and evolving streaming data. Clustering such data streams with both effectiveness and efficiency is critical for these applications. Although there are well-known data stream clustering algorithms that are based on the popular online-offline framework, these algorithms still face some major challenges. Several critical questions are still not answered satisfactorily: How to perform dimensionality reduction effectively and efficiently in the online dynamic environment? How to enable the clustering algorithm to achieve complete real-time online processing? How to make algorithm parameters learn in a self-supervised or self-adaptive manner to cope with high-speed evolving streams? In this paper, we focus on tackling these challenges by proposing a fully online stream clustering algorithm (called ESA-Stream) that can learn parameters online dynamically in a self-adaptive manner, speed up dimensionality reduction, and cluster data streams effectively and efficiently in an online and dynamic environment. Experiments on a wide range of synthetic and real-world data streams show that ESA-Stream outperforms state-of-the-art baselines considerably in both effectiveness and efficiency.},
	journal = {IEEE Transactions on Knowledge and Data Engineering},
	author = {Li, Yanni and Li, Hui and Wang, Zhi and Liu, Bing and Cui, Jiangtao and Fei, Hang},
	year = {2020},
	keywords = {Clustering algorithms, Clustering methods, Data Stream, Dimensionality reduction, Heuristic algorithms, Indexes, Online Clustering, Partitioning algorithms, Real-time systems, Self-Adaptive},
	pages = {1--1},
}
