Robust anomaly detection algorithms for real-time big data: Comparison of algorithms. Hasani, Z. In 2017 6th Mediterranean Conference on Embedded Computing (MECO), pages 1–6, June, 2017.
Most of today's data are streaming time-series data, in which anomaly detection provides significant information about possible critical situations. Yet detecting anomalies in big streaming data is a difficult task: detectors must acquire and process data in real time, as they occur, even before they are stored, and instantly raise an alarm on potential threats. To meet the need for real-time alarms and unsupervised procedures in massive streaming-data anomaly detection, algorithms have to be robust and have low processing time, possibly at the cost of accuracy. In this work we explore several such fast algorithms: MAD, RunMAD, Boxplot, Twitter ADVec, DBSCAN, Moving Range Technique, Statistical Control Chart Techniques, ARIMA and Moving Average. The algorithms are tested, and the results visualized, in R on the three Numenta datasets with known anomalies and on our own e-dnevnik dataset with unknown anomalies. Evaluation is done by comparing the achieved results (algorithm execution time, CPU usage and number of anomalies found) with the Numenta HTM algorithm, which detects all the anomalies in the Numenta datasets. Our interest is in monitoring the streaming log data generated in the national educational network (e-dnevnik), which receives a massive number of online queries, and in detecting anomalies in order to scale up performance, prevent network outages, raise alarms on possible attacks, and the like.
@inproceedings{hasani_robust_2017,
	title = {Robust anomaly detection algorithms for real-time big data: {Comparison} of algorithms},
	shorttitle = {Robust anomaly detection algorithms for real-time big data},
	doi = {10.1109/MECO.2017.7977130},
	abstract = {Most of today's data are streaming time-series data, in which anomaly detection provides significant information about possible critical situations. Yet detecting anomalies in big streaming data is a difficult task: detectors must acquire and process data in real time, as they occur, even before they are stored, and instantly raise an alarm on potential threats. To meet the need for real-time alarms and unsupervised procedures in massive streaming-data anomaly detection, algorithms have to be robust and have low processing time, possibly at the cost of accuracy. In this work we explore several such fast algorithms: MAD, RunMAD, Boxplot, Twitter ADVec, DBSCAN, Moving Range Technique, Statistical Control Chart Techniques, ARIMA and Moving Average. The algorithms are tested, and the results visualized, in R on the three Numenta datasets with known anomalies and on our own e-dnevnik dataset with unknown anomalies. Evaluation is done by comparing the achieved results (algorithm execution time, CPU usage and number of anomalies found) with the Numenta HTM algorithm, which detects all the anomalies in the Numenta datasets. Our interest is in monitoring the streaming log data generated in the national educational network (e-dnevnik), which receives a massive number of online queries, and in detecting anomalies in order to scale up performance, prevent network outages, raise alarms on possible attacks, and the like.},
	booktitle = {2017 6th {Mediterranean} {Conference} on {Embedded} {Computing} ({MECO})},
	author = {Hasani, Zirije},
	month = jun,
	year = {2017},
	keywords = {ARIMA, Autoregressive processes, Big Data, Boxplot, CPU usage, Classification algorithms, Clustering algorithms, Control charts, DBSCAN, HTM, Log data, MAD, Moving Average, Moving Range Technique, NuPIC, Numenta dataset, R, R system, Real-time systems, RunMAD, Statistical Control Chart Techniques, Time series analysis, Twitter ADVec, algorithm execution time, anomaly detection, autoregressive moving average processes, component, control charts, data processing, e-dnevnik dataset, educational administrative data processing, known-anomalies, moving average technique, moving range technique, national educational network, network down prevention, online queries, outlier detection, real-time Big Data, real-time alarm procedure, real-time big data, robust anomaly detection algorithm, social networking (online), statistical control chart technique, unknown anomalies, unsupervised learning, unsupervised procedure, visualization},
	pages = {1--6},
}
