ABSTRACT
Distance-based outlier detection techniques are a widespread methodology for anomaly detection. Despite their effectiveness, a main limitation is that they rely heavily on the dataset and on the parameters chosen to establish the correct status of each data point. These parameters typically include, but are not limited to, the neighborhood radius and the threshold on the number of neighbors. In continuous streaming environments, the need for real-time analysis does not permit an algorithm to be restarted multiple times with different parameters until the right combination is found. This gives rise to the need for a single technique that handles an arbitrary number of parameterizations while using minimal yet sufficient computing resources. In this work, we compare the state-of-the-art techniques for handling multiple queries in distance-based outlier detection algorithms, and we propose a novel technique for multi-parameter distance-based outlier detection tailored to distributed continuous streaming environments, such as Spark and Flink.
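To make the role of the two parameters concrete, the following is a minimal sketch, not the technique proposed in this work. It assumes the standard distance-based definition (a point is an outlier when fewer than k other points lie within radius R of it) and a single in-memory window of points. The names `neighbor_distances` and `outliers_for_queries` are hypothetical. The sketch computes pairwise distances once and reuses them to answer several (R, k) queries, which is the basic sharing idea behind multi-query processing; it is brute force and does not handle window slides or distribution.

```python
from bisect import bisect_right
from math import dist


def neighbor_distances(window):
    """For each point, return the sorted distances to every other point."""
    return [
        sorted(dist(p, q) for j, q in enumerate(window) if j != i)
        for i, p in enumerate(window)
    ]


def outliers_for_queries(window, queries):
    """Answer several (radius, k) queries from one shared distance computation.

    A point is reported as an outlier for (radius, k) when fewer than k
    other points lie within `radius` of it (assumed definition).
    """
    sorted_dists = neighbor_distances(window)  # computed once, shared by all queries
    result = {}
    for radius, k in queries:
        result[(radius, k)] = [
            i
            for i, dists in enumerate(sorted_dists)
            # bisect_right counts the neighbors at distance <= radius
            if bisect_right(dists, radius) < k
        ]
    return result


# Illustrative data: four points on a unit square plus one distant point.
window = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
queries = [(1.5, 2), (1.0, 3), (6.0, 4)]
report = outliers_for_queries(window, queries)
for query, outlier_ids in report.items():
    print(query, outlier_ids)
```

The same window yields different outlier sets under different parameterizations, which is why restarting a single-query algorithm for each combination is costly in a streaming setting.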
Index Terms
- Multi-parameter streaming outlier detection