Abstract
Outliers are data points that deviate significantly from the majority of the data. Detecting outliers is an important task in many applications, especially in data mining. Unsupervised techniques are more popular than supervised ones for mining outliers in a dataset, and various unsupervised approaches have been proposed over the last decades. Clustering-based, distance-based, and density-based approaches have been found to be successful at detecting outlier points. However, the main focus of clustering-based methods is to identify the clustering structure. Many distance-based and density-based techniques are not suitable for datasets of varying density, and they are also very sensitive to their parameter, the number of nearest neighbors (k). In this paper, we propose a hybrid approach named RDPOD, which combines distance-based and density-based clustering approaches to identify the density of each point correctly. We obtain the local density and relative distance of each data instance and, from this density and distance information, identify outlier points. Experimental results on real-world datasets show that the proposed approach outperforms the popular techniques LOF, LDOF, and symmetric neighborhood, as well as the recently introduced approaches NOF and RDOS.
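The abstract does not state the formulas behind the local density and relative distance, so the following is only a minimal sketch of the general idea, not the RDPOD definition. It assumes a k-nearest-neighbor density (the inverse of the mean distance to the k nearest neighbors) and a density-peaks style relative distance (the distance to the nearest point of higher density). The function name `density_distance_scores` and the scoring rule (relative distance divided by density) are hypothetical choices made for illustration.

```python
import numpy as np

def density_distance_scores(X, k=5):
    """Illustrative outlier score: large when a point is sparse AND far
    from any denser point. Assumed formulas, not those of the paper."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]

    # Pairwise Euclidean distance matrix (O(n^2) memory; fine for a sketch).
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))

    # Distances to the k nearest neighbors; column 0 after sorting is the
    # point itself (distance 0), so it is skipped.
    knn = np.sort(D, axis=1)[:, 1:k + 1]

    # Assumed local density: inverse of the mean k-NN distance.
    density = 1.0 / (knn.mean(axis=1) + 1e-12)

    # Assumed relative distance: distance to the nearest point with strictly
    # higher density; the densest point(s) get their largest distance.
    delta = np.empty(n)
    for i in range(n):
        higher = density > density[i]
        delta[i] = D[i, higher].min() if higher.any() else D[i].max()

    # Low density and large relative distance both raise the score.
    return delta / density
```

Under these assumptions, the points with the highest scores are the outlier candidates, for example `np.argsort(scores)[::-1][:m]` for the top `m`. The sketch still depends on `k`, so it does not address the parameter sensitivity that the abstract raises.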
Data availability
The datasets used during this work are available in the UCI machine learning repository (https://archive.ics.uci.edu/ml/datasets.php) and ODDS outlier detection dataset repository (http://odds.cs.stonybrook.edu/).
Ethics declarations
Conflict of interest
All authors declared that they have no conflict of interest.
About this article
Cite this article
Abhaya, A., Patra, B.K. RDPOD: an unsupervised approach for outlier detection. Neural Comput & Applic 34, 1065–1077 (2022). https://doi.org/10.1007/s00521-021-06432-6
DOI: https://doi.org/10.1007/s00521-021-06432-6