
RDPOD: an unsupervised approach for outlier detection

  • Original Article
Neural Computing and Applications

Abstract

Outliers are data points that deviate significantly from the majority of the data. Finding outliers is an important task in many applications, especially in data mining. Unsupervised techniques are more popular than supervised ones for mining outliers in a dataset, and various unsupervised approaches have been proposed over the last decades. Clustering-based, distance-based, and density-based approaches have been found successful for detecting outlier points. However, the main focus of clustering-based methods is identifying the clustering structure. Many distance-based and density-based techniques are not suitable for datasets of varying density, and they are also very sensitive to their parameter, the number of nearest neighbors (k). In this paper, we propose a hybrid approach named RDPOD, which utilizes distance-based and density-based clustering approaches efficiently to identify the density of each point correctly. We obtain the local density and relative distance of each data instance, and from this density and distance information we identify outlier points. Experimental results on real-world datasets show that our proposed approach outperforms the popular techniques LOF, LDOF, and symmetric neighborhood, as well as the recently introduced approaches NOF and RDOS.
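The abstract does not state the formulas behind "local density" and "relative distance", so the following is only a minimal sketch of the general idea, not the RDPOD algorithm itself. It assumes a k-nearest-neighbor density (inverse of the mean distance to the k nearest neighbors) and a relative distance in the style of density-peaks clustering (distance to the nearest point of higher density). The function names and the ratio used as a score are hypothetical choices made for illustration.

```python
import math


def density_and_relative_distance(points, k=5):
    """Return (density, delta) lists for a sequence of coordinate tuples.

    ASSUMPTION: both definitions below are illustrative, not taken from the paper.
    density[i] = 1 / mean distance from point i to its k nearest neighbours.
    delta[i]   = distance from point i to the nearest point of higher density;
                 a point with no denser neighbour gets its largest distance.
    """
    n = len(points)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of points")

    # Full pairwise Euclidean distances (quadratic cost, acceptable for a sketch).
    dist = [[math.dist(p, q) for q in points] for p in points]

    density = []
    for i in range(n):
        # Distances to every other point, nearest first.
        others = sorted(dist[i][j] for j in range(n) if j != i)
        mean_knn = sum(others[:k]) / k
        density.append(1.0 / (mean_knn + 1e-12))  # guard against duplicates

    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if density[j] > density[i]]
        delta.append(min(higher) if higher else max(dist[i]))
    return density, delta


def outlier_scores(points, k=5):
    """Hypothetical score: large when a point is sparse AND far from denser points."""
    density, delta = density_and_relative_distance(points, k)
    return [d / rho for rho, d in zip(density, delta)]


if __name__ == "__main__":
    # A tight 5 x 5 grid plus one distant point appended last.
    pts = [(i * 0.1, j * 0.1) for i in range(5) for j in range(5)] + [(5.0, 5.0)]
    scores = outlier_scores(pts, k=5)
    print(max(range(len(pts)), key=scores.__getitem__))  # index of the top-scoring point
```

On this toy input the distant point has both the lowest density and a large relative distance, so it receives the highest score. How RDPOD actually combines the two quantities, and how it limits sensitivity to k, is described in the full article rather than in the abstract.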

Data availability

The datasets used during this work are available in the UCI machine learning repository (https://archive.ics.uci.edu/ml/datasets.php) and ODDS outlier detection dataset repository (http://odds.cs.stonybrook.edu/).

Author information

Corresponding author

Correspondence to Bidyut Kr. Patra.

Ethics declarations

Conflict of interest

All authors declared that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Abhaya, A., Patra, B.K. RDPOD: an unsupervised approach for outlier detection. Neural Comput & Applic 34, 1065–1077 (2022). https://doi.org/10.1007/s00521-021-06432-6

  • DOI: https://doi.org/10.1007/s00521-021-06432-6
