Abstract
Outlier detection is an essential task in data mining, with applications that include military surveillance, tax fraud detection, and telecommunications. In recent years, it has received significant attention compared with other knowledge discovery problems. This focus has produced many outlier detection algorithms, most of them based on distance or density. However, each strategy has intrinsic weaknesses: distance-based techniques suffer from the local density problem, while density-based methods are known to struggle with low-density patterns. In addition, most existing outlier detection algorithms have a parameter selection problem, which leads to poor detection results. In this article, we present an unsupervised density-based outlier detection algorithm that addresses these shortcomings. The proposed algorithm uses the Natural Neighbour (NaN) concept to obtain a parameter called the Natural Value (NV) adaptively, and a Weighted Kernel Density Estimation (WKDE) method to estimate the density at the location of an object. Moreover, the algorithm employs two categories of nearest neighbours, k Nearest Neighbours (kNN) and Reverse Nearest Neighbours (RNN), which makes it flexible in modelling different data patterns. A Gaussian kernel function is adopted to achieve smoothness in the measure. Further, we use an adaptive kernel width to enhance the discrimination power between normal and outlier samples. Formal analysis and extensive experiments on both artificial and real datasets demonstrate that this technique can achieve better outlier detection performance.
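The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the NaNOD implementation: the stopping rule is a simplified reading of the natural neighbour search, and the unweighted kernel average, the kernel width (mean distance to the NV nearest neighbours), the final score, and the function names `natural_value` and `outlier_scores` are all assumptions made for illustration.

```python
import numpy as np


def natural_value(X, max_rounds=None):
    """Grow the neighbourhood size r until the number of points that no
    other point has chosen as a neighbour stops changing (simplified rule).
    Returns (r, pairwise distances, neighbour indices sorted by distance)."""
    n = len(X)
    # Brute-force pairwise Euclidean distances: O(n^2) memory.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    keyed = dist.copy()
    np.fill_diagonal(keyed, -1.0)              # force each point to sort first
    order = np.argsort(keyed, axis=1)[:, 1:]   # drop the point itself

    reverse_count = np.zeros(n, dtype=int)
    prev_orphans = -1
    max_rounds = max_rounds or n - 1
    for r in range(1, max_rounds + 1):
        # Each point registers itself as a reverse neighbour of its r-th NN.
        np.add.at(reverse_count, order[:, r - 1], 1)
        orphans = int(np.sum(reverse_count == 0))
        if orphans == 0 or orphans == prev_orphans:
            return r, dist, order
        prev_orphans = orphans
    return max_rounds, dist, order


def outlier_scores(X):
    """Score in [0, 1]; higher means lower estimated local density."""
    nv, dist, order = natural_value(X)
    n = len(X)
    rows = np.arange(n)[:, None]
    knn = order[:, :nv]

    # member[i, j] is True when j is one of the nv nearest neighbours of i.
    member = np.zeros((n, n), dtype=bool)
    member[rows, knn] = True
    neigh = member | member.T                  # kNN united with RNN

    # ASSUMPTION: adaptive width = mean distance to the nv nearest neighbours.
    width = np.maximum(dist[rows, knn].mean(axis=1), 1e-12)

    # Unnormalised Gaussian kernel; column j uses the width of neighbour j.
    kern = np.exp(-0.5 * (dist / width[None, :]) ** 2)
    density = (kern * neigh).sum(axis=1) / neigh.sum(axis=1)

    # ASSUMPTION: a simple monotone score, not the paper's outlier factor.
    return 1.0 - density
```

Calling `outlier_scores(X)` on an `(n, d)` float array returns one score per row, so ranking the rows by descending score puts the sparsest points first. Because each kernel term uses the neighbour's width, a point far from a dense group receives almost no density from that group, which is the intuition behind adapting the kernel width.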
Acknowledgements
We are grateful to Professor Jin Ningour of the Department of Computer Science and Engineering, University of Electronic Science and Technology of China, China, and Professor Jinlong Huang of the Chongqing Key Lab of Software Theory and Technology, College of Computer Science, Chongqing University, China, for providing some of the artificial datasets used in this research.
We also thank the Department of Computer Science and Engineering, Indian Institute of Technology (Indian School of Mines), Dhanbad, India, for providing the facilities and support for this research work. The authors would like to thank the associate editor and the anonymous referees for their helpful and constructive comments.
Ethics declarations
Conflict of interest
We hereby declare that we have no conflict of interest.
About this article
Cite this article
Wahid, A., Annavarapu, C.S.R. NaNOD: A natural neighbour-based outlier detection algorithm. Neural Comput & Applic 33, 2107–2123 (2021). https://doi.org/10.1007/s00521-020-05068-2
DOI: https://doi.org/10.1007/s00521-020-05068-2