Abstract
Outlier detection is a hot research topic in data mining, and the demand for algorithms that handle datasets of various complex shapes more effectively is growing. This paper studies two existing problems of outlier detection algorithms: the low-density pattern problem and the detection of local outliers. To resolve these problems, we present a double-weighted outlier detection algorithm considering the dense direction (DDW), which simultaneously considers the distance and orientation relationships of the neighborhood distribution. In DDW, we first propose the concept of dense direction, which moves the research object of the algorithm from a point to a region so as to explore the relationship between data points and the distribution of their neighbors more comprehensively. Then, we design a new point weighting strategy by exploring the point distribution of the neighborhood indicated by the dense directions of different data points, and a new edge weighting strategy that assigns weights to the edges between data points and their neighbors to better represent the closeness of data points. After that, we design a new double-weighted method that combines the complementary advantages of the point weighting strategy and the edge weighting strategy, addressing the problem that existing outlier detection algorithms cannot fully characterize the potential structural information inside the data. Comprehensive experiments show that the proposed method not only eliminates the sensitivity of traditional outlier detection algorithms to neighborhood parameters but also achieves higher local outlier detection accuracy than many current methods on both synthetic and UCI datasets.
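To make the idea of combining distance and orientation information concrete, the following is a minimal sketch and not the DDW algorithm itself: the abstract does not give the formulas for dense direction, point weights, or edge weights. The function names `knn` and `toy_score`, the choice `k=3`, and the way the two terms are multiplied are all assumptions made purely for illustration. The score grows when a point's nearest neighbors are far away and when they all lie on one side of it.

```python
import math

def knn(points, i, k):
    """Return the k nearest neighbours of points[i] as (distance, index) pairs."""
    dists = [(math.dist(points[i], q), j) for j, q in enumerate(points) if j != i]
    return sorted(dists)[:k]

def toy_score(points, i, k=3):
    """Toy outlier score mixing neighbour distance and neighbour orientation.

    NOT the DDW weighting: both terms below are illustrative assumptions.
    """
    neigh = knn(points, i, k)
    mean_dist = sum(d for d, _ in neigh) / len(neigh)
    dim = len(points[i])
    # Sum the unit vectors pointing from the point to each neighbour.
    resultant = [0.0] * dim
    for d, j in neigh:
        if d == 0:
            continue  # duplicate point: it has no direction
        for a in range(dim):
            resultant[a] += (points[j][a] - points[i][a]) / d
    # One-sidedness in [0, 1]: close to 1 when all neighbours lie in one
    # direction, close to 0 when they surround the point.
    one_sided = math.hypot(*resultant) / len(neigh)
    return mean_dist * (1.0 + one_sided)

# Small cluster around the unit square plus one distant point.
points = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5), (6, 6)]
scores = [toy_score(points, i) for i in range(len(points))]
```

On this toy data the distant point `(6, 6)` receives the largest score, because its neighbors are both far away and all located in the same direction. A point in the interior of a cluster scores low on both terms.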
Data Availability
The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.
Acknowledgements
The authors would like to thank the editor and the anonymous reviewers for their valuable comments and suggestions. This work is funded by the National Natural Science Foundation of China (no. 61701051), the Fundamental Research Funds for the Central Universities (no. 2019CDCGJSJ329), and the Graduate Scientific Research and Innovation Foundation of Chongqing, China (Grant no. CYS20067).
Ethics declarations
Competing Interest
The author(s) declare no potential conflicts of interest with respect to the research, authorship and/or publication of this paper.
About this article
Cite this article
Gao, Q., Gao, QQ., Xiong, ZY. et al. A double-weighted outlier detection algorithm considering the neighborhood orientation distribution of data objects. Appl Intell 53, 21961–21983 (2023). https://doi.org/10.1007/s10489-023-04593-6
DOI: https://doi.org/10.1007/s10489-023-04593-6