A double-weighted outlier detection algorithm considering the neighborhood orientation distribution of data objects

Applied Intelligence

Abstract

Outlier detection is an active research topic in data mining, and algorithms are increasingly expected to handle datasets with complex shapes effectively. This paper studies two open problems of existing outlier detection algorithms: handling low-density patterns and detecting local outliers. To address them, we present a double-weighted outlier detection (DDW) algorithm based on the dense direction, which considers both the distance and the orientation relationships of the neighborhood distribution. In DDW, we first propose the concept of a dense direction, which moves the object of study from a point to a region so that the relationship between data points and the distribution of their neighbors can be explored more comprehensively. We then design a new point weighting strategy, which examines the point distribution of the neighborhood indicated by the dense directions of different data points, and a new edge weighting strategy, which assigns weights to the edges between data points and their neighbors to better represent the closeness of data points. Next, we design a new double-weighted method that combines the complementary advantages of the point weighting and edge weighting strategies, addressing the inability of existing outlier detection algorithms to fully characterize the latent structural information in the data. Comprehensive experiments show that the proposed method not only removes the sensitivity to neighborhood parameters that affects traditional outlier detection algorithms but also achieves higher local outlier detection accuracy than many current methods on both synthetic and UCI datasets.
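The abstract does not give the formulas behind the dense direction or the two weights, so the following is only a toy sketch of the general idea of combining an orientation-based point weight with a distance-based edge weight. It is not the DDW algorithm. The function names, the mean-unit-vector imbalance measure, the mean k-nearest-neighbor distance, and the product used to combine them are all assumptions made for illustration.

```python
import numpy as np


def knn(X, k):
    """Return (indices, distances) of the k nearest other points for each row of X."""
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor
    idx = np.argsort(dist, axis=1)[:, :k]
    return idx, np.take_along_axis(dist, idx, axis=1)


def toy_double_weighted_scores(X, k=5):
    """Toy outlier score: orientation imbalance times normalized neighbor distance.

    Assumption, not the paper's definition: a point whose neighbors all lie
    in one direction AND far away receives a high score.
    """
    idx, d = knn(X, k)

    # Point weight (hypothetical): length of the mean unit vector toward the
    # neighbors. Near 0 when neighbors surround the point, near 1 when they
    # all lie on one side.
    vectors = X[idx] - X[:, None, :]
    units = vectors / np.maximum(d[:, :, None], 1e-12)
    point_weight = np.linalg.norm(units.mean(axis=1), axis=1)

    # Edge weight (hypothetical): mean distance to the k neighbors,
    # rescaled so the largest value is 1.
    edge_weight = d.mean(axis=1)
    edge_weight = edge_weight / np.maximum(edge_weight.max(), 1e-12)

    return point_weight * edge_weight


# Usage: a 5 x 5 grid of inliers plus one isolated point at (10, 10).
grid = np.array([[i, j] for i in range(5) for j in range(5)], dtype=float)
X = np.vstack([grid, [[10.0, 10.0]]])
scores = toy_double_weighted_scores(X, k=5)
most_outlying = int(np.argmax(scores))  # index of the isolated point
```

In this toy setup, interior grid points have neighbors on all sides, so their orientation term stays small, while the isolated point sees all of its neighbors in one direction and at a large distance, so both terms are large.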

Data Availability

The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.

Acknowledgements

The authors would like to thank the editor and the anonymous reviewers for their valuable comments and suggestions. This work is funded by the National Natural Science Foundation of China (no. 61701051), the Fundamental Research Funds for the Central Universities (no. 2019CDCGJSJ329), and the Graduate Scientific Research and Innovation Foundation of Chongqing, China (Grant no. CYS20067).

Author information

Corresponding author

Correspondence to Qiang Gao.

Ethics declarations

Competing Interest

The author(s) declare no potential conflicts of interest with respect to the research, authorship and/or publication of this paper.

About this article

Cite this article

Gao, Q., Gao, QQ., Xiong, ZY. et al. A double-weighted outlier detection algorithm considering the neighborhood orientation distribution of data objects. Appl Intell 53, 21961–21983 (2023). https://doi.org/10.1007/s10489-023-04593-6

  • DOI: https://doi.org/10.1007/s10489-023-04593-6
