Correlation-based feature partition regression method for unsupervised anomaly detection

Applied Intelligence

Abstract

The anomaly detection problem has been extensively studied in a variety of application domains where data labels are difficult to obtain. Most unsupervised algorithms rely on notions such as distance and density to detect anomalies. However, the performance of such algorithms tends to degrade as the dimensionality of the data increases. Some studies use features as pseudo-labels for prediction and detect anomalies according to the deviation between the prediction model's output and the observed values. Even so, their performance gains remain limited because they ignore the correlation between feature attributes. In this paper, we propose a correlation-based feature partition regression prediction method called CFPR, which can, to a certain extent, alleviate the adverse effects of high dimensionality and irrelevant attributes on model performance. Based on the correlation between features, a high-dimensional dataset is divided into multiple feature subspaces. In each subspace, the feature with the highest correlation coefficient is used as a pseudo-label, and the remaining features serve as predictors to train a supervised regression model. The anomaly score of each sample in the subspace is then calculated from the difference between the predicted value and the true value of the pseudo-label. Furthermore, in the subspace integration stage, we define a weighting strategy based on the level of correlation in each subspace to obtain the final anomaly score ranking. Extensive experiments on twenty-eight public UCI datasets show that CFPR outperforms several state-of-the-art anomaly detection algorithms in terms of AUC.
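The abstract describes the CFPR pipeline only at a high level. The NumPy sketch below illustrates its steps (correlation-based partition, pseudo-label selection, regression, residual scoring, correlation-weighted integration) under several assumptions that are not taken from the paper: features are grouped greedily using a hypothetical absolute-Pearson threshold (`threshold=0.5`), the regressor is ordinary least squares, residuals are standardized per subspace, and each subspace is weighted by the mean absolute correlation between its pseudo-label and its predictors. The functions `abs_correlation`, `partition_features`, and `cfpr_scores` are hypothetical names, not the authors' code.

```python
import numpy as np


def abs_correlation(X):
    """Absolute Pearson correlation between columns; constant columns get 0."""
    with np.errstate(invalid="ignore", divide="ignore"):
        corr = np.corrcoef(X, rowvar=False)
    corr = np.abs(np.nan_to_num(corr))
    np.fill_diagonal(corr, 0.0)  # ignore each feature's self-correlation
    return corr


def partition_features(corr, threshold=0.5):
    """Hypothetical greedy partition: the unassigned feature most correlated with
    the other unassigned features seeds a subspace, which then absorbs every
    unassigned feature whose |corr| with the seed is >= threshold."""
    unassigned = list(range(corr.shape[0]))
    groups = []
    while unassigned:
        sub = corr[np.ix_(unassigned, unassigned)]
        seed = unassigned[int(np.argmax(sub.sum(axis=1)))]
        group = [j for j in unassigned if j == seed or corr[seed, j] >= threshold]
        groups.append(group)
        unassigned = [j for j in unassigned if j not in group]
    return groups


def cfpr_scores(X, threshold=0.5, eps=1e-12):
    """Correlation-weighted average of per-subspace standardized residuals."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    corr = abs_correlation(X)
    total, weight_sum = np.zeros(n), 0.0
    for group in partition_features(corr, threshold):
        if len(group) < 2:
            continue  # a lone feature leaves no predictors to regress on
        sub = corr[np.ix_(group, group)]
        # Pseudo-label: the member most correlated with the rest of its subspace.
        t = int(np.argmax(sub.sum(axis=1)))
        target = group[t]
        predictors = [j for j in group if j != target]
        # Assumed regressor: ordinary least squares with an intercept column.
        A = np.column_stack([np.ones(n), X[:, predictors]])
        coef, *_ = np.linalg.lstsq(A, X[:, target], rcond=None)
        resid = X[:, target] - A @ coef
        score = np.abs(resid) / (resid.std() + eps)
        # Assumed weight: mean |corr| between the pseudo-label and its predictors.
        w = sub[t].sum() / (len(group) - 1)
        total += w * score
        weight_sum += w
    return total / weight_sum if weight_sum > 0 else total


# Toy data: two correlated pairs plus one independent column.
rng = np.random.default_rng(0)
n = 200
x0, x2, x4 = rng.normal(size=(3, n))
X = np.column_stack([
    x0,
    2 * x0 + 0.1 * rng.normal(size=n),
    x2,
    x2 + 0.1 * rng.normal(size=n),
    x4,
])
X[7, 1] -= 6.0  # break the linear relation between columns 0 and 1 in row 7

scores = cfpr_scores(X)
ranking = np.argsort(-scores)  # most anomalous first
print(ranking[:3])
```

In this toy run the subspaces come out as {0, 1}, {2, 3}, and the singleton {4}, which is skipped. Row 7 is not extreme in any single column, but it violates the relation between columns 0 and 1, so its residual in that subspace is large. Scoring with in-sample residuals keeps the sketch fully unsupervised, and standardizing each subspace's residuals puts the subspaces on a comparable scale before weighting; both are illustrative choices, and the paper's actual partition rule, regressor, and weights may differ.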

Figs. 1 to 5 (images not included)

Author information

Corresponding author

Correspondence to Xin Gao.

Ethics declarations

Conflict of Interests

The authors declare that they have no conflict of interest.

About this article

Cite this article

Liu, Z., Gao, X., Jia, X. et al. Correlation-based feature partition regression method for unsupervised anomaly detection. Appl Intell 52, 15074–15090 (2022). https://doi.org/10.1007/s10489-022-03247-3

  • DOI: https://doi.org/10.1007/s10489-022-03247-3
