Abstract
The anomaly detection problem has been extensively studied in a variety of application domains where data labels are difficult to obtain. Most unsupervised algorithms rely on notions such as distance and density to detect anomalies. However, the performance of such algorithms tends to degrade as the dimensionality of the data increases. Some studies use features as pseudo-labels for prediction and detect anomalies according to the deviation of the prediction model. Even so, these models remain limited because they ignore the correlation between feature attributes. In this paper, we propose a correlation-based feature partition regression method called CFPR, which can alleviate, to a certain extent, the adverse effects of dataset dimensionality and irrelevant attributes on model performance. Based on the correlation between features, a high-dimensional dataset is divided into multiple feature subspaces. In each subspace, the feature with the highest correlation coefficient is used as the pseudo-label, and the remaining features serve as predictor attributes for training a supervised regression model. The anomaly score of each sample in the subspace is then calculated from the difference between the predicted value and the true value of the pseudo-label. Furthermore, in the subspace integration stage, we define a weighting strategy based on the level of correlation in each subspace to obtain the final ranked list of anomaly scores. Extensive experiments on twenty-eight public UCI datasets show that CFPR outperforms several state-of-the-art anomaly detection algorithms in terms of AUC.
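The abstract describes a four-step pipeline: correlation-based partitioning, pseudo-label selection, regression on the remaining features, and correlation-weighted integration of per-subspace scores. The following is a minimal NumPy sketch of that pipeline under stated assumptions. It is not the authors' CFPR implementation. The greedy partition rule with `threshold=0.3`, ordinary least squares as the regressor, scaling residuals by their standard deviation, and using the mean |correlation| as the subspace weight are all hypothetical choices. The function names `partition_features` and `cfpr_like_scores` are also illustrative.

```python
import numpy as np


def partition_features(corr, threshold=0.3):
    """Greedily split feature indices into correlation-based subspaces.

    Hypothetical rule (the abstract does not state CFPR's actual procedure):
    seed each subspace with the unassigned feature whose summed |corr| to the
    other unassigned features is largest, then add every unassigned feature
    whose |corr| with that seed is at least `threshold`.
    """
    unassigned = list(range(corr.shape[0]))
    groups = []
    while unassigned:
        sub = corr[np.ix_(unassigned, unassigned)]
        seed = unassigned[int(np.argmax(sub.sum(axis=1)))]
        group = [seed] + [j for j in unassigned
                          if j != seed and corr[seed, j] >= threshold]
        groups.append(group)
        unassigned = [j for j in unassigned if j not in group]
    return groups


def cfpr_like_scores(X, threshold=0.3):
    """Return one anomaly score per row of X (higher means more anomalous)."""
    X = np.asarray(X, dtype=float)
    # Absolute Pearson correlation between columns; NaN (constant column) -> 0.
    corr = np.nan_to_num(np.abs(np.corrcoef(X, rowvar=False)))
    np.fill_diagonal(corr, 0.0)

    total = np.zeros(X.shape[0])
    weight_sum = 0.0
    for group in partition_features(corr, threshold):
        if len(group) < 2:
            continue  # a lone feature leaves nothing to regress on
        # Pseudo-label: the feature most correlated with the rest of its subspace.
        sub = corr[np.ix_(group, group)]
        target = group[int(np.argmax(sub.sum(axis=1)))]
        predictors = [f for f in group if f != target]

        # Stand-in regressor (assumption): ordinary least squares with intercept.
        A = np.column_stack([np.ones(X.shape[0]), X[:, predictors]])
        coef, *_ = np.linalg.lstsq(A, X[:, target], rcond=None)
        err = X[:, target] - A @ coef

        # Deviation between prediction and pseudo-label, scaled per subspace.
        scale = err.std()
        score = np.abs(err) / scale if scale > 0 else np.abs(err)

        # Subspace weight (assumption): mean |corr| of pseudo-label to predictors.
        w = float(corr[target, predictors].mean())
        total += w * score
        weight_sum += w
    return total / weight_sum if weight_sum > 0 else total


# Demo on synthetic data: two correlated pairs plus one independent feature.
rng = np.random.default_rng(0)
n = 200
a, c = rng.normal(size=n), rng.normal(size=n)
X = np.column_stack([
    a, 2 * a + 0.1 * rng.normal(size=n),    # subspace {0, 1}
    c, -c + 0.1 * rng.normal(size=n),       # subspace {2, 3}
    rng.normal(size=n),                     # uncorrelated, ends up alone
])
X[0, :2] = [1.0, -2.0]  # each value looks ordinary; the 0-1 relation is broken
scores = cfpr_like_scores(X)
print(int(np.argmax(scores)))  # expected: 0
```

In the demo, row 0 breaks the linear relation between features 0 and 1, although each of its values taken alone lies within the normal range. The regression residual in that subspace is what exposes it. Feature 4 is uncorrelated with the others, so it forms a singleton subspace and is skipped because there is nothing left to predict it from.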
Ethics declarations
Conflict of Interests
The authors declare that they have no conflict of interest.
About this article
Cite this article
Liu, Z., Gao, X., Jia, X. et al. Correlation-based feature partition regression method for unsupervised anomaly detection. Appl Intell 52, 15074–15090 (2022). https://doi.org/10.1007/s10489-022-03247-3
DOI: https://doi.org/10.1007/s10489-022-03247-3