Abstract
Outlier detection becomes challenging when data are high-dimensional. Using dimensionality reduction (DR) techniques to discard irrelevant attributes is a straightforward solution. However, it is rather difficult for a single DR algorithm to discover all outliers, owing to the rarity, heterogeneity, and boundless nature of outliers. In this paper, we propose a hybrid DR method dedicated to outlier detection based on ensemble learning. Multiple algorithms with different parameter settings are used to generate accurate and diverse base detectors in the ensemble generation phase. A two-stage combination function is used in the ensemble combination phase. Both variance reduction and bias reduction are taken into account in our framework. More importantly, the proposed detection framework is highly flexible: any outlier detection algorithm can be plugged into it. Fifteen high-dimensional data sets from the KEEL repository and one image data set are used to validate the performance of our method. One semi-supervised and one unsupervised outlier detection algorithm are used in separate experiments. Despite subtle differences, both experiments confirm the advantage of our method. Moreover, the contributions of the two ingredients of our method are verified via two pairs of experimental comparisons.
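The pipeline outlined in the abstract (several DR algorithms, each run with several parameter settings, followed by a two-stage combination) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration, since the abstract does not name these details: the two DR stand-ins (`random_projection`, `feature_subset`), the k-nearest-neighbour distance detector, the target dimensions in `dims`, and the choice of averaging within each DR family followed by taking the maximum across families. It is not the authors' implementation.

```python
import math
import random
import statistics


def random_projection(X, k, rng):
    """Project rows of X onto k random Gaussian directions (a simple DR stand-in)."""
    d = len(X[0])
    W = [[rng.gauss(0.0, 1.0) / math.sqrt(k) for _ in range(k)] for _ in range(d)]
    return [[sum(x[i] * W[i][j] for i in range(d)) for j in range(k)] for x in X]


def feature_subset(X, k, rng):
    """Keep k randomly chosen attributes (a feature-bagging style reduction)."""
    idx = rng.sample(range(len(X[0])), k)
    return [[x[i] for i in idx] for x in X]


def knn_score(Z, n_neighbors=5):
    """Outlier score: distance to the n_neighbors-th nearest neighbour."""
    scores = []
    for i, a in enumerate(Z):
        dists = sorted(math.dist(a, b) for j, b in enumerate(Z) if j != i)
        scores.append(dists[n_neighbors - 1])
    return scores


def zscore(s):
    """Standardise scores so that different base detectors are comparable."""
    mu, sd = statistics.fmean(s), statistics.pstdev(s)
    return [(v - mu) / sd if sd > 0 else 0.0 for v in s]


def hybrid_dr_scores(X, dims=(2, 4, 8), seed=0):
    """Combine base detectors built on several DR variants (hypothetical helper)."""
    rng = random.Random(seed)
    stage1 = []
    for reduce_fn in (random_projection, feature_subset):
        # Ensemble generation: one base detector per parameter setting.
        members = [zscore(knn_score(reduce_fn(X, k, rng))) for k in dims]
        # Stage 1 (assumed): average within a DR family to reduce variance.
        stage1.append([statistics.fmean(col) for col in zip(*members)])
    # Stage 2 (assumed): maximum across DR families to reduce bias.
    return [max(col) for col in zip(*stage1)]


if __name__ == "__main__":
    rng = random.Random(42)
    inliers = [[rng.gauss(0, 1) for _ in range(20)] for _ in range(60)]
    outliers = [[rng.gauss(6, 1) for _ in range(20)] for _ in range(3)]
    scores = hybrid_dr_scores(inliers + outliers)
    ranked = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
    print("indices with the highest scores:", ranked[:3])
```

Averaging smooths out the randomness of individual reductions, while the maximum lets a point be flagged when any one family of reductions exposes it. Because the detector is just a function from reduced data to scores, another detector could be substituted for `knn_score` without changing the rest of the sketch.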








Data availability and materials
The datasets generated and analyzed during the current study are available in the KEEL repository, [https://sci2s.ugr.es/keel/imbalanced.php].
Acknowledgements
This work is partially supported by the Liaoning Xingliao Talent Plan (No. XLYC2007144), the Shenyang Natural Science Foundation (No. 22-315-6-09), and the National Science Foundation of China under Grant 62003337.
Ethics declarations
Conflict of interest
The authors declare that they have no conflicts of interest.
About this article
Cite this article
Meng, G., Wang, B., Wu, Y. et al. A hybrid dimensionality reduction method for outlier detection in high-dimensional data. Int. J. Mach. Learn. & Cyber. 14, 3705–3718 (2023). https://doi.org/10.1007/s13042-023-01859-w
DOI: https://doi.org/10.1007/s13042-023-01859-w