A hybrid dimensionality reduction method for outlier detection in high-dimensional data

  • Original Article
  • International Journal of Machine Learning and Cybernetics

Abstract

Outlier detection becomes challenging when the data are high-dimensional. Using dimensionality reduction (DR) techniques to discard irrelevant attributes is a straightforward solution. However, it is difficult for a single DR algorithm to discover all outliers, owing to the rarity, heterogeneity, and boundless nature of outliers. In this paper, we propose a hybrid DR method dedicated to outlier detection, based on ensemble learning. In the ensemble generation phase, multiple algorithms with different parameter settings are used to generate accurate and diverse base detectors. In the ensemble combination phase, a two-stage combination function is used. Both variance reduction and bias reduction are taken into account in our framework. More importantly, the framework is flexible enough that any outlier detection algorithm can be used within it. Fifteen high-dimensional data sets from the KEEL repository and one image data set are used to validate the performance of our method. One semi-supervised and one unsupervised outlier detection algorithm are used in separate experiments. Despite subtle differences between them, both experiments confirm the advantage of our method. Moreover, the contributions of the method's two components are verified through two pairs of experimental comparisons.
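The two phases described in the abstract can be illustrated with a minimal sketch. The abstract does not name the DR algorithms, the base detector, or the exact form of the two-stage combination function, so everything concrete below is an assumption made for illustration: random projection and random attribute subsets stand in for the DR algorithms, a k-nearest-neighbour distance score stands in for the outlier detector, and the combination averages scores within each DR family before taking the maximum across families. The function names (`knn_score`, `hybrid_scores`, and so on) are hypothetical and do not come from the paper.

```python
import math
import random
import statistics


def knn_score(points, k=3):
    """Outlier score of each point: distance to its k-th nearest neighbour."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


def zscore(scores):
    """Standardise scores so detectors on different scales are comparable."""
    mu = statistics.fmean(scores)
    sd = statistics.pstdev(scores) or 1.0
    return [(s - mu) / sd for s in scores]


def random_projection(points, out_dim, rng):
    """DR family A (assumed): project onto out_dim random Gaussian directions."""
    in_dim = len(points[0])
    dirs = [[rng.gauss(0, 1) for _ in range(in_dim)] for _ in range(out_dim)]
    return [[sum(w * x for w, x in zip(d, p)) for d in dirs] for p in points]


def feature_subset(points, out_dim, rng):
    """DR family B (assumed): keep a random subset of out_dim attributes."""
    keep = rng.sample(range(len(points[0])), out_dim)
    return [[p[i] for i in keep] for p in points]


def hybrid_scores(points, dims=(2, 4, 6), seed=0):
    """Combine base detectors built on several DR algorithms and settings."""
    rng = random.Random(seed)
    family_scores = []
    for reducer in (random_projection, feature_subset):
        # Ensemble generation: one base detector per parameter setting.
        members = [zscore(knn_score(reducer(points, d, rng))) for d in dims]
        # Stage 1 (assumed): average within a family to reduce variance.
        family_scores.append([statistics.fmean(col) for col in zip(*members)])
    # Stage 2 (assumed): maximum across families to reduce bias.
    return [max(col) for col in zip(*family_scores)]


# Toy data: 30 inliers in 10 dimensions plus one planted outlier (index 30).
rng = random.Random(42)
data = [[rng.gauss(0, 1) for _ in range(10)] for _ in range(30)]
data.append([10.0] * 10)
scores = hybrid_scores(data)
print(max(range(len(scores)), key=scores.__getitem__))
```

Averaging followed by a maximum is only one plausible reading of "two-stage combination"; the point of the sketch is that the detector is a plug-in component, so `knn_score` could be replaced by any function that maps a set of points to one score per point.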

Data availability and materials

The datasets generated and analyzed during the current study are available in the KEEL repository (https://sci2s.ugr.es/keel/imbalanced.php).

Acknowledgements

This work is partially supported by the Liaoning Xingliao Talent Plan (No. XLYC2007144), the Shenyang Natural Science Foundation (No. 22-315-6-09), and the National Science Foundation of China under Grant 62003337.

Author information

Corresponding author

Correspondence to Biao Wang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflicts of interest.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Meng, G., Wang, B., Wu, Y. et al. A hybrid dimensionality reduction method for outlier detection in high-dimensional data. Int. J. Mach. Learn. & Cyber. 14, 3705–3718 (2023). https://doi.org/10.1007/s13042-023-01859-w

  • DOI: https://doi.org/10.1007/s13042-023-01859-w
