
Manifold-based denoising, outlier detection, and dimension reduction algorithm for high-dimensional data

  • Original Article
International Journal of Machine Learning and Cybernetics

Abstract

Manifold learning has emerged in recent years and plays an increasingly important role in machine learning. However, noise and outliers are inevitable in real data and destroy its manifold structure, which degrades the dimensionality-reduction performance of manifold learning. This paper therefore proposes a denoising algorithm for high-dimensional data based on manifold learning. The algorithm first projects each noisy sample vector onto its local manifold, thereby reducing the noise. A statistical analysis of the residual noise then yields a data boundary; because all samples come from the same background and follow the same distribution, sample vectors falling outside this boundary are marked as outliers and eliminated. Finally, dimension reduction is performed on the denoised, outlier-free data. Experimental results show that the algorithm effectively mitigates the interference of noise and outliers in high-dimensional datasets for manifold learning.
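The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's exact method: the neighborhood size `k`, the assumed intrinsic dimension `d`, the local-PCA projection used as the "local manifold", and the mean-plus-three-sigma residual boundary are all illustrative assumptions.

```python
import numpy as np

def local_projection_denoise(X, k=10, d=2):
    """Denoise each sample by projecting it onto the affine span of the
    top-d principal directions of its k nearest neighbors (a local
    tangent-plane estimate). Returns denoised points and residual norms."""
    n = X.shape[0]
    # brute-force pairwise Euclidean distances
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    X_hat = np.empty_like(X)
    resid = np.empty(n)
    for i in range(n):
        nbrs = np.argsort(D[i])[:k]            # k nearest neighbors (incl. self)
        mu = X[nbrs].mean(axis=0)
        # principal directions of the local neighborhood via SVD
        _, _, Vt = np.linalg.svd(X[nbrs] - mu, full_matrices=False)
        T = Vt[:d]                              # d-dimensional local basis
        X_hat[i] = mu + (X[i] - mu) @ T.T @ T   # orthogonal projection
        resid[i] = np.linalg.norm(X[i] - X_hat[i])
    return X_hat, resid

def remove_outliers(X_hat, resid, n_sigma=3.0):
    """Keep samples whose projection residual lies inside a simple
    mean + n_sigma * std boundary of the residual distribution."""
    keep = resid <= resid.mean() + n_sigma * resid.std()
    return X_hat[keep], keep

# Toy data: a noisy 2-D plane embedded in 5-D, with a few gross outliers.
rng = np.random.default_rng(0)
Z = rng.normal(size=(200, 2))
A = rng.normal(size=(2, 5))
X = Z @ A + 0.05 * rng.normal(size=(200, 5))
X[:5] += 5.0                                    # inject outliers

X_hat, resid = local_projection_denoise(X, k=15, d=2)
X_clean, keep = remove_outliers(X_hat, resid)
print(X.shape, "->", X_clean.shape)
```

After this cleaning step, any standard manifold-learning method (Isomap, LLE, t-SNE, etc.) would be applied to `X_clean` for the final dimension reduction.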




Acknowledgements

This work was supported by the Science and Technology Basic Resources Investigation Project (No. 2019FY101404), the National Natural Science Foundation of China (No. 61903029), the Scientific and Technological Innovation Foundation of Shunde Graduate School, USTB (No. BK19CE017), and the Scientific and Technological Innovation Foundation of the Foshan Municipal People's Government (No. BK20AE004).

Author information

Corresponding author

Correspondence to Tao Yang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Zhao, G., Yang, T. & Fu, D. Manifold-based denoising, outlier detection, and dimension reduction algorithm for high-dimensional data. Int. J. Mach. Learn. & Cyber. 14, 3923–3942 (2023). https://doi.org/10.1007/s13042-023-01873-y

