Abstract
Manifold learning has emerged in recent years and plays an increasingly important role in machine learning. However, noise and outliers are unavoidable in practice and destroy the manifold structure of the data, which degrades the dimensionality reduction that manifold learning can achieve. This paper therefore proposes a denoising algorithm for high-dimensional data based on manifold learning. The algorithm first projects the noisy sample vectors onto the local manifold to reduce noise. It then performs a statistical analysis of the noise to obtain a data boundary. Because all the data come from the same background and follow the same distribution, sample vectors that fall outside the data boundary are marked as outliers and eliminated. Finally, dimension reduction is performed on the data that remain after noise reduction and outlier removal. Experimental results show that the algorithm can mitigate, to some extent, the interference of noise and outliers with manifold learning on high-dimensional datasets.
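The three stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions that do not come from the paper: the local manifold is approximated by a d-dimensional affine subspace fitted by PCA to each sample's k nearest neighbours (excluding the sample itself), the data boundary is taken as the mean projection residual plus three standard deviations, and plain PCA stands in for the final dimension reduction step. All function names and parameter values are hypothetical.

```python
import numpy as np

def project_to_local_manifold(X, k=10, d=1):
    """Project each sample onto a d-dim affine subspace fitted to its k neighbours.

    Assumption: local PCA is used as the local manifold model.
    Returns the projected samples and the norm of each removed residual.
    """
    X = np.asarray(X, dtype=float)
    sq = np.sum(X ** 2, axis=1)
    D2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T  # pairwise squared distances
    np.fill_diagonal(D2, np.inf)                    # a sample is not its own neighbour
    X_proj = np.empty_like(X)
    for i in range(X.shape[0]):
        nbrs = X[np.argsort(D2[i])[:k]]             # k nearest neighbours of sample i
        mu = nbrs.mean(axis=0)
        _, _, Vt = np.linalg.svd(nbrs - mu, full_matrices=False)
        T = Vt[:d]                                  # local tangent basis, shape (d, D)
        X_proj[i] = mu + (X[i] - mu) @ T.T @ T      # orthogonal projection
    residual = np.linalg.norm(X - X_proj, axis=1)
    return X_proj, residual

def flag_outliers(residual, n_sigma=3.0):
    """Assumed boundary: residuals above mean + n_sigma * std are outliers."""
    return residual > residual.mean() + n_sigma * residual.std()

def pca_reduce(X, d):
    """Linear PCA, used here only as a placeholder for a manifold learning method."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:d].T

# Synthetic data: a noisy unit circle embedded in 5-D, plus 5 distant outliers.
rng = np.random.default_rng(0)
t = rng.uniform(0.0, 2.0 * np.pi, 300)
clean = np.zeros((300, 5))
clean[:, 0], clean[:, 1] = np.cos(t), np.sin(t)
inliers = clean + 0.02 * rng.standard_normal(clean.shape)
angles = 2.0 * np.pi * np.arange(5) / 5
outliers = np.zeros((5, 5))
outliers[:, 0], outliers[:, 1] = 3.0 * np.cos(angles), 3.0 * np.sin(angles)
X = np.vstack([inliers, outliers])

X_proj, residual = project_to_local_manifold(X, k=10, d=1)  # stage 1: denoise
mask = flag_outliers(residual)                              # stage 2: outlier boundary
Y = pca_reduce(X_proj[~mask], d=2)                          # stage 3: reduce dimension
print("flagged samples:", int(mask.sum()), "embedding shape:", Y.shape)
```

Fitting each local subspace without the sample itself is a deliberate choice in this sketch: if an isolated outlier took part in its own fit, it would dominate the local principal direction and its residual would look small. The three-sigma rule assumes roughly symmetric residuals; a skewed noise distribution would call for a different boundary.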

Acknowledgements
This work was supported by the Science and Technology Basic Resources Investigation Project (No. 2019FY101404), the National Natural Science Foundation of China (No. 61903029), the Scientific and Technological Innovation Foundation of Shunde Graduate School, USTB (No. BK19CE017), and the Scientific and Technological Innovation Foundation of Foshan Municipal People’s Government (No. BK20AE004).
About this article
Cite this article
Zhao, G., Yang, T. & Fu, D. Manifold-based denoising, outlier detection, and dimension reduction algorithm for high-dimensional data. Int. J. Mach. Learn. & Cyber. 14, 3923–3942 (2023). https://doi.org/10.1007/s13042-023-01873-y
DOI: https://doi.org/10.1007/s13042-023-01873-y