Abstract
To address several shortcomings of locally linear embedding methods in semi-supervised scenarios, this paper designs a robust scheme for generating soft labels and proposes a semi-supervised discriminant embedding method that combines these soft labels in the kernel space. The method uses the kernel trick to perform all related operations in a reproducing kernel Hilbert space. First, to cope with differing distributions of the labeled data, a confidence measure for the generated soft labels is introduced into the label propagation process, and the label density within a hypersphere centered at each unlabeled sample is then used to produce the final soft labels. Experiments demonstrate that this scheme generates more reliable soft labels even at low labeling ratios. To make full use of the prior information provided by the soft labels and their confidences, a distance distortion function introduces the soft-label prior into feature learning, and a penalty term on global information is added to the optimization objective, yielding a low-dimensional representation better suited to data classification. Finally, the method is compared experimentally with a variety of feature learning methods and visualized on a handwritten digit dataset. The experiments show that it performs well on various datasets from the UCI repository and is applied successfully to image recognition, especially when the ratio of labeled data is low.
Data availability
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.
Acknowledgements
This research was supported by the Fundamental Research Funds for the Central Universities under Grant No. NS2022027 and the National Science and Technology Major Project under Grant No. J2019-I-0010-0010.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that there is no conflict of interest regarding the publication of this paper.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
1.1 Appendix 1: Proof of convergence of Eq. (12)
By mathematical induction, Eq. (12) can be expanded into the following general form in terms of the initial values.
When \(t\) tends to positive infinity, we have \(\underset{t\to \infty }{\mathrm{lim}}{\left({\varvec{F}}{{\varvec{D}}}_{\alpha }\right)}^{t}=0\), because the elements of \({\varvec{F}}\) and \({{\varvec{D}}}_{\alpha }\) lie in \([0, 1]\). By the summation formula for an infinite geometric series, \(\sum_{\tau =0}^{\infty }{\left({\varvec{F}}{{\varvec{D}}}_{\alpha }\right)}^{\tau }={\left({{\varvec{I}}}_{n}-{\varvec{F}}{{\varvec{D}}}_{\alpha }\right)}^{-1}\). Since every entry of \(\underset{t\to \infty }{\mathrm{lim}}\overline{{\varvec{Y}} }\left(t\right)\) is a definite real number, the iteration of Eq. (12) converges to the matrix \(\overline{{\varvec{Y}} }\left({{\varvec{I}}}_{n}-{{\varvec{D}}}_{\alpha }\right){\left({{\varvec{I}}}_{n}-{\varvec{F}}{{\varvec{D}}}_{\alpha }\right)}^{-1}\).
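The limit above can be checked numerically. The sketch below is not from the paper: it assumes Eq. (12) has the fixed-point form \(\overline{{\varvec{Y}}}(t+1)=\overline{{\varvec{Y}}}(t){\varvec{F}}{{\varvec{D}}}_{\alpha }+\overline{{\varvec{Y}}}(0)({{\varvec{I}}}_{n}-{{\varvec{D}}}_{\alpha })\), which is consistent with the closed-form limit derived here, and it takes \({\varvec{F}}\) column-stochastic with the diagonal entries of \({{\varvec{D}}}_{\alpha }\) strictly inside \((0,1)\).

```python
import numpy as np

rng = np.random.default_rng(0)
n, c = 6, 3

# Assumed F: column-stochastic affinity matrix (columns sum to 1), entries in [0, 1]
F = rng.random((n, n))
F /= F.sum(axis=0, keepdims=True)

# Assumed D_alpha: diagonal, entries strictly inside (0, 1)
alpha = rng.uniform(0.2, 0.8, n)
D = np.diag(alpha)

# Ybar(0): soft-label matrix, each column sums to 1
Y0 = rng.random((c + 1, n))
Y0 /= Y0.sum(axis=0, keepdims=True)

# Assumed iteration for Eq. (12): Ybar(t+1) = Ybar(t) F D_alpha + Ybar(0)(I - D_alpha)
Y = Y0.copy()
for _ in range(500):
    Y = Y @ F @ D + Y0 @ (np.eye(n) - D)

# Closed-form limit from Appendix 1
Y_star = Y0 @ (np.eye(n) - D) @ np.linalg.inv(np.eye(n) - F @ D)

print(np.allclose(Y, Y_star))           # the iteration reaches the closed form
print(np.allclose(Y.sum(axis=0), 1.0))  # Appendix 2: columns still sum to 1
```

Under these assumptions each column of \({\varvec{F}}{{\varvec{D}}}_{\alpha }\) sums to \(\alpha_j<1\), so the matrix powers vanish and the geometric series converges, matching the argument above.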
1.2 Appendix 2: Proof that the summation of each column of \(\overline{{\varvec{Y}} }\left({\varvec{t}}\right)\) in Eq. (13) is constant 1
Let \({{\varvec{e}}}_{n}={\left[1,1,\dots ,1\right]}_{1\times n}\) denote an all-ones row vector; the problem then reduces to proving that \({{\varvec{e}}}_{c+1}\overline{{\varvec{Y}} }\left(t\right)={{\varvec{e}}}_{n}\) holds for any \(t\). The previous work established the constraints \({{\varvec{e}}}_{n}{\varvec{F}}={{\varvec{e}}}_{n}\) and \({{\varvec{e}}}_{c+1}\overline{{\varvec{Y}} }={{\varvec{e}}}_{n}\). Right-multiplying these two equations by \({{\varvec{D}}}_{\alpha }\) and \(\left({{\varvec{I}}}_{n}-{{\varvec{D}}}_{\alpha }\right)\), respectively, and then adding them gives the following form.
Thus, the problem can be proved by mathematical induction. In this paper, we choose \(\overline{{\varvec{Y}} }\left(0\right)=\overline{{\varvec{Y}} }\). When \(t=0\), according to Eq. (12) and Eq. (25), we get
When \(t=k\ge 1\), assuming that \({{\varvec{e}}}_{c+1}\overline{{\varvec{Y}} }\left(k\right)={{\varvec{e}}}_{n}\) holds, it follows from Eq. (12) and the result of Eq. (25) that
Thus, the conclusion is proved to be valid.
1.3 Appendix 3: Proof of the range of values of the Euclidean distance in the kernel space in this paper
Centering the kernel matrix is equivalent to centering the data in the kernel space; this is a translation and does not change the distances between data points. Thus, studying the maximum distance between data points in the kernel space is equivalent to studying the data before the kernel matrix is centered. For any \({{\varvec{x}}}_{i},{{\varvec{x}}}_{j}\in {\varvec{X}}\), there are two points \(\phi \left({{\varvec{x}}}_{i}\right),\phi \left({{\varvec{x}}}_{j}\right)\in {\mathfrak{R}}^{\nu }\) in the kernel space. From Eq. (7), each element \({\varvec{K}}\left({{\varvec{x}}}_{i},{{\varvec{x}}}_{j}\right)\) of the kernel matrix takes values in \((0,1]\). The original problem can therefore be stated as follows: find the maximum distance between two points lying in the non-negative orthant of the unit hypersphere in a high-dimensional kernel space. Thus, Eq. (8) can be simplified as follows.
where \(\Vert \cdot \Vert\) denotes the norm of the vector from the origin to \(\phi \left({{\varvec{x}}}_{i}\right)\), whose value is less than or equal to 1. On the other hand, for \(\nu \ge 2\), points satisfying the given condition can be found that make the distance in Eq. (28) attain its maximum, e.g., \(\phi \left( {{\varvec{x}}_{i} } \right) = \underbrace {{\left[ {1,0, \ldots ,0} \right]}}_{\nu }\) and \(\phi \left( {{\varvec{x}}_{j} } \right) = \underbrace {{\left[ {0,1, \ldots ,0} \right]}}_{\nu }\). More generally, equality holds if and only if the vectors from the origin to these two points each have norm 1 and inner product 0. Moreover, since distances are non-negative, the Euclidean distance between two points in the kernel space of this paper takes values between 0 and \(\sqrt{2}\).
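This bound follows from the kernel-space distance identity \(\Vert \phi ({\varvec{x}})-\phi ({\varvec{y}})\Vert ^{2}=K({\varvec{x}},{\varvec{x}})+K({\varvec{y}},{\varvec{y}})-2K({\varvec{x}},{\varvec{y}})\). As a small sanity check (not from the paper), the sketch below assumes a Gaussian kernel, which satisfies the condition of Eq. (7) that kernel values lie in \((0,1]\):

```python
import numpy as np

def rbf(x, y, gamma=1.0):
    # Gaussian kernel: values in (0, 1], with K(x, x) = 1
    return np.exp(-gamma * np.sum((x - y) ** 2))

def kernel_dist(x, y, gamma=1.0):
    # ||phi(x) - phi(y)||^2 = K(x,x) + K(y,y) - 2 K(x,y) = 2 - 2 K(x,y)
    return np.sqrt(rbf(x, x, gamma) + rbf(y, y, gamma) - 2 * rbf(x, y, gamma))

rng = np.random.default_rng(1)
d = np.array([kernel_dist(rng.normal(size=5), rng.normal(size=5))
              for _ in range(1000)])

# All kernel-space distances lie in [0, sqrt(2)], as proved in Appendix 3
print(bool((d >= 0).all() and (d <= np.sqrt(2)).all()))
```

Since \(K({\varvec{x}},{\varvec{y}})>0\) for this kernel, the supremum \(\sqrt{2}\) is approached but never reached for finite samples.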
1.4 Appendix 4: Proof of a simplified form
It is assumed that there are two data matrices \({\varvec{X}}={[{{\varvec{x}}}_{1},{{\varvec{x}}}_{2},\dots ,{{\varvec{x}}}_{n}]}^{T}\in {\mathfrak{R}}^{n\times d}\) and \({\varvec{Y}}={[{{\varvec{y}}}_{1},{{\varvec{y}}}_{2},\dots ,{{\varvec{y}}}_{n}]}^{T}\in {\mathfrak{R}}^{n\times d}\). For any indices \(i\) and \(j\), the following simplification can be performed.
where \({{\varvec{A}}}^{ij}={\left[{A}_{pq}^{ij}\right]}_{n\times n}\) with
\[
{A}_{pq}^{ij}=\begin{cases} 1, & p=q \text{ and } p,q\in \left\{i,j\right\}, \\ -1, & p\ne q \text{ and } p,q\in \left\{i,j\right\}, \\ 0, & \text{otherwise.} \end{cases}
\]
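The selector matrix \({{\varvec{A}}}^{ij}\) collapses \({{\varvec{X}}}^{T}{{\varvec{A}}}^{ij}{\varvec{Y}}\) into the rank-one outer product \(\left({{\varvec{x}}}_{i}-{{\varvec{x}}}_{j}\right){\left({{\varvec{y}}}_{i}-{{\varvec{y}}}_{j}\right)}^{T}\). A minimal numeric check of this identity (not from the paper, with arbitrary sample dimensions):

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 5, 4
X = rng.normal(size=(n, d))
Y = rng.normal(size=(n, d))
i, j = 1, 3

# Build the selector matrix A^{ij} exactly as defined in Appendix 4
A = np.zeros((n, n))
A[i, i] = A[j, j] = 1.0
A[i, j] = A[j, i] = -1.0

# X^T A^{ij} Y equals the outer product (x_i - x_j)(y_i - y_j)^T
lhs = X.T @ A @ Y
rhs = np.outer(X[i] - X[j], Y[i] - Y[j])
print(np.allclose(lhs, rhs))  # True
```

Taking the trace of this identity recovers the familiar scalar form \(\mathrm{tr}\left({{\varvec{X}}}^{T}{{\varvec{A}}}^{ij}{\varvec{X}}\right)={\Vert {{\varvec{x}}}_{i}-{{\varvec{x}}}_{j}\Vert }^{2}\).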
Cite this article
Peng, P., Zhao, YP. Robust semi-supervised discriminant embedding method with soft label in kernel space. Neural Comput & Applic 35, 8601–8623 (2023). https://doi.org/10.1007/s00521-022-08134-z