
Robust semi-supervised discriminant embedding method with soft label in kernel space

  • Original Article
  • Published in Neural Computing and Applications

Abstract

Considering several problems that locally linear embedding methods face in semi-supervised scenarios, this paper designs a robust scheme for generating soft labels and proposes a semi-supervised discriminant embedding method that combines soft labels in the kernel space. The method uses the kernel trick to perform the following operations in the reproducing kernel Hilbert space. First, to cope with differently distributed labeled data, a confidence for the generated soft labels is introduced into the label propagation process, and the final soft labels are then generated from the label density of the data within a hypersphere centered at each unlabeled point. Experiments show that this scheme generates more reliable soft labels even at low labeling ratios. To make full use of the prior information provided by the soft labels and their confidence, a distance distortion function is used in feature learning to introduce the soft-label prior information, and a penalty term on the global information is added to the optimization objective, so that a low-dimensional representation better suited to data classification can be obtained. Finally, the method is compared experimentally with various feature learning methods and visualized on a handwritten digit dataset. The experiments show that the method performs well on various datasets from the UCI repository and is applied successfully to image recognition, especially when the ratio of labeled data is low.
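As a rough, hypothetical illustration of the hypersphere label-density idea mentioned above, the sketch below counts the labeled points falling inside a fixed-radius hypersphere around each unlabeled point and normalizes the counts into per-class frequencies. The function name, the radius, and the use of a precomputed distance matrix are assumptions for illustration only; the paper's actual scheme additionally combines label propagation with a soft-label confidence.

```python
import numpy as np

def hypersphere_label_density(D, y, n_classes, radius):
    """Per-class frequencies of labeled points inside a hypersphere of the
    given radius around each unlabeled point (illustrative sketch).
    D: (n, n) pairwise distance matrix (e.g., kernel-space distances);
    y: integer labels in {0, ..., n_classes-1}; negative values mark unlabeled samples."""
    densities = np.zeros((len(y), n_classes))
    labeled = y >= 0
    for i in np.where(~labeled)[0]:               # only unlabeled points
        inside = (D[i] <= radius) & labeled       # labeled points within the sphere
        if inside.any():
            counts = np.bincount(y[inside], minlength=n_classes)
            densities[i] = counts / counts.sum()  # normalized class frequencies
    return densities

# Toy usage with Euclidean distances and three classes (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 2))
y = np.array([0, 1, 2, -1, -1, 0, -1, 2, 1, -1])
D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
print(hypersphere_label_density(D, y, n_classes=3, radius=1.5))
```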


Data availability

The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.


Acknowledgements

This research was supported by the Fundamental Research Funds for the Central Universities under Grant No. NS2022027 and the National Science and Technology Major Project under Grant No. J2019-I-0010-0010.

Author information

Corresponding author

Correspondence to Yong-Ping Zhao.

Ethics declarations

Conflict of interest

The authors declare that there is no conflict of interest regarding the publication of this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

1.1 Appendix 1: Proof of convergence of Eq. (12)

By mathematical induction, Eq. (12) can be expanded into the following general form in terms of its initial value.

$$\overline{\boldsymbol{Y}}\left( t \right) = \overline{\boldsymbol{Y}}\left( 0 \right)\left( \boldsymbol{F}\boldsymbol{D}_{\alpha } \right)^{t} + \overline{\boldsymbol{Y}}\left( \boldsymbol{I}_{n} - \boldsymbol{D}_{\alpha } \right)\sum\limits_{\tau = 0}^{t - 1} \left( \boldsymbol{F}\boldsymbol{D}_{\alpha } \right)^{\tau } \tag{24}$$

When \(t\) tends to positive infinity, \(\lim_{t\to \infty }\left(\boldsymbol{F}\boldsymbol{D}_{\alpha }\right)^{t}=0\) because the elements of \(\boldsymbol{F}\) and \(\boldsymbol{D}_{\alpha }\) lie between 0 and 1. Using the summation formula for the geometric series, \(\sum_{\tau =0}^{\infty }\left(\boldsymbol{F}\boldsymbol{D}_{\alpha }\right)^{\tau }=\left(\boldsymbol{I}_{n}-\boldsymbol{F}\boldsymbol{D}_{\alpha }\right)^{-1}\). Since every entry of \(\lim_{t\to \infty }\overline{\boldsymbol{Y}}\left(t\right)\) converges to a definite real number, the iteration of Eq. (12) is convergent, and its limit is the matrix \(\overline{\boldsymbol{Y}}\left(\boldsymbol{I}_{n}-\boldsymbol{D}_{\alpha }\right)\left(\boldsymbol{I}_{n}-\boldsymbol{F}\boldsymbol{D}_{\alpha }\right)^{-1}\).
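As a quick numerical sanity check of this limit (a minimal sketch; the dimensions and the random column-normalized \(\boldsymbol{F}\), diagonal \(\boldsymbol{D}_{\alpha }\), and initial \(\overline{\boldsymbol{Y}}\) below are illustrative assumptions, not quantities from the paper), one can run the iteration implied by Eq. (12) for many steps and compare it with the closed-form limit:

```python
import numpy as np

rng = np.random.default_rng(0)
n, c = 6, 3                                   # illustrative sizes: n samples, c classes

# F with unit column sums and D_alpha diagonal with entries in (0, 1)
F = rng.random((n, n))
F /= F.sum(axis=0, keepdims=True)
D_alpha = np.diag(rng.uniform(0.1, 0.9, n))

# Initial (c+1) x n soft-label matrix whose columns sum to one
Y_bar = rng.random((c + 1, n))
Y_bar /= Y_bar.sum(axis=0, keepdims=True)

# Iterate Y(t+1) = Y(t) F D_alpha + Y_bar (I - D_alpha), i.e. the update behind Eq. (24)
Y_t = Y_bar.copy()
for _ in range(500):
    Y_t = Y_t @ F @ D_alpha + Y_bar @ (np.eye(n) - D_alpha)

# Closed-form limit: Y_bar (I - D_alpha)(I - F D_alpha)^{-1}
Y_inf = Y_bar @ (np.eye(n) - D_alpha) @ np.linalg.inv(np.eye(n) - F @ D_alpha)
print(np.allclose(Y_t, Y_inf))                # True: the iteration converges to the limit
```

In this sketch the columns of F sum to 1 and the diagonal of D_alpha stays strictly below 1, so the spectral radius of F D_alpha is below 1 and the iterate matches the closed form to machine precision after a few hundred steps.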

1.2 Appendix 2: Proof that each column of \(\overline{\boldsymbol{Y}}\left(t\right)\) in Eq. (13) sums to 1

Let \(\boldsymbol{e}_{n}=\left[1,1,\dots ,1\right]_{1\times n}\) denote the all-ones row vector; the problem then reduces to proving that \(\boldsymbol{e}_{c+1}\overline{\boldsymbol{Y}}\left(t\right)=\boldsymbol{e}_{n}\) holds for any \(t\). From the preceding derivation, the constraints \(\boldsymbol{e}_{n}\boldsymbol{F}=\boldsymbol{e}_{n}\) and \(\boldsymbol{e}_{c+1}\overline{\boldsymbol{Y}}=\boldsymbol{e}_{n}\) hold. Right-multiplying these two equations by \(\boldsymbol{D}_{\alpha }\) and \(\left(\boldsymbol{I}_{n}-\boldsymbol{D}_{\alpha }\right)\), respectively, and then adding them yields the following form.

$$\begin{aligned}
& \boldsymbol{e}_{c + 1} \overline{\boldsymbol{Y}}\left( \boldsymbol{I}_{n} - \boldsymbol{D}_{\alpha } \right) + \boldsymbol{e}_{n} \boldsymbol{F}\boldsymbol{D}_{\alpha } = \boldsymbol{e}_{n} \\
\Rightarrow \; & \boldsymbol{e}_{c + 1} \overline{\boldsymbol{Y}}\left( \boldsymbol{I}_{n} - \boldsymbol{D}_{\alpha } \right) = \boldsymbol{e}_{n} \left( \boldsymbol{I}_{n} - \boldsymbol{F}\boldsymbol{D}_{\alpha } \right)
\end{aligned} \tag{25}$$

Thus, the claim can be proved by mathematical induction. In this paper, we choose \(\overline{\boldsymbol{Y}}\left(0\right)=\overline{\boldsymbol{Y}}\). When \(t=0\), according to Eqs. (12) and (25), we get

$$\boldsymbol{e}_{c + 1} \overline{\boldsymbol{Y}}\left( 1 \right) = \boldsymbol{e}_{c + 1} \overline{\boldsymbol{Y}}\boldsymbol{F}\boldsymbol{D}_{\alpha } + \boldsymbol{e}_{c + 1} \overline{\boldsymbol{Y}}\left( \boldsymbol{I}_{n} - \boldsymbol{D}_{\alpha } \right) = \boldsymbol{e}_{n} \tag{26}$$

When \(t=k\ge 1\), assume that \(\boldsymbol{e}_{c+1}\overline{\boldsymbol{Y}}\left(k\right)=\boldsymbol{e}_{n}\) holds; then, according to Eq. (12) and the result of Eq. (25), we obtain

$$\begin{aligned} \boldsymbol{e}_{c + 1} \overline{\boldsymbol{Y}}\left( k + 1 \right) & = \boldsymbol{e}_{c + 1} \overline{\boldsymbol{Y}}\left( k \right)\boldsymbol{F}\boldsymbol{D}_{\alpha } + \boldsymbol{e}_{c + 1} \overline{\boldsymbol{Y}}\left( \boldsymbol{I}_{n} - \boldsymbol{D}_{\alpha } \right) \\ & = \boldsymbol{e}_{n} \boldsymbol{F}\boldsymbol{D}_{\alpha } + \boldsymbol{e}_{n} \left( \boldsymbol{I}_{n} - \boldsymbol{F}\boldsymbol{D}_{\alpha } \right) = \boldsymbol{e}_{n} \end{aligned} \tag{27}$$

Thus, the conclusion holds for all \(t\).
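The same toy setup as in Appendix 1 can be used to check this invariance numerically (again only a sketch under illustrative assumptions: \(\boldsymbol{F}\) is built with unit column sums so that \(\boldsymbol{e}_{n}\boldsymbol{F}=\boldsymbol{e}_{n}\), and the columns of the initial \(\overline{\boldsymbol{Y}}\) sum to one):

```python
import numpy as np

rng = np.random.default_rng(0)
n, c = 6, 3

F = rng.random((n, n))
F /= F.sum(axis=0, keepdims=True)             # e_n F = e_n (unit column sums)
D_alpha = np.diag(rng.uniform(0.1, 0.9, n))
Y_bar = rng.random((c + 1, n))
Y_bar /= Y_bar.sum(axis=0, keepdims=True)     # e_{c+1} Y_bar = e_n

Y_t = Y_bar.copy()
for t in range(20):
    Y_t = Y_t @ F @ D_alpha + Y_bar @ (np.eye(n) - D_alpha)
    assert np.allclose(Y_t.sum(axis=0), 1.0)  # every column of Y(t) still sums to 1
print("column sums remain 1 at every iteration")
```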

1.3 Appendix 3: Proof of the range of the Euclidean distance in the kernel space used in this paper

Centering the kernel matrix is equivalent to centering the data in the kernel space, which is a translation and therefore does not change the distances between data points. Thus, studying the maximum distance between data points in the kernel space is equivalent to studying the data before the kernel matrix is centered. For any \(\boldsymbol{x}_{i},\boldsymbol{x}_{j}\in \boldsymbol{X}\), there are two points \(\phi \left(\boldsymbol{x}_{i}\right),\phi \left(\boldsymbol{x}_{j}\right)\in \mathfrak{R}^{\nu }\) in the kernel space. It follows from Eq. (7) that each element \(\boldsymbol{K}\left(\boldsymbol{x}_{i},\boldsymbol{x}_{j}\right)\) of the kernel matrix takes values in the range \((0,1]\). The original problem can therefore be restated as follows: find the maximum distance between two points lying in the non-negative orthant of a hypersphere of radius 1 in a high-dimensional kernel space. Thus, Eq. (8) can be simplified as follows.

$$\begin{aligned} d_{ij} & = \left\| \phi \left( \boldsymbol{x}_{i} \right) - \phi \left( \boldsymbol{x}_{j} \right) \right\| = \sqrt {\boldsymbol{K}\left( \boldsymbol{x}_{i} ,\boldsymbol{x}_{i} \right) + \boldsymbol{K}\left( \boldsymbol{x}_{j} ,\boldsymbol{x}_{j} \right) - 2\boldsymbol{K}\left( \boldsymbol{x}_{i} ,\boldsymbol{x}_{j} \right)} \\ & = \sqrt {\left\| \phi \left( \boldsymbol{x}_{i} \right) \right\|^{2} + \left\| \phi \left( \boldsymbol{x}_{j} \right) \right\|^{2} - 2\phi \left( \boldsymbol{x}_{i} \right)^{T} \phi \left( \boldsymbol{x}_{j} \right)} \\ & \le \sqrt {1 + 1 - 0} = \sqrt 2 \end{aligned} \tag{28}$$

where \(\Vert \cdot \Vert\) denotes the norm of the vector from the origin to \(\phi \left(\boldsymbol{x}_{i}\right)\), whose value is less than or equal to 1. On the other hand, for \(\nu \ge 2\), points can be found that satisfy the given condition and make the distance in Eq. (28) attain its maximum value, e.g., \(\phi \left( \boldsymbol{x}_{i} \right) = \underbrace {\left[ 1,0, \ldots ,0 \right]}_{\nu }\) and \(\phi \left( \boldsymbol{x}_{j} \right) = \underbrace {\left[ 0,1, \ldots ,0 \right]}_{\nu }\). More generally, equality holds if and only if the vectors from the origin to these two points each have norm 1 and their inner product is 0. Moreover, since the distance is non-negative, the Euclidean distance between two points in the kernel space of this paper lies between 0 and \(\sqrt{2}\).
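A brief numerical check of this range is given below (a sketch only: the Gaussian kernel and bandwidth used here are stand-ins for the kernel of Eq. (7), assumed to take values in \((0,1]\) with \(\boldsymbol{K}(\boldsymbol{x},\boldsymbol{x})=1\)):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
gamma = 0.5

# Gaussian (RBF) kernel: values lie in (0, 1] and K(x, x) = 1
sq = np.sum(X ** 2, axis=1)
K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))

# Kernel-space distances from Eq. (28)
D = np.sqrt(np.maximum(K.diagonal()[:, None] + K.diagonal()[None, :] - 2 * K, 0.0))

print(D.min() >= 0.0, D.max() <= np.sqrt(2))  # True True: distances lie in [0, sqrt(2)]
```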

1.4 Appendix 4: Proof of a simplified form

Suppose there are two data matrices \(\boldsymbol{X}=\left[\boldsymbol{x}_{1},\boldsymbol{x}_{2},\dots ,\boldsymbol{x}_{n}\right]^{T}\in \mathfrak{R}^{n\times d}\) and \(\boldsymbol{Y}=\left[\boldsymbol{y}_{1},\boldsymbol{y}_{2},\dots ,\boldsymbol{y}_{n}\right]^{T}\in \mathfrak{R}^{n\times d}\). For any pair of indices \(i,j\), the following simplification can be performed.

$$\left( \boldsymbol{x}_{i} - \boldsymbol{x}_{j} \right)^{T} \left( \boldsymbol{y}_{i} - \boldsymbol{y}_{j} \right) = \sum\limits_{\ell = 1}^{d} \left( x_{i\ell } - x_{j\ell } \right)\left( y_{i\ell } - y_{j\ell } \right) = \mathrm{tr}\left( \boldsymbol{X}^{T} \boldsymbol{A}^{ij} \boldsymbol{Y} \right) \tag{29}$$

where \(\boldsymbol{A}^{ij} = \left[ A^{ij}_{pq} \right]_{n \times n}\) with
$$A^{ij}_{pq} = \begin{cases} 1 & p = q \text{ and } p,q \in \left\{ i,j \right\} \\ -1 & p \ne q \text{ and } p,q \in \left\{ i,j \right\} \\ 0 & \text{otherwise}. \end{cases}$$
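Equation (29) can be verified numerically in a few lines (a minimal sketch; the matrix sizes and the indices \(i,j\) are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 8, 4
X = rng.normal(size=(n, d))                   # rows are x_p^T
Y = rng.normal(size=(n, d))                   # rows are y_p^T
i, j = 2, 5

# A^{ij}: +1 at (i, i) and (j, j), -1 at (i, j) and (j, i), 0 elsewhere
A = np.zeros((n, n))
A[i, i] = A[j, j] = 1.0
A[i, j] = A[j, i] = -1.0

lhs = (X[i] - X[j]) @ (Y[i] - Y[j])
rhs = np.trace(X.T @ A @ Y)
print(np.isclose(lhs, rhs))                   # True: Eq. (29) holds
```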

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Peng, P., Zhao, YP. Robust semi-supervised discriminant embedding method with soft label in kernel space. Neural Comput & Applic 35, 8601–8623 (2023). https://doi.org/10.1007/s00521-022-08134-z


  • DOI: https://doi.org/10.1007/s00521-022-08134-z
