Abstract
To address several shortcomings of locally linear embedding methods in semi-supervised scenarios, this paper designs a robust scheme for generating soft labels and proposes a semi-supervised discriminant embedding method that combines these soft labels in the kernel space. The method uses the kernel trick to perform all related operations in a reproducing kernel Hilbert space. First, to cope with differing distributions of the labeled data, a confidence measure for the generated soft labels is introduced into the label propagation process, and the label density within a hypersphere centered at each unlabeled sample is then used to produce the final soft labels. Experiments demonstrate that this scheme generates more reliable soft labels even at low labeling ratios. To make full use of the prior information provided by the soft labels and their confidences, a distance distortion function introduces the soft-label prior into feature learning, and a penalty term on global information is added to the optimization objective, yielding a low-dimensional representation better suited to data classification. Finally, the method is compared experimentally with a variety of feature learning methods and visualized on a handwritten digit dataset. The experiments show that it performs well on various datasets from the UCI repository and is applied successfully to image recognition, especially when the ratio of labeled data is low.
Data availability
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.
Acknowledgements
This research was supported by the Fundamental Research Funds for the Central Universities under Grant No. NS2022027 and the National Science and Technology Major Project under Grant No. J2019-I-0010-0010.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that there is no conflict of interest regarding the publication of this paper.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
1.1 Appendix 1: Proof of convergence of Eq. (12)
By mathematical induction, Eq. (12) can be expanded into the following general form in terms of the initial values.
When \(t\) tends to positive infinity, we have \(\underset{t\to \infty }{\mathrm{lim}}{\left({\varvec{F}}{{\varvec{D}}}_{\alpha }\right)}^{t}=0\), because the elements of \({\varvec{F}}\) and \({{\varvec{D}}}_{\alpha }\) lie in \([0, 1]\). By the summation formula for an infinite geometric series, \(\sum_{\tau =0}^{\infty }{\left({\varvec{F}}{{\varvec{D}}}_{\alpha }\right)}^{\tau }={\left({{\varvec{I}}}_{n}-{\varvec{F}}{{\varvec{D}}}_{\alpha }\right)}^{-1}\). Since every entry of \(\underset{t\to \infty }{\mathrm{lim}}\overline{{\varvec{Y}} }\left(t\right)\) is a definite real number, the iteration of Eq. (12) converges to the matrix \(\overline{{\varvec{Y}} }\left({{\varvec{I}}}_{n}-{{\varvec{D}}}_{\alpha }\right){\left({{\varvec{I}}}_{n}-{\varvec{F}}{{\varvec{D}}}_{\alpha }\right)}^{-1}\).
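The limit above can be checked numerically. The sketch below is not from the paper: it assumes Eq. (12) has the fixed-point form \(\overline{{\varvec{Y}}}(t+1)=\overline{{\varvec{Y}}}(t){\varvec{F}}{{\varvec{D}}}_{\alpha }+\overline{{\varvec{Y}}}(0)({{\varvec{I}}}_{n}-{{\varvec{D}}}_{\alpha })\), which is consistent with the closed-form limit derived here, and it takes \({\varvec{F}}\) column-stochastic with the diagonal entries of \({{\varvec{D}}}_{\alpha }\) strictly inside \((0,1)\).

```python
import numpy as np

rng = np.random.default_rng(0)
n, c = 6, 3

# Assumed F: column-stochastic affinity matrix (columns sum to 1), entries in [0, 1]
F = rng.random((n, n))
F /= F.sum(axis=0, keepdims=True)

# Assumed D_alpha: diagonal, entries strictly inside (0, 1)
alpha = rng.uniform(0.2, 0.8, n)
D = np.diag(alpha)

# Ybar(0): soft-label matrix, each column sums to 1
Y0 = rng.random((c + 1, n))
Y0 /= Y0.sum(axis=0, keepdims=True)

# Assumed iteration for Eq. (12): Ybar(t+1) = Ybar(t) F D_alpha + Ybar(0)(I - D_alpha)
Y = Y0.copy()
for _ in range(500):
    Y = Y @ F @ D + Y0 @ (np.eye(n) - D)

# Closed-form limit from Appendix 1
Y_star = Y0 @ (np.eye(n) - D) @ np.linalg.inv(np.eye(n) - F @ D)

print(np.allclose(Y, Y_star))           # the iteration reaches the closed form
print(np.allclose(Y.sum(axis=0), 1.0))  # Appendix 2: columns still sum to 1
```

Under these assumptions each column of \({\varvec{F}}{{\varvec{D}}}_{\alpha }\) sums to \(\alpha_j<1\), so the matrix powers vanish and the geometric series converges, matching the argument above.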
1.2 Appendix 2: Proof that the summation of each column of \(\overline{{\varvec{Y}} }\left({\varvec{t}}\right)\) in Eq. (13) is constant 1
Let \({{\varvec{e}}}_{n}={\left[1,1,\dots ,1\right]}_{1\times n}\) denote an all-ones row vector; the problem then reduces to proving that \({{\varvec{e}}}_{c+1}\overline{{\varvec{Y}} }\left(t\right)={{\varvec{e}}}_{n}\) holds for any \(t\). The previous work established the constraints \({{\varvec{e}}}_{n}{\varvec{F}}={{\varvec{e}}}_{n}\) and \({{\varvec{e}}}_{c+1}\overline{{\varvec{Y}} }={{\varvec{e}}}_{n}\). Right-multiplying these two equations by \({{\varvec{D}}}_{\alpha }\) and \(\left({{\varvec{I}}}_{n}-{{\varvec{D}}}_{\alpha }\right)\), respectively, and then adding them gives the following form.
Thus, the problem can be proved by mathematical induction. In this paper, we choose \(\overline{{\varvec{Y}} }\left(0\right)=\overline{{\varvec{Y}} }\). When \(t=0\), according to Eq. (12) and Eq. (25), we get
When \(t=k\ge 1\), assuming that \({{\varvec{e}}}_{c+1}\overline{{\varvec{Y}} }\left(k\right)={{\varvec{e}}}_{n}\) holds, it follows from Eq. (12) and the result of Eq. (25) that
Thus, the conclusion is proved to be valid.
1.3 Appendix 3: Proof of the range of values of the Euclidean distance in the kernel space in this paper
Centering the kernel matrix is equivalent to centering the data in the kernel space; this is a translation and does not change the distances between data points. Thus, studying the maximum distance between data points in the kernel space is equivalent to studying the data before the kernel matrix is centered. For any \({{\varvec{x}}}_{i},{{\varvec{x}}}_{j}\in {\varvec{X}}\), there are two points \(\phi \left({{\varvec{x}}}_{i}\right),\phi \left({{\varvec{x}}}_{j}\right)\in {\mathfrak{R}}^{\nu }\) in the kernel space. From Eq. (7), each element \({\varvec{K}}\left({{\varvec{x}}}_{i},{{\varvec{x}}}_{j}\right)\) of the kernel matrix takes values in \((0,1]\). The original problem can therefore be stated as follows: find the maximum distance between two points lying in the non-negative orthant of the unit hypersphere in a high-dimensional kernel space. Thus, Eq. (8) can be simplified as follows.
where \(\Vert \cdot \Vert\) denotes the norm of the vector from the origin to \(\phi \left({{\varvec{x}}}_{i}\right)\), whose value is less than or equal to 1. On the other hand, for \(\nu \ge 2\), points satisfying the given condition can be found that make the distance in Eq. (28) attain its maximum, e.g., \(\phi \left( {{\varvec{x}}_{i} } \right) = \underbrace {{\left[ {1,0, \ldots ,0} \right]}}_{\nu }\) and \(\phi \left( {{\varvec{x}}_{j} } \right) = \underbrace {{\left[ {0,1, \ldots ,0} \right]}}_{\nu }\). More generally, equality holds if and only if the vectors from the origin to these two points each have norm 1 and inner product 0. Moreover, since distances are non-negative, the Euclidean distance between two points in the kernel space of this paper takes values between 0 and \(\sqrt{2}\).
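This bound follows from the kernel-space distance identity \(\Vert \phi ({\varvec{x}})-\phi ({\varvec{y}})\Vert ^{2}=K({\varvec{x}},{\varvec{x}})+K({\varvec{y}},{\varvec{y}})-2K({\varvec{x}},{\varvec{y}})\). As a small sanity check (not from the paper), the sketch below assumes a Gaussian kernel, which satisfies the condition of Eq. (7) that kernel values lie in \((0,1]\):

```python
import numpy as np

def rbf(x, y, gamma=1.0):
    # Gaussian kernel: values in (0, 1], with K(x, x) = 1
    return np.exp(-gamma * np.sum((x - y) ** 2))

def kernel_dist(x, y, gamma=1.0):
    # ||phi(x) - phi(y)||^2 = K(x,x) + K(y,y) - 2 K(x,y) = 2 - 2 K(x,y)
    return np.sqrt(rbf(x, x, gamma) + rbf(y, y, gamma) - 2 * rbf(x, y, gamma))

rng = np.random.default_rng(1)
d = np.array([kernel_dist(rng.normal(size=5), rng.normal(size=5))
              for _ in range(1000)])

# All kernel-space distances lie in [0, sqrt(2)], as proved in Appendix 3
print(bool((d >= 0).all() and (d <= np.sqrt(2)).all()))
```

Since \(K({\varvec{x}},{\varvec{y}})>0\) for this kernel, the supremum \(\sqrt{2}\) is approached but never reached for finite samples.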
1.4 Appendix 4: Proof of a simplified form
It is assumed that there are two data matrices \({\varvec{X}}={[{{\varvec{x}}}_{1},{{\varvec{x}}}_{2},\dots ,{{\varvec{x}}}_{n}]}^{T}\in {\mathfrak{R}}^{n\times d}\) and \({\varvec{Y}}={[{{\varvec{y}}}_{1},{{\varvec{y}}}_{2},\dots ,{{\varvec{y}}}_{n}]}^{T}\in {\mathfrak{R}}^{n\times d}\). For any indices \(i\) and \(j\), the following simplification can be performed.
where \({{\varvec{A}}}^{ij}={\left[{A}_{pq}^{ij}\right]}_{n\times n}\) with
\[
{A}_{pq}^{ij}=\begin{cases} 1, & p=q \text{ and } p,q\in \left\{i,j\right\}, \\ -1, & p\ne q \text{ and } p,q\in \left\{i,j\right\}, \\ 0, & \text{otherwise.} \end{cases}
\]
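The selector matrix \({{\varvec{A}}}^{ij}\) collapses \({{\varvec{X}}}^{T}{{\varvec{A}}}^{ij}{\varvec{Y}}\) into the rank-one outer product \(\left({{\varvec{x}}}_{i}-{{\varvec{x}}}_{j}\right){\left({{\varvec{y}}}_{i}-{{\varvec{y}}}_{j}\right)}^{T}\). A minimal numeric check of this identity (not from the paper, with arbitrary sample dimensions):

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 5, 4
X = rng.normal(size=(n, d))
Y = rng.normal(size=(n, d))
i, j = 1, 3

# Build the selector matrix A^{ij} exactly as defined in Appendix 4
A = np.zeros((n, n))
A[i, i] = A[j, j] = 1.0
A[i, j] = A[j, i] = -1.0

# X^T A^{ij} Y equals the outer product (x_i - x_j)(y_i - y_j)^T
lhs = X.T @ A @ Y
rhs = np.outer(X[i] - X[j], Y[i] - Y[j])
print(np.allclose(lhs, rhs))  # True
```

Taking the trace of this identity recovers the familiar scalar form \(\mathrm{tr}\left({{\varvec{X}}}^{T}{{\varvec{A}}}^{ij}{\varvec{X}}\right)={\Vert {{\varvec{x}}}_{i}-{{\varvec{x}}}_{j}\Vert }^{2}\).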
Cite this article
Peng, P., Zhao, YP. Robust semi-supervised discriminant embedding method with soft label in kernel space. Neural Comput & Applic 35, 8601–8623 (2023). https://doi.org/10.1007/s00521-022-08134-z