Abstract
Most previous work on the learning performance of multi-classification algorithms assumes fully supervised samples, whereas much of the data generated in real life is unlabeled. This paper introduces a novel Laplacian multi-classification support vector classification and regression (LMSVCR) algorithm for semi-supervised learning. We first establish a fast learning rate for the LMSVCR algorithm with semi-supervised multi-classification samples and prove that the algorithm is consistent in this setting. We then present a numerical investigation of the learning performance of the LMSVCR algorithm. The experimental studies indicate that the proposed LMSVCR algorithm outperforms other semi-supervised multi-classification algorithms in terms of prediction accuracy and total sampling and training time.
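To make the setting concrete, the sketch below illustrates the graph-Laplacian (manifold) regularization idea of Belkin et al. on which LMSVCR builds, shown here with a squared loss (LapRLS) rather than the SVCR loss; the function names, kernel choices, and closed form are illustrative assumptions and do not reproduce the authors' LMSVCR solver.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Gaussian (RBF) kernel matrix between the rows of A and the rows of B.
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

def laprls_fit(X_lab, y_lab, X_unl, gamma=1.0, lam1=1e-2, lam2=1e-2):
    """Laplacian-regularized kernel least squares on labeled + unlabeled points."""
    X = np.vstack([X_lab, X_unl])
    l, n = len(X_lab), len(X)
    K = rbf_kernel(X, X, gamma)                       # kernel matrix over all points
    W = rbf_kernel(X, X, gamma)                       # similarity graph (here: same RBF weights)
    L = np.diag(W.sum(axis=1)) - W                    # unnormalized graph Laplacian
    J = np.diag(np.r_[np.ones(l), np.zeros(n - l)])   # selects the labeled points
    y = np.r_[y_lab, np.zeros(n - l)]
    # Closed-form expansion coefficients of f = sum_i alpha_i K(x_i, .)
    A = J @ K + lam1 * l * np.eye(n) + (lam2 * l / n**2) * (L @ K)
    alpha = np.linalg.solve(A, y)
    return X, alpha

def laprls_predict(X_train, alpha, X_new, gamma=1.0):
    # Evaluate the learned function at new points.
    return rbf_kernel(X_new, X_train, gamma) @ alpha
```

In LMSVCR the squared loss above is replaced by the SVCR (classification–regression) loss appearing in Appendix A, in which samples with \(y \ne 0\) incur a hinge penalty and samples coded \(y = 0\) incur a regression penalty, following the K-SVCR scheme of Angulo et al.; the squared loss here only keeps the sketch short.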









References
Altun Y, McAllester D, Belkin M (2005) Maximum margin semi-supervised learning for structured variables. Adv Neural Inf Process Syst 18:33–40
Hady MFA, Schwenker F (2013) Semi-supervised learning. Handbook on Neural Information Processing. Springer, Berlin, Heidelberg, pp 215–239
Chapelle O, Schölkopf B, Zien A (2009) Semi-supervised learning. IEEE Trans Neural Netw 20(3):542–542
Zhu XJ (2005) Semi-supervised learning literature survey. University of Wisconsin-Madison Department of Computer Sciences
Liu Y, Liu W, Obaid MA, Abbas IA (2016) Exponential stability of Markovian jumping Cohen–Grossberg neural networks with mixed mode-dependent time-delays. Neurocomputing 177:409–415
Du B, Liu Y, Abbas IA (2016) Existence and asymptotic behavior results of periodic solution for discrete-time neutral-type neural networks. J Franklin Institute 353(2):448–461
Rebai I, BenAyed Y, Mahdi W (2016) Deep multilayer multiple kernel learning. Neural Comput Appl 27:2305–2314
Li X, Mao W, Jiang W (2016) Multiple-kernel-learning-based extreme learning machine for classification design. Neural Comput Appl 27:175–184
Carballal A, Fernandez-Lozano C, Heras J, Romero J (2020) Transfer learning features for predicting aesthetics through a novel hybrid machine learning method. Neural Comput Appl 32:5889–5900
Joachims T (1999) Transductive inference for text classification using support vector machines. Int Conf Mach Learn 99:200–209
Belkin M, Niyogi P, Sindhwani V (2006) Manifold regularization: a geometric framework for learning from labeled and unlabeled examples. J Mach Learn Res 7(11):2399–2434
Bennett K, Mangasarian OL (1999) Combining support vector and mathematical programming methods for induction. In: Advances in kernel methods: support vector learning, pp 307–326
Weston J, Watkins C (1999) Support vector machines for multi-class pattern recognition. In: ESANN, pp 219–224
Lee Y, Lin Y, Wahba G (2004) Multicategory support vector machines: theory and application to the classification of microarray data and satellite radiance data. J Am Stat Assoc 99(465):67–81
Bottou L, Cortes C, Denker JS, Drucker H, Guyon I, Jackel LD, LeCun Y, Muller UA, Sackinger E, Simard P, Vapnik V (1994) Comparison of classifier methods: a case study in handwritten digit recognition. In Proceedings of the 12th IAPR International Conference on Pattern Recognition, pp. 77–82
Kreßel UHG (1999) Pairwise classification and support vector machines. In: Advances in kernel methods: support vector learning, pp 255–268
Angulo C, Parra X, Catala A (2003) K-SVCR. A support vector machine for multi-class classification. Neurocomputing 55(1–2):57–77
Aronszajn N (1950) Theory of reproducing kernels. Trans Am Math Soc 68(3):337–404
Evgeniou T, Pontil M, Poggio T (2000) Regularization networks and support vector machines. Adv Comput Math 13(1):1–50
Feng Y, Yang Y, Zhao Y, Lv S, Suykens JA (2014) Learning with kernelized elastic net regularization. KU Leuven, Leuven, Belgium
Xu Y, Yang Z (2014) Elastic-net regression algorithm based on multi-scale gaussian kernel. Sci J Inf Eng 4(1):19–25
Wang W, Xu Z, Lu W, Zhang X (2003) Determination of the spread parameter in the Gaussian kernel for classification and regression. Neurocomputing 55(3–4):643–663
Wu Q, Zhou DX (2005) SVM soft margin classifiers: linear programming versus quadratic programming. Neural Comput 17(5):1160–1187
Wu Q, Ying Y, Zhou DX (2006) Learning rates of least-square regularized regression. Foundations Comput Math 6(2):171–192
Lv SG, Zhou F (2015) Optimal learning rates of \(l^{p}\)-type multiple kernel learning under general conditions. Inf Sci 294:255–268
Chen DR, Wu Q, Ying Y, Zhou DX (2004) Support vector machine soft margin classifiers: error analysis. J Mach Learn Res 5:1143–1175
Tong H, Chen DR, Peng L (2009) Analysis of support vector machines regression. Foundations Comput Math 9(2):243–257
Chen DR, Xiang DH (2006) The consistency of multicategory support vector machines. Adv Comput Math 24(1–4):155–169
Chen H, Li L (2009) Semisupervised multicategory classification with imperfect model. IEEE Trans Neural Netw 20(10):1594–1603
Bamakan SMH, Wang H, Shi Y (2017) Ramp loss k-support vector classification-regression; a robust and sparse multi-class approach to the intrusion detection problem. Knowledge-Based Syst 126:113–126
Huang CL, Dun JF (2008) A distributed PSO–SVM hybrid system with feature selection and parameter optimization. Appl Soft Comput 8(4):1381–1391
Lin SW, Ying KC, Chen SC, Lee ZJ (2008) Particle swarm optimization for parameter determination and feature selection of support vector machines. Expert Syst Appl 35(4):1817–1824
Qian M, Nie F, Zhang C (2009) Efficient multi-class unlabeled constrained semi-supervised SVM. In Proceedings of the 18th ACM conference on Information and knowledge management, pp. 1665–1668
Pan H, Kang Z (2018) Robust graph learning for semi-supervised classification. In 2018 10th International Conference on Intelligent Human-Machine Systems and Cybernetics, pp 265–268
Wilcoxon F (1992) Individual comparisons by ranking methods. Breakthroughs in statistics. Springer, New York, pp 196–202
Cucker F, Smale S (2002) Best choices for regularization parameters in learning theory: on the bias-variance problem. Foundations Comput Math 2(4):413–428
Acknowledgements
This work is supported in part by the National Natural Science Foundation of China (No. 61772011), the Open Project Foundation of Intelligent Information Processing Key Laboratory of Shanxi Province (No. CICIP2018002), and the National Key Research and Development Program of China (No. 2020YFA0714200).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest regarding the publication of this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A
To bound the excess generalization error \({{\mathcal {E}}}(f_{\mathbf{z}}) - {{\mathcal {E}}}(f_B)\) of LMSVCR via Proposition 1, we need to estimate the errors \(T_1, T_2, T_3\). We first present the main tools as follows:
Lemma 1
[24] Let \(\xi\) be a random variable on a probability space Z with mean \(E(\xi )\), variance \(\sigma ^2(\xi ) = \sigma ^2\), and satisfying \(|\xi (z) - E(\xi )| \le M_{\xi }\) for almost all \(z \in Z\). Then for all \(\varepsilon > 0\),
\[
\mathrm{Prob}_{{\mathbf {z}} \in Z^m}\bigg \{\frac{1}{m} \sum _{i=1}^{m} \xi (z_i) - E(\xi ) \ge \varepsilon \bigg \} \le \exp \bigg (-\frac{m \varepsilon ^2}{2 \big (\sigma ^2 + \frac{1}{3} M_{\xi } \varepsilon \big )}\bigg ).
\]
Lemma 2
[24] Let \({\mathcal {G}}\) be a set of functions on Z such that, for some \({c_\rho } \ge 0\), \(|g - E(g)| \le B\) almost everywhere and \(E(g^2) \le {c_\rho }E(g)\) for each \(g \in {{\mathcal {G}}}\). Then for every \(\varepsilon > 0\) and \(0 < \alpha \le 1\),
Lemma 3
[36] Let \(c_1,c_2 > 0\) and \(p_1> p_2 > 0\). Then the equation \(x^{p_1} - c_1 x^{p_2} - c_2 = 0\) has a unique positive zero \(x^{*}\), and \(x^{*} \le \max \{(2 c_1)^{1/(p_1-p_2)}, (2 c_2)^{1/{p_1}}\}\).
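As a quick illustration of Lemma 3 (a worked instance added here for concreteness), take \(p_1 = 2\), \(p_2 = 1\) and \(c_1 = c_2 = 1\):
\[
x^{2} - x - 1 = 0 \;\Longrightarrow\; x^{*} = \frac{1+\sqrt{5}}{2} \approx 1.618 \;\le\; \max \big \{(2c_1)^{1/(p_1-p_2)}, (2c_2)^{1/p_1}\big \} = \max \{2, \sqrt{2}\} = 2.
\]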
Proof of Proposition 1:
Since for any \({{\mathbf {z}}} \in {Z}^m\), \(\lambda _1\Vert f_{{\mathbf {z}}}\Vert _{{\mathcal {K}}}^2\ge 0\) and \(\lambda _2\Vert f_{{\mathbf {z}}}\Vert _I^2 \ge 0\), we have the following error decomposition
The last inequality follows from the fact that \(T_4 \le 0\): indeed, by the definition of \(f_{{{\mathbf {z}}},\lambda _1}\), we have \({{\mathcal {E}}}_{{\mathbf {z}}}(f_{{{\mathbf {z}}},\lambda _1}) + \lambda _1\Vert f_{{{\mathbf {z}}},\lambda _1}\Vert _{{\mathcal {K}}}^2 \le {\mathcal E}_{{\mathbf {z}}}(f_{\lambda _1}) + \lambda _1\Vert f_{\lambda _1}\Vert _{\mathcal K}^2\). Here \(T_1, T_2, T_3\) are as defined in Proposition 1. This completes the proof of Proposition 1. \(\square\)
Proposition 4
Assume \({{\mathbf {z}}} = \{z_i\}_{i=1}^m \in {Z}^m\) is an i.i.d. sample. Then for any \(0< \delta < 1\), with confidence at least \(1 - \delta /2\),
where \({\varepsilon }^{*}(m,2/\delta ) = \max \big \{\frac{150C (\kappa +1)^2 R^2 \ln (2/\delta )}{m}, (\frac{150C (\kappa +1)^2 R^2 C_s (4R)^s}{m})^{\frac{1}{s+1}} \big \}\).
Proof
Set \(\zeta _1 = W(y,f) - W(y,f_B)\). Clearly, \(\zeta _1\) ranges over a set of functions as the sample \({\mathbf {z}}\) varies. We will apply Lemma 2 to the function set
We first bound the functions in \({{\mathcal {F}}}_{R}\). On the one hand, \(E(g) = {{\mathcal {E}}}(f) - {\mathcal E}(f_B) \ge 0\), \(\frac{1}{m} \sum _{i = 1}^{m} g(z_i) = {\mathcal E}_{{\mathbf {z}}}(f) - {{\mathcal {E}}}_{{\mathbf {z}}}(f_B)\), and \(g = C_1 [(1 - y f(x))_{+} - (1 - y f_{B}(x))_{+}] \cdot {{\mathbf {1}}}_{\{y \ne 0\}} + C_2 [(f(x) - f_{B}(x))(f(x) + f_{B}(x))] \cdot {\mathbf{1}}_{\{y = 0\}}.\) On the other hand, \(\Vert f\Vert _{\infty } \le \kappa \Vert f\Vert _{{\mathcal {K}}} \le \kappa R\) and \(|f_B(x)| \le M\) almost everywhere. We have
where \(C=\max \{C_{1}, C_{2}\}\). Then we get \(|g - E(g)| \le 4C(\kappa R + M)^2\) almost everywhere. Also,
Thus \(E(g^2) \le C(\kappa R + M)^2 E(g)\). Applying Lemma 2 to the function set \({{\mathcal {F}}}_R\), we have that the inequality
is valid with probability at least
Here we use the restriction \(R \ge M\). By Definition 3, we can get
For any \(\delta \in (0,1)\), let
It follows that,
By Lemma 3, we have \(\varepsilon \le {\varepsilon }^{*}(m,\delta )\), where
Because \(\sqrt{\varepsilon }\sqrt{{{\mathcal {E}}}(f) + \varepsilon } \le \frac{1}{2} {{\mathcal {E}}}(f) + \varepsilon\) holds for any \(\varepsilon >0\) (by the elementary inequality \(\sqrt{a}\sqrt{b} \le \frac{1}{2}(a+b)\)), we have that for any \(\delta \in (0,1)\), the following inequality holds with probability at least \(1-\delta\),
Replacing f by \(f_{{\mathbf {z}}}\), we have with probability at least \(1-\delta /2\),
is valid. This completes the proof of Proposition 4. \(\square\)
Proposition 5
For any \({{\mathbf {z}}} = \{z_i\}_{i=1}^m \in {Z}^m\), \(T_2 \le 1\).
Proof
By the representer theorem in [20], we know that \(f_{{{\mathbf {z}}},\lambda _1}\) can be written as \(f_{{{\mathbf {z}}},\lambda _1} = \sum _{i=1}^{m} {\alpha }_i^{\lambda _1} {{\mathcal {K}}}_{x_i}\), and \(f_{{\mathbf {z}}} = \arg \min \left \{ \lambda _1 \Vert f\Vert _{{\mathcal {K}}}^2 + \lambda _2 \Vert f\Vert _I^2 + {{\mathcal {E}}}_{\mathbf{z}}(f) \right \}\). It follows that
where \({{\mathcal {E}}}_{{\mathbf {z}}}(f_{{{\mathbf {z}}},\lambda _1}) \ge 0\) and \(\lambda _1\Vert f_{{{\mathbf {z}}},\lambda _1}\Vert _{{\mathcal {K}}}^2 \ge 0\). This completes the proof of Proposition 5. \(\square\)
Proposition 6
For any \(0< \delta < 1\), the following inequality holds with probability at least \(1-\delta /2\),
Proof
From the definitions of \(f_{\lambda _1}\) and \(D(\lambda _1)\), we have
It follows from inequality (9) that \(\Vert f_{\lambda _1}\Vert _{\infty } \le \kappa \Vert f_{\lambda _1}\Vert _{\mathcal K} \le \kappa \sqrt{D(\lambda _1)/\lambda _1}\). Set
then \(T_3 = \frac{1}{m} \sum _{i=1}^{m} \zeta _2(z_i) - E(\zeta _2)\). Since \(|f_B| \le M\) almost everywhere, we have
Hence \(|\zeta _2 - E(\zeta _2)| \le M_{\zeta _2} := 4Cb\). Moreover,
By the one-sided Bernstein inequality (Lemma 1), we have that for any \(t > 0\), \(\frac{1}{m} \sum _{i=1}^{m} \zeta _2 (z_i) - E(\zeta _2) \le t\) holds with confidence at least
Setting \(t^{*}\) to be the unique positive solution of the above equation (made explicit after this proof), we have
So, \(\frac{1}{m} \sum _{i=1}^{m} \zeta _2 (z_i) - E(\zeta _2) \le t^{*}\) holds with probability at least \(1 - \delta\), where
Recall \(b = (\kappa \sqrt{D(\lambda _1)/\lambda _1} + M)^2\). It follows that
This completes the proof of Proposition 6. \(\square\)
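For concreteness, the Bernstein step above can be made explicit; the following is a routine computation under the bound of Lemma 1, with \(\sigma ^2 = \sigma ^2(\zeta _2)\) and \(M_{\zeta _2} = 4Cb\), and the constants are generic and stated only for illustration. Setting the right-hand side of Lemma 1 equal to the confidence level \(\delta\) yields the quadratic equation
\[
m t^2 - \tfrac{2}{3} M_{\zeta _2} \ln (1/\delta )\, t - 2\sigma ^2 \ln (1/\delta ) = 0,
\]
whose unique positive root satisfies, by \(\sqrt{a+b} \le \sqrt{a} + \sqrt{b}\),
\[
t^{*} = \frac{\frac{1}{3} M_{\zeta _2} \ln (1/\delta ) + \sqrt{\frac{1}{9} M_{\zeta _2}^2 \ln ^2(1/\delta ) + 2 m \sigma ^2 \ln (1/\delta )}}{m} \le \frac{2 M_{\zeta _2} \ln (1/\delta )}{3m} + \sqrt{\frac{2 \sigma ^2 \ln (1/\delta )}{m}}.
\]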
Appendix B
Proof of Proposition 2:
By Propositions 4-6 and Definition 3, we have that for any \(\delta \in (0,1)\), with confidence at least \(1 - \delta\), the following inequality is valid,
For \({\varepsilon }^{*}(m,\delta /2)\), the inequality \(\left(\frac{150C (\kappa +1)^2 R^2 C_s (4R)^s}{m}\right)^{\frac{1}{s+1}} \ge \frac{150C (\kappa +1)^2 R^2 \ln (2/\delta )}{m}\) holds whenever \(m \ge 37C (\kappa +1)^2 R \ln ({2}/{\delta }) ({\ln ({2}/{\delta })}/{C_s})^{1/s}\), so we get \({\varepsilon }^{*}(m,\delta /2) = (\frac{150C (\kappa +1)^2 R^2 C_s (4R)^s}{m})^{\frac{1}{s+1}}\). Thus, for any \(0< \delta < 1\), with probability at least \(1- \delta\), we have
This completes the proof of Proposition 2. \(\square\)
Proof of Theorem 1:
By Definition 1, for any \(\lambda _1>0\), we have \(D(\lambda _1) \le {\lambda _1}^q\). Let \(R = M\). Then for any \(0< \delta < 1\), with probability at least \(1- \delta\),
where \({\widetilde{C}} = 300 C (\kappa +1)^2 M^2(4 {C_s}^{\frac{1}{s+1}} + 3 \ln ({2}/{\delta }) + 2)\). Setting \(\frac{{\lambda _1}^{q-1}}{m} = {\lambda _1}^q\) gives \(\lambda _1 = \frac{1}{m}\). Since \(0< \lambda _1 < 1\) and \(0 < q \le 1\), letting \(s\) tend to 0 and \(q\) tend to 1, the inequality
is valid with probability at least \(1- \delta\), where \({\widetilde{C}}\) is the constant defined above. This completes the proof of Theorem 1. \(\square\)
About this article
Cite this article
Dong, Z., Qin, Y., Zou, B. et al. LMSVCR: novel effective method of semi-supervised multi-classification. Neural Comput & Applic 34, 3857–3873 (2022). https://doi.org/10.1007/s00521-021-06647-7