Previous studies of the learning performance of multi-classification algorithms are usually based on supervised samples, but a large amount of the data generated in real life is unlabeled. This paper introduces a novel Laplacian multi-classification support vector classification and regression (LMSVCR) algorithm for semi-supervised learning. We first establish a fast learning rate for the LMSVCR algorithm with semi-supervised multi-classification samples and prove that the algorithm is consistent. We then present a numerical investigation of its learning performance. The experimental studies indicate that the proposed LMSVCR algorithm achieves better learning performance than other semi-supervised multi-classification algorithms in terms of prediction accuracy and total sampling and training time.

This work is supported in part by the National Natural Science Foundation of China (No. 61772011), the Open Project Foundation of Intelligent Information Processing Key Laboratory of Shanxi Province (No. CICIP2018002), and the National Key Research and Development Program of China (No. 2020YFA0714200).
Appendix A
To bound the excess generalization error \({{\mathcal {E}}}(f_{\mathbf{z}}) - {{\mathcal {E}}}(f_B)\) of LMSVCR, by Proposition 1, we should estimate the errors \(T_1, T_2, T_3\). Thus we first present the main tools as follows:
Lemma 1
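[24] Let \(\xi\) be a random variable on a probability space Z with mean \(E(\xi )\), variance \(\sigma ^2(\xi ) = \sigma ^2\), and satisfying \(|\xi (z) - E(\xi )| \le M_{\xi }\) for almost all \(z \in Z\). Then for all \(\varepsilon > 0\),
\[\mathop{\mathrm{Prob}}_{{\mathbf {z}} \in Z^m}\left\{ \frac{1}{m}\sum _{i=1}^{m} \xi (z_i) - E(\xi ) \ge \varepsilon \right\} \le \exp \left\{ -\frac{m\varepsilon ^2}{2\left( \sigma ^2 + \frac{1}{3}M_{\xi }\varepsilon \right) } \right\} .\]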
Lemma 2
[24] Let \({\mathcal {G}}\) be a set of functions on Z such that for some \({c_\rho } \ge 0\), \(|g - E(g)| \le B\) almost everywhere and \(E(g^2) \le {c_\rho }E(g)\) for each \(g \in {{\mathcal {G}}}\). Then for every \(\varepsilon > 0\) and \(0 < \alpha \le 1\),
Lemma 3
[36] Let \(c_1,c_2 > 0\) and \(p_1> p_2 > 0\). The equation \(x^{p_1} - c_1 x^{p_2} - c_2 = 0\) has a unique positive zero \(x^{*}\). Moreover, \(x^{*} \le \max \{(2 c_1)^{1/(p_1-p_2)}, (2 c_2)^{1/{p_1}}\}\).
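The bound in Lemma 3 can be checked numerically. The following is a minimal sketch, not part of the original argument: the function names `positive_root` and `lemma3_bound` are hypothetical, and bisection is used only as a convenient way to locate the zero.

```python
def positive_root(c1, c2, p1, p2, tol=1e-12):
    """Locate the unique positive zero of h(x) = x**p1 - c1*x**p2 - c2 by bisection.

    Assumes c1, c2 > 0 and p1 > p2 > 0, as in Lemma 3.
    """
    def h(x):
        return x ** p1 - c1 * x ** p2 - c2

    lo, hi = 0.0, 1.0          # h(0) = -c2 < 0
    while h(hi) <= 0:          # grow the bracket until h becomes positive
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if h(mid) <= 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def lemma3_bound(c1, c2, p1, p2):
    """Upper bound max{(2 c1)^(1/(p1-p2)), (2 c2)^(1/p1)} from Lemma 3."""
    return max((2 * c1) ** (1.0 / (p1 - p2)), (2 * c2) ** (1.0 / p1))
```

For instance, with \(c_1 = 3\), \(c_2 = 5\), \(p_1 = 2\), \(p_2 = 1\) the equation is \(x^2 - 3x - 5 = 0\), whose positive zero \((3+\sqrt{29})/2\) lies below the bound \(\max \{6, \sqrt{10}\} = 6\).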
Proof of Proposition 1:
Since for any \({{\mathbf {z}}} \in {Z}^m\), \(\lambda _1\Vert f_{{\mathbf {z}}}\Vert _{{\mathcal {K}}}^2\ge 0, \lambda _2\Vert f_{{\mathbf {z}}}\Vert _I^2 \ge 0\), we have the following error decomposition
The last inequality above follows from the fact that \(T_4 \le 0\), since by the definition of \(f_{{\mathbf {z}},\lambda _1}\) we have \({{\mathcal {E}}}_{{\mathbf {z}}}(f_{{{\mathbf {z}}},\lambda _1}) + \lambda _1\Vert f_{{{\mathbf {z}}},\lambda _1}\Vert _{{\mathcal {K}}}^2 \le {\mathcal E}_{{\mathbf {z}}}(f_{\lambda _1}) + \lambda _1\Vert f_{\lambda _1}\Vert _{\mathcal K}^2\). Here \(T_1, T_2, T_3\) are defined in Proposition 1. This completes the proof of Proposition 1. \(\square\)
Proposition 4
Assume that \({{\mathbf {z}}} = \{z_i\}_{i=1}^m \in {Z}^m\) is an i.i.d. sample set. Then for any \(0< \delta < 1\), with confidence at least \(1 - \delta /2\),
where \({\varepsilon }^{*}(m,\delta /2) = \max \big \{\frac{150C (\kappa +1)^2 R^2 \ln (2/\delta )}{m}, (\frac{150C (\kappa +1)^2 R^2 C_s (4R)^s}{m})^{\frac{1}{s+1}} \big \}\).
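To see how the two terms inside the maximum trade off as the sample size grows, the quantity \({\varepsilon }^{*}\) can be evaluated directly. The sketch below is illustrative only: the function name `epsilon_star` and the parameter values in the usage note are assumptions, and the arguments mirror the constants \(C, \kappa , R, C_s, s\) of the formula above.

```python
import math


def epsilon_star(m, delta, C, kappa, R, C_s, s):
    """Evaluate the maximum of the two terms in the formula for epsilon*.

    The logarithmic term uses ln(2/delta), exactly as in the displayed formula.
    """
    a = 150.0 * C * (kappa + 1.0) ** 2 * R ** 2
    log_term = a * math.log(2.0 / delta) / m
    cover_term = (a * C_s * (4.0 * R) ** s / m) ** (1.0 / (s + 1.0))
    return max(log_term, cover_term)
```

With all constants set to 1 and \(\delta = 0.1\), the logarithmic term dominates at \(m = 10^3\), while the covering-number term dominates at \(m = 10^6\).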
Set \(\zeta _1 = W(y,f) - W(y,f_B)\). Obviously, \(\zeta _1\) varies among a set of functions in accordance with the varying sample \({\mathbf {z}}\). Applying Lemma 2 to the function set
Hence we first make sure that the functions in \({{\mathcal {F}}}_{R}\) are bounded. Note that \(E(g) = {{\mathcal {E}}}(f) - {\mathcal E}(f_B) \ge 0\), \(\frac{1}{m} \sum _{i = 1}^{m} g(z_i) = {\mathcal E}_{{\mathbf {z}}}(f) - {{\mathcal {E}}}_{{\mathbf {z}}}(f_B)\), and \(g = C_1 [(1 - y f(x))_{+} - (1 - y f_{B}(x))_{+}] \cdot {{\mathbf {1}}}_{\{y \ne 0\}} + C_2 [(f(x) - f_{B}(x))(f(x) + f_{B}(x))] \cdot {\mathbf{1}}_{\{y = 0\}}.\) Moreover, \(\Vert f\Vert _{\infty } \le \kappa \Vert f\Vert _{{\mathcal {K}}} \le \kappa R\) and \(|f_B(x)| \le M\) almost everywhere. We have
where \(C=\max \{C_{1}, C_{2}\}\). Then we get \(|g - E(g)| \le 4C(\kappa R + M)^2\) almost everywhere. Also,
Thus \(E(g^2) \le C(\kappa R + M)^2 E(g)\). Applying Lemma 2 to the function set \({{\mathcal {F}}}_R\), we find that the inequality
is valid with probability at least
Here we use the restriction \(R \ge M\). By Definition 3, we can get
For any \(\delta \in (0,1)\), let
It follows that,
By Lemma 3, we have \(\varepsilon \le {\varepsilon }^{*}(m,\delta )\), where
By the arithmetic-geometric mean inequality, \(\sqrt{\varepsilon }\sqrt{{{\mathcal {E}}}(f) + \varepsilon } \le \frac{1}{2}\big (\varepsilon + {{\mathcal {E}}}(f) + \varepsilon \big ) = \frac{1}{2} {{\mathcal {E}}}(f) + \varepsilon\) for any \(\varepsilon >0\). Hence, for any \(\delta \in (0,1)\), the following inequality holds with probability at least \(1-\delta\),
Replacing f by \(f_{{\mathbf {z}}}\) and \(\delta\) by \(\delta /2\), we find that, with probability at least \(1-\delta /2\), the inequality
is valid. This completes the proof of Proposition 4. \(\square\)
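The function \(g\) used in the proof of Proposition 4 is the difference of a loss evaluated at \(f\) and at \(f_B\). A minimal sketch of that relation follows. The loss \(W\) is not restated in this appendix, so the form below (hinge-type for \(y \ne 0\), quadratic for \(y = 0\)) is an assumption inferred from the expression for \(g\); the names `loss_W` and `g_value` and the default weights are hypothetical.

```python
def loss_W(y, f, C1=1.0, C2=1.0):
    """Assumed loss: C1 * (1 - y f)_+ when y != 0, and C2 * f^2 when y == 0."""
    if y != 0:
        return C1 * max(0.0, 1.0 - y * f)
    return C2 * f * f


def g_value(y, f, f_B, C1=1.0, C2=1.0):
    """The function g from the proof, written term by term as in the text."""
    if y != 0:
        return C1 * (max(0.0, 1.0 - y * f) - max(0.0, 1.0 - y * f_B))
    return C2 * (f - f_B) * (f + f_B)
```

Under this assumed loss, `g_value(y, f, f_B)` coincides with `loss_W(y, f) - loss_W(y, f_B)` for every label \(y \in \{-1, 0, 1\}\), because \((f - f_B)(f + f_B) = f^2 - f_B^2\).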
Proposition 5
For any \({{\mathbf {z}}} = \{z_i\}_{i=1}^m \in {Z}^m\), \(T_2 \le 1\).
By the representer theorem in [20], we know that \(f_{{{\mathbf {z}}},\lambda _1}\) can be written as \(f_{{{\mathbf {z}}},\lambda _1} = \sum _{i=1}^{m} {\alpha }_i^{\lambda _1} {{\mathcal {K}}}_{x_i}\), and \(f_{{\mathbf {z}}} = \arg \min \left \{ \lambda _1 \Vert f\Vert _{{\mathcal {K}}}^2 + \lambda _2 \Vert f\Vert _I^2 + {{\mathcal {E}}}_{\mathbf{z}}(f) \right \}\). It follows that
where \({{\mathcal {E}}}_{{\mathbf {z}}}(f_{{{\mathbf {z}}},\lambda _1}) \ge 0\) and \(\lambda _1\Vert f_{{{\mathbf {z}}},\lambda _1}\Vert _{{\mathcal {K}}}^2 \ge 0\). This completes the proof of Proposition 5. \(\square\)
Proposition 6
For any \(0< \delta < 1\), the following inequality holds with probability at least \(1-\delta /2\),
From the definitions of \(f_{\lambda _1}\) and \(D(\lambda _1)\), we have
From inequality (9), we find that \(\Vert f_{\lambda _1}\Vert _{\infty } \le \kappa \Vert f_{\lambda _1}\Vert _{\mathcal K} \le \kappa \sqrt{D(\lambda _1)/\lambda _1}\). Set
then \(T_3 = \frac{1}{m} \sum _{i=1}^{m} \zeta _2(z_i) - E(\zeta _2)\). Since \(|f_B| \le M\) almost everywhere, we have
Hence \(|\zeta _2 - E(\zeta _2)| \le M_{\zeta _2} := 4Cb\). Moreover,
By the one-sided Bernstein inequality (Lemma 1), for any \(t > 0\) we have \(\frac{1}{m} \sum _{i=1}^{m} \zeta _2 (z_i) - E(\zeta _2) \le t\) with confidence at least
Let \(t^{*}\) be the unique positive solution of the above equation; then
So, \(\frac{1}{m} \sum _{i=1}^{m} \zeta _2 (z_i) - E(\zeta _2) \le t^{*}\) holds with probability at least \(1 - \delta\), where
Recall \(b = (\kappa \sqrt{D(\lambda _1)/\lambda _1} + M)^2\). It follows that
This completes the proof of Proposition 6. \(\square\)
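The step of solving for \(t^{*}\) in the proof of Proposition 6 can be illustrated numerically. The sketch below assumes the standard one-sided Bernstein tail \(\exp \{-mt^2/(2(\sigma ^2 + Mt/3))\}\) and a generic confidence parameter `delta`; it does not reproduce the specific constants of the proof, and the function names are hypothetical. Setting the tail equal to `delta` gives the quadratic \(mt^2 - \frac{2M\ln (1/\delta )}{3}t - 2\sigma ^2\ln (1/\delta ) = 0\), whose positive root is returned.

```python
import math


def bernstein_tail(t, m, sigma2, M):
    """Standard one-sided Bernstein tail bound for the deviation level t."""
    return math.exp(-m * t * t / (2.0 * (sigma2 + M * t / 3.0)))


def t_star(m, sigma2, M, delta):
    """Positive root of m t^2 - (2 M L / 3) t - 2 sigma2 L = 0, with L = ln(1/delta)."""
    L = math.log(1.0 / delta)
    b = 2.0 * M * L / 3.0      # coefficient of the linear term (with sign flipped)
    c = 2.0 * sigma2 * L       # constant term (with sign flipped)
    return (b + math.sqrt(b * b + 4.0 * m * c)) / (2.0 * m)
```

By construction `bernstein_tail(t_star(m, sigma2, M, delta), m, sigma2, M)` equals `delta` up to rounding, and the root satisfies \(t^{*} \le \frac{2M\ln (1/\delta )}{3m} + \sqrt{\frac{2\sigma ^2\ln (1/\delta )}{m}}\), which is the usual simplified form of such bounds.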
Appendix B
Proof of Proposition 2:
By Propositions 4-6 and Definition 3, we have that for any \(\delta \in (0,1)\), with confidence at least \(1 - \delta\), the following inequality is valid,
For \({\varepsilon }^{*}(m,\delta /2)\), the inequality \(\left(\frac{150C (\kappa +1)^2 R^2 C_s (4R)^s}{m}\right)^{\frac{1}{s+1}} \ge \frac{150C (\kappa +1)^2 R^2 \ln (2/\delta )}{m}\) holds whenever \(m \ge 37C (\kappa +1)^2 R \ln ({2}/{\delta }) ({\ln ({2}/{\delta })}/{C_s})^{1/s}\), in which case \({\varepsilon }^{*}(m,\delta /2) = (\frac{150C (\kappa +1)^2 R^2 C_s (4R)^s}{m})^{\frac{1}{s+1}}\). Thus, for any \(0< \delta < 1\), with probability at least \(1- \delta\), we have
This completes the proof of Proposition 2. \(\square\)
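The sample-size condition used in the proof of Proposition 2 can be checked by direct algebra. The following is a sketch, writing \(A = 150C(\kappa +1)^2R^2\) and \(L = \ln (2/\delta )\) as shorthand (these two symbols are introduced here only for this check):

\[
\left( \frac{A C_s (4R)^s}{m}\right) ^{\frac{1}{s+1}} \ge \frac{A L}{m}
\iff \frac{A C_s (4R)^s}{m} \ge \frac{A^{s+1} L^{s+1}}{m^{s+1}}
\iff m^{s} \ge \frac{A^{s} L^{s+1}}{C_s (4R)^{s}}
\iff m \ge \frac{A}{4R}\, L \left( \frac{L}{C_s}\right) ^{1/s}.
\]

Since \(A/(4R) = 37.5\, C(\kappa +1)^2 R\), this calculation gives the threshold \(m \ge 37.5\, C (\kappa +1)^2 R \ln (2/\delta ) (\ln (2/\delta )/C_s)^{1/s}\). The coefficient 37 quoted in the proof appears to be a rounded value; under this calculation, a coefficient of 38 is sufficient.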
Proof of Theorem 1:
By Definition 1, for any \(\lambda _1>0\), we have \(D(\lambda _1) \le {\lambda _1}^q\). Let \(R = M\), then we have that for any \(0< \delta < 1\), with probability at least \(1- \delta\),
where \({\widetilde{C}} = 300 C (\kappa +1)^2 M^2(4 {C_s}^{\frac{1}{s+1}} + 3 \ln ({2}/{\delta }) + 2)\). Setting \(\frac{{\lambda _1}^{q-1}}{m} = {\lambda _1}^q\) and dividing both sides by \({\lambda _1}^{q-1}\), we obtain \(\lambda _1 = \frac{1}{m}\). Since \(0< \lambda _1 < 1\) and \(0 < q \le 1\), letting s be close to 0 and q be close to 1, we find that the inequality
is valid with probability at least \(1- \delta\), where \({\widetilde{C}} = 300 C (\kappa +1)^2 M^2(4 {C_s}^{\frac{1}{s+1}} + 3 \ln ({2}/{\delta }) + 2)\) is a constant. This completes the proof of Theorem 1. \(\square\)
Dong, Z., Qin, Y., Zou, B. et al. LMSVCR: novel effective method of semi-supervised multi-classification. Neural Comput & Applic 34, 3857–3873 (2022). https://doi.org/10.1007/s00521-021-06647-7