Abstract
Most previous work on the learning performance of multi-classification algorithms assumes fully supervised samples, whereas much of the data generated in real life is unlabeled. This paper introduces a novel Laplacian multi-classification support vector classification and regression (LMSVCR) algorithm for semi-supervised learning. We first establish a fast learning rate for the LMSVCR algorithm with semi-supervised multi-classification samples and prove that the algorithm is consistent in this setting. We then present a numerical investigation of the learning performance of the LMSVCR algorithm. The experimental studies indicate that the proposed LMSVCR algorithm outperforms other semi-supervised multi-classification algorithms in terms of prediction accuracy and total sampling and training time.
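To make the setting concrete, the sketch below illustrates the graph-Laplacian (manifold) regularization idea of Belkin et al. on which LMSVCR builds, shown here with a squared loss (LapRLS) rather than the SVCR loss; the function names, kernel choices, and closed form are illustrative assumptions and do not reproduce the authors' LMSVCR solver.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Gaussian (RBF) kernel matrix between the rows of A and the rows of B.
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

def laprls_fit(X_lab, y_lab, X_unl, gamma=1.0, lam1=1e-2, lam2=1e-2):
    """Laplacian-regularized kernel least squares on labeled + unlabeled points."""
    X = np.vstack([X_lab, X_unl])
    l, n = len(X_lab), len(X)
    K = rbf_kernel(X, X, gamma)                       # kernel matrix over all points
    W = rbf_kernel(X, X, gamma)                       # similarity graph (here: same RBF weights)
    L = np.diag(W.sum(axis=1)) - W                    # unnormalized graph Laplacian
    J = np.diag(np.r_[np.ones(l), np.zeros(n - l)])   # selects the labeled points
    y = np.r_[y_lab, np.zeros(n - l)]
    # Closed-form expansion coefficients of f = sum_i alpha_i K(x_i, .)
    A = J @ K + lam1 * l * np.eye(n) + (lam2 * l / n**2) * (L @ K)
    alpha = np.linalg.solve(A, y)
    return X, alpha

def laprls_predict(X_train, alpha, X_new, gamma=1.0):
    # Evaluate the learned function at new points.
    return rbf_kernel(X_new, X_train, gamma) @ alpha
```

In LMSVCR the squared loss above is replaced by the SVCR (classification–regression) loss appearing in Appendix A, in which samples with \(y \ne 0\) incur a hinge penalty and samples coded \(y = 0\) incur a regression penalty, following the K-SVCR scheme of Angulo et al.; the squared loss here only keeps the sketch short.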









References
Altun Y, McAllester D, Belkin M (2005) Maximum margin semi-supervised learning for structured variables. Adv Neural Inf Process Syst 18:33–40
Hady MFA, Schwenker F (2013) Semi-supervised learning. Handbook on Neural Information Processing. Springer, Berlin, Heidelberg, pp 215–239
Chapelle O, Schölkopf B, Zien A (2009) Semi-supervised learning. IEEE Trans Neural Netw 20(3):542–542
Zhu XJ (2005) Semi-supervised learning literature survey. University of Wisconsin-Madison Department of Computer Sciences
Liu Y, Liu W, Obaid MA, Abbas IA (2016) Exponential stability of Markovian jumping Cohen–Grossberg neural networks with mixed mode-dependent time-delays. Neurocomputing 177:409–415
Du B, Liu Y, Abbas IA (2016) Existence and asymptotic behavior results of periodic solution for discrete-time neutral-type neural networks. J Franklin Institute 353(2):448–461
Rebai I, BenAyed Y, Mahdi W (2016) Deep multilayer multiple kernel learning. Neural Comput Appl 27:2305–2314
Li X, Mao W, Jiang W (2016) Multiple-kernel-learning-based extreme learning machine for classification design. Neural Comput Appl 27:175–184
Carballal A, Fernandez-Lozano C, Heras J, Romero J (2020) Transfer learning features for predicting aesthetics through a novel hybrid machine learning method. Neural Comput Appl 32:5889–5900
Joachims T (1999) Transductive inference for text classification using support vector machines. Int Conf Mach Learn 99:200–209
Belkin M, Niyogi P, Sindhwani V (2006) Manifold regularization: a geometric framework for learning from labeled and unlabeled examples. J Mach Learn Res 7(11):2399–2434
Bennett K, Mangasarian OL (1999) Combining support vector and mathematical programming methods for induction. In: Advances in kernel methods: support vector learning, pp 307–326
Weston J, Watkins C (1999) Support vector machines for multi-class pattern recognition. In: ESANN, pp 219–224
Lee Y, Lin Y, Wahba G (2004) Multicategory support vector machines: theory and application to the classification of microarray data and satellite radiance data. J Am Stat Assoc 99(465):67–81
Bottou L, Cortes C, Denker JS, Drucker H, Guyon I, Jackel LD, LeCun Y, Muller UA, Sackinger E, Simard P, Vapnik V (1994) Comparison of classifier methods: a case study in handwritten digit recognition. In Proceedings of the 12th IAPR International Conference on Pattern Recognition, pp. 77–82
Kreßel UHG (1999) Pairwise classification and support vector machines. In: Advances in kernel methods: support vector learning, pp 255–268
Angulo C, Parra X, Catala A (2003) K-SVCR. A support vector machine for multi-class classification. Neurocomputing 55(1–2):57–77
Aronszajn N (1950) Theory of reproducing kernels. Trans Am Math Soc 68(3):337–404
Evgeniou T, Pontil M, Poggio T (2000) Regularization networks and support vector machines. Adv Comput Math 13(1):1–50
Feng Y, Yang Y, Zhao Y, Lv S, Suykens JA (2014) Learning with kernelized elastic net regularization. KU Leuven, Leuven, Belgium
Xu Y, Yang Z (2014) Elastic-net regression algorithm based on multi-scale gaussian kernel. Sci J Inf Eng 4(1):19–25
Wang W, Xu Z, Lu W, Zhang X (2003) Determination of the spread parameter in the Gaussian kernel for classification and regression. Neurocomputing 55(3–4):643–663
Wu Q, Zhou DX (2005) SVM soft margin classifiers: linear programming versus quadratic programming. Neural Comput 17(5):1160–1187
Wu Q, Ying Y, Zhou DX (2006) Learning rates of least-square regularized regression. Foundations Comput Math 6(2):171–192
Lv SG, Zhou F (2015) Optimal learning rates of \(l^{p}\)-type multiple kernel learning under general conditions. Inf Sci 294:255–268
Chen DR, Wu Q, Ying Y, Zhou DX (2004) Support vector machine soft margin classifiers: error analysis. J Mach Learn Res 5:1143–1175
Tong H, Chen DR, Peng L (2009) Analysis of support vector machines regression. Foundations Comput Math 9(2):243–257
Chen DR, Xiang DH (2006) The consistency of multicategory support vector machines. Adv Comput Math 24(1–4):155–169
Chen H, Li L (2009) Semisupervised multicategory classification with imperfect model. IEEE Trans Neural Netw 20(10):1594–1603
Bamakan SMH, Wang H, Shi Y (2017) Ramp loss k-support vector classification-regression; a robust and sparse multi-class approach to the intrusion detection problem. Knowledge-Based Syst 126:113–126
Huang CL, Dun JF (2008) A distributed PSO–SVM hybrid system with feature selection and parameter optimization. Appl Soft Comput 8(4):1381–1391
Lin SW, Ying KC, Chen SC, Lee ZJ (2008) Particle swarm optimization for parameter determination and feature selection of support vector machines. Expert Syst Appl 35(4):1817–1824
Qian M, Nie F, Zhang C (2009) Efficient multi-class unlabeled constrained semi-supervised SVM. In Proceedings of the 18th ACM conference on Information and knowledge management, pp. 1665–1668
Pan H, Kang Z (2018) Robust graph learning for semi-supervised classification. In 2018 10th International Conference on Intelligent Human-Machine Systems and Cybernetics, pp 265–268
Wilcoxon F (1992) Individual comparisons by ranking methods. Breakthroughs in statistics. Springer, New York, pp 196–202
Cucker F, Smale S (2002) Best choices for regularization parameters in learning theory: on the bias-variance problem. Foundations Comput Math 2(4):413–428
Acknowledgements
This work is supported in part by the National Natural Science Foundation of China (No. 61772011), the Open Project Foundation of Intelligent Information Processing Key Laboratory of Shanxi Province (No. CICIP2018002), and the National Key Research and Development Program of China (No. 2020YFA0714200).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest regarding the publication of this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A
To bound the excess generalization error \({{\mathcal {E}}}(f_{\mathbf{z}}) - {{\mathcal {E}}}(f_B)\) of LMSVCR via Proposition 1, we need to estimate the errors \(T_1, T_2, T_3\). We first present the main tools as follows:
Lemma 1
[24] Let \(\xi\) be a random variable on a probability space Z with mean \(E(\xi )\), variance \(\sigma ^2(\xi ) = \sigma ^2\), and satisfying \(|\xi (z) - E(\xi )| \le M_{\xi }\) for almost all \(z \in Z\). Then for all \(\varepsilon > 0\),
\[
\mathrm{Prob}_{{\mathbf {z}} \in Z^m}\bigg \{\frac{1}{m} \sum _{i=1}^{m} \xi (z_i) - E(\xi ) \ge \varepsilon \bigg \} \le \exp \bigg (-\frac{m \varepsilon ^2}{2 \big (\sigma ^2 + \frac{1}{3} M_{\xi } \varepsilon \big )}\bigg ).
\]
Lemma 2
[24] Let \({\mathcal {G}}\) be a set of functions on Z such that, for some \({c_\rho } \ge 0\), \(|g - E(g)| \le B\) almost everywhere and \(E(g^2) \le {c_\rho }E(g)\) for each \(g \in {{\mathcal {G}}}\). Then for every \(\varepsilon > 0\) and \(0 < \alpha \le 1\),
Lemma 3
[36] Let \(c_1,c_2 > 0\) and \(p_1> p_2 > 0\). Then the equation \(x^{p_1} - c_1 x^{p_2} - c_2 = 0\) has a unique positive zero \(x^{*}\), and \(x^{*} \le \max \{(2 c_1)^{1/(p_1-p_2)}, (2 c_2)^{1/{p_1}}\}\).
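As a quick illustration of Lemma 3 (a worked instance added here for concreteness), take \(p_1 = 2\), \(p_2 = 1\) and \(c_1 = c_2 = 1\):
\[
x^{2} - x - 1 = 0 \;\Longrightarrow\; x^{*} = \frac{1+\sqrt{5}}{2} \approx 1.618 \;\le\; \max \big \{(2c_1)^{1/(p_1-p_2)}, (2c_2)^{1/p_1}\big \} = \max \{2, \sqrt{2}\} = 2.
\]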
Proof of Proposition 1:
Since for any \({{\mathbf {z}}} \in {Z}^m\), \(\lambda _1\Vert f_{{\mathbf {z}}}\Vert _{{\mathcal {K}}}^2\ge 0\) and \(\lambda _2\Vert f_{{\mathbf {z}}}\Vert _I^2 \ge 0\), we have the following error decomposition
The last inequality follows from the fact that \(T_4 \le 0\): indeed, by the definition of \(f_{{{\mathbf {z}}},\lambda _1}\), we have \({{\mathcal {E}}}_{{\mathbf {z}}}(f_{{{\mathbf {z}}},\lambda _1}) + \lambda _1\Vert f_{{{\mathbf {z}}},\lambda _1}\Vert _{{\mathcal {K}}}^2 \le {\mathcal E}_{{\mathbf {z}}}(f_{\lambda _1}) + \lambda _1\Vert f_{\lambda _1}\Vert _{\mathcal K}^2\). Here \(T_1, T_2, T_3\) are as defined in Proposition 1. This completes the proof of Proposition 1. \(\square\)
Proposition 4
Assume \({{\mathbf {z}}} = \{z_i\}_{i=1}^m \in {Z}^m\) is an i.i.d. sample. Then for any \(0< \delta < 1\), with confidence at least \(1 - \delta /2\),
where \({\varepsilon }^{*}(m,2/\delta ) = \max \big \{\frac{150C (\kappa +1)^2 R^2 \ln (2/\delta )}{m}, (\frac{150C (\kappa +1)^2 R^2 C_s (4R)^s}{m})^{\frac{1}{s+1}} \big \}\).
Proof
Set \(\zeta _1 = W(y,f) - W(y,f_B)\). Clearly, \(\zeta _1\) ranges over a set of functions as the sample \({\mathbf {z}}\) varies. We will apply Lemma 2 to the function set
We first bound the functions in \({{\mathcal {F}}}_{R}\). On the one hand, \(E(g) = {{\mathcal {E}}}(f) - {\mathcal E}(f_B) \ge 0\), \(\frac{1}{m} \sum _{i = 1}^{m} g(z_i) = {\mathcal E}_{{\mathbf {z}}}(f) - {{\mathcal {E}}}_{{\mathbf {z}}}(f_B)\), and \(g = C_1 [(1 - y f(x))_{+} - (1 - y f_{B}(x))_{+}] \cdot {{\mathbf {1}}}_{\{y \ne 0\}} + C_2 [(f(x) - f_{B}(x))(f(x) + f_{B}(x))] \cdot {\mathbf{1}}_{\{y = 0\}}.\) On the other hand, \(\Vert f\Vert _{\infty } \le \kappa \Vert f\Vert _{{\mathcal {K}}} \le \kappa R\) and \(|f_B(x)| \le M\) almost everywhere. We have
where \(C=\max \{C_{1}, C_{2}\}\). Then we get \(|g - E(g)| \le 4C(\kappa R + M)^2\) almost everywhere. Also,
Thus \(E(g^2) \le C(\kappa R + M)^2 E(g)\). Applying Lemma 2 to the function set \({{\mathcal {F}}}_R\), we have that the inequality
is valid with probability at least
Here we use the restriction \(R \ge M\). By Definition 3, we can get
For any \(\delta \in (0,1)\), let
It follows that,
By Lemma 3, we have \(\varepsilon \le {\varepsilon }^{*}(m,\delta )\), where
Because \(\sqrt{\varepsilon }\sqrt{{{\mathcal {E}}}(f) + \varepsilon } \le \frac{1}{2} {{\mathcal {E}}}(f) + \varepsilon\) holds for any \(\varepsilon >0\) (by the elementary inequality \(\sqrt{a}\sqrt{b} \le \frac{1}{2}(a+b)\)), we have that for any \(\delta \in (0,1)\), the following inequality holds with probability at least \(1-\delta\),
Replacing f by \(f_{{\mathbf {z}}}\), we have with probability at least \(1-\delta /2\),
is valid. This completes the proof of Proposition 4. \(\square\)
Proposition 5
For any \({{\mathbf {z}}} = \{z_i\}_{i=1}^m \in {Z}^m\), \(T_2 \le 1\).
Proof
By the representer theorem in [20], we know that \(f_{{{\mathbf {z}}},\lambda _1}\) can be written as \(f_{{{\mathbf {z}}},\lambda _1} = \sum _{i=1}^{m} {\alpha }_i^{\lambda _1} {{\mathcal {K}}}_{x_i}\), and \(f_{{\mathbf {z}}} = \arg \min \left \{ \lambda _1 \Vert f\Vert _{{\mathcal {K}}}^2 + \lambda _2 \Vert f\Vert _I^2 + {{\mathcal {E}}}_{\mathbf{z}}(f) \right \}\). It follows that
where \({{\mathcal {E}}}_{{\mathbf {z}}}(f_{{{\mathbf {z}}},\lambda _1}) \ge 0\) and \(\lambda _1\Vert f_{{{\mathbf {z}}},\lambda _1}\Vert _{{\mathcal {K}}}^2 \ge 0\). This completes the proof of Proposition 5. \(\square\)
Proposition 6
For any \(0< \delta < 1\), the following inequality holds with probability at least \(1-\delta /2\),
Proof
From the definitions of \(f_{\lambda _1}\) and \(D(\lambda _1)\), we have
It follows from inequality (9) that \(\Vert f_{\lambda _1}\Vert _{\infty } \le \kappa \Vert f_{\lambda _1}\Vert _{\mathcal K} \le \kappa \sqrt{D(\lambda _1)/\lambda _1}\). Set
then \(T_3 = \frac{1}{m} \sum _{i=1}^{m} \zeta _2(z_i) - E(\zeta _2)\). Since \(|f_B| \le M\) almost everywhere, we have
Hence \(|\zeta _2 - E(\zeta _2)| \le M_{\zeta _2} := 4Cb\). Moreover,
By the one-sided Bernstein inequality (Lemma 1), we have that for any \(t > 0\), \(\frac{1}{m} \sum _{i=1}^{m} \zeta _2 (z_i) - E(\zeta _2) \le t\) holds with confidence at least
Setting \(t^{*}\) to be the unique positive solution of the above equation (made explicit after this proof), we have
So, \(\frac{1}{m} \sum _{i=1}^{m} \zeta _2 (z_i) - E(\zeta _2) \le t^{*}\) holds with probability at least \(1 - \delta\), where
Recall \(b = (\kappa \sqrt{D(\lambda _1)/\lambda _1} + M)^2\). It follows that
This completes the proof of Proposition 6. \(\square\)
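For concreteness, the Bernstein step above can be made explicit; the following is a routine computation under the bound of Lemma 1, with \(\sigma ^2 = \sigma ^2(\zeta _2)\) and \(M_{\zeta _2} = 4Cb\), and the constants are generic and stated only for illustration. Setting the right-hand side of Lemma 1 equal to the confidence level \(\delta\) yields the quadratic equation
\[
m t^2 - \tfrac{2}{3} M_{\zeta _2} \ln (1/\delta )\, t - 2\sigma ^2 \ln (1/\delta ) = 0,
\]
whose unique positive root satisfies, by \(\sqrt{a+b} \le \sqrt{a} + \sqrt{b}\),
\[
t^{*} = \frac{\frac{1}{3} M_{\zeta _2} \ln (1/\delta ) + \sqrt{\frac{1}{9} M_{\zeta _2}^2 \ln ^2(1/\delta ) + 2 m \sigma ^2 \ln (1/\delta )}}{m} \le \frac{2 M_{\zeta _2} \ln (1/\delta )}{3m} + \sqrt{\frac{2 \sigma ^2 \ln (1/\delta )}{m}}.
\]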
Appendix B
Proof of Proposition 2:
By Propositions 4-6 and Definition 3, we have that for any \(\delta \in (0,1)\), with confidence at least \(1 - \delta\), the following inequality is valid,
For \({\varepsilon }^{*}(m,\delta /2)\), the inequality \(\left(\frac{150C (\kappa +1)^2 R^2 C_s (4R)^s}{m}\right)^{\frac{1}{s+1}} \ge \frac{150C (\kappa +1)^2 R^2 \ln (2/\delta )}{m}\) holds whenever \(m \ge 37C (\kappa +1)^2 R \ln ({2}/{\delta }) ({\ln ({2}/{\delta })}/{C_s})^{1/s}\), so we get \({\varepsilon }^{*}(m,\delta /2) = (\frac{150C (\kappa +1)^2 R^2 C_s (4R)^s}{m})^{\frac{1}{s+1}}\). Thus, for any \(0< \delta < 1\), with probability at least \(1- \delta\), we have
This completes the proof of Proposition 2. \(\square\)
Proof of Theorem 1:
By Definition 1, for any \(\lambda _1>0\), we have \(D(\lambda _1) \le {\lambda _1}^q\). Let \(R = M\). Then for any \(0< \delta < 1\), with probability at least \(1- \delta\),
where \({\widetilde{C}} = 300 C (\kappa +1)^2 M^2(4 {C_s}^{\frac{1}{s+1}} + 3 \ln ({2}/{\delta }) + 2)\). Setting \(\frac{{\lambda _1}^{q-1}}{m} = {\lambda _1}^q\) gives \(\lambda _1 = \frac{1}{m}\). Since \(0< \lambda _1 < 1\) and \(0 < q \le 1\), letting \(s\) tend to 0 and \(q\) tend to 1, the inequality
is valid with probability at least \(1- \delta\), where \({\widetilde{C}}\) is the constant defined above. This completes the proof of Theorem 1. \(\square\)
About this article
Cite this article
Dong, Z., Qin, Y., Zou, B. et al. LMSVCR: novel effective method of semi-supervised multi-classification. Neural Comput & Applic 34, 3857–3873 (2022). https://doi.org/10.1007/s00521-021-06647-7