Risk-Averse support vector classifier machine via moments penalization

  • Original Article
  • Published in: International Journal of Machine Learning and Cybernetics

Abstract

Support vector machine (SVM) has long been one of the most successful learning methods, built on the idea of structural risk minimization, which minimizes an upper bound on the generalization error. Recently, a tighter upper bound on the generalization error, related to the variance of the loss, was established as the empirical Bernstein bound. Based on this result, we propose a novel risk-averse support vector classifier machine (RA-SVCM), which achieves better generalization performance by exploiting second-order statistical information about the loss function. It minimizes the empirical first and second moments of the loss function, i.e., the mean and variance of the loss, to achieve the “right” bias-variance trade-off for general classes. The proposed method can be solved by a kernel-reduced and Newton-type technique under certain conditions. Empirical studies show that RA-SVCM achieves the best performance in comparison with other classical and state-of-the-art methods. Additional analysis shows that the proposed method is insensitive to its parameters, so a broad range of parameter values leads to satisfactory performance. The proposed method is a general form of the standard SVM and thus enriches the related studies of SVM.
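
To make the moments-penalization idea concrete, here is a minimal sketch, not the authors' implementation: it evaluates a mean-plus-variance penalized objective for a linear classifier with the hinge loss. The function name, the choice of hinge loss, and the weights lam1, lam2, and lam_reg are illustrative assumptions.

```python
import numpy as np

def moments_penalized_objective(w, X, y, lam1=1.0, lam2=0.1, lam_reg=1.0):
    """Illustrative mean-plus-variance penalization of a per-sample loss.

    lam1 weights the empirical first moment (mean loss), lam2 the empirical
    second central moment (variance of the loss), and lam_reg the usual
    penalty on the classifier norm.
    """
    losses = np.maximum(0.0, 1.0 - y * (X @ w))   # hinge loss of each sample
    mean_loss = losses.mean()                     # first moment
    var_loss = losses.var()                       # second central moment
    return lam1 * mean_loss + lam2 * var_loss + 0.5 * lam_reg * (w @ w)

# Toy usage on synthetic data
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = np.sign(X[:, 0] + 0.1 * rng.normal(size=100))
w = rng.normal(size=5)
print(moments_penalized_objective(w, X, y))
```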



Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grant 61772020.

Author information

Corresponding author

Correspondence to Shuisheng Zhou.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A Proof of Remark 1

Theorem 2

Under the condition of Theorem 1, the objective function \(f_1({\varvec{\alpha }})\) in (24) is convex if

$$\begin{aligned} \frac{{{\lambda _1}}}{{{\lambda _2}}} \ge 2 \left(c + \frac{1}{p}\right). \end{aligned}$$
(A1)

Proof

Let \({g_i}({\varvec{\alpha }}) = \frac{1}{p}\log (1 + {e^{p{\varvec{r}_i}}}) = \max \{ {\varvec{r}_i},0\} + \frac{1}{p}\log (1 + {e^{ - p{|\varvec{r}_i |}}})\); then the objective function of (24) can be written as

$$\begin{aligned} {f_1}(\varvec{\alpha } ) = {\lambda _1}\sum \limits _{i = 1}^m {{g_i}(\varvec{\alpha } )} + {\lambda _2}{g^\top }(\varvec{\alpha } )Qg(\varvec{\alpha } ) + \frac{1}{2}{\varvec{\alpha } ^\top }K\varvec{\alpha } \end{aligned}$$
(A2)
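
For completeness, the identity \(\frac{1}{p}\log (1 + e^{p r}) = \max \{ r,0\} + \frac{1}{p}\log (1 + e^{-p|r|})\) used in the definition of \(g_i\) is simply the numerically stable form of the scaled softplus. A minimal sketch, assuming NumPy and an illustrative smoothing parameter \(p\), showing that the two forms agree and that the stable form avoids overflow for large \(pr\):

```python
import numpy as np

def g_naive(r, p):
    # (1/p) * log(1 + exp(p * r)) -- overflows once p*r is large
    return np.log1p(np.exp(p * r)) / p

def g_stable(r, p):
    # max(r, 0) + (1/p) * log(1 + exp(-p * |r|)) -- same value, never overflows
    return np.maximum(r, 0.0) + np.log1p(np.exp(-p * np.abs(r))) / p

p = 10.0
for r in (-3.0, 0.0, 0.5, 3.0):
    print(r, g_naive(r, p), g_stable(r, p))   # the two forms coincide

print(g_stable(200.0, p))   # 200.0; g_naive(200.0, p) would overflow to inf
```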

The Hessian matrix of \(f_1(\varvec{\alpha })\) is:

$$\begin{aligned} {\nabla ^2}{f_1}(\varvec{\alpha } )= & {} {\lambda _1}\sum \limits _{i = 1}^m {{\nabla ^2}{g_i}(\varvec{\alpha } )} + 2{\lambda _2}\sum \limits _{i = 1}^m {{\varvec{\delta } _i}{\nabla ^2}{g_i}(\varvec{\alpha } )} \nonumber \\&+ 2{\lambda _2}\nabla g(\varvec{\alpha } )Q\nabla {g^\top }(\varvec{\alpha } ) + K\nonumber \\= & {} \sum \limits _{i = 1}^m {({\lambda _1} + 2{\lambda _2}{\varvec{\delta } _i}){\nabla ^2}} {g_i}(\varvec{\alpha } ) \nonumber \\&+ 2{\lambda _2}\nabla g(\varvec{\alpha } )Q\nabla {g^\top }(\varvec{\alpha } ) + K \end{aligned}$$
(A3)

where \(Q=\varvec{I}-\frac{1}{m}\varvec{e}\varvec{e}^\top \), \(\varvec{\delta }=Qg(\varvec{\alpha })\), and

$$\begin{aligned} {\varvec{\delta } _i}= & {} {g_i}(\varvec{\alpha } ) - \frac{1}{m}{\varvec{e}^\top }g(\varvec{\alpha } ) \nonumber \\= & {} \max \{ {\varvec{r}_i},0\} + \frac{1}{p}\log (1 + {e^{ - p{|\varvec{r}_i |}}}) \nonumber \\&- \frac{1}{m}\sum \limits _{j = 1}^m \left( \max \{ {\varvec{r}_j},0\} + \frac{1}{p}\log (1 + {e^{ - p{|\varvec{r}_j |}}})\right) \nonumber \\\ge & {} - \frac{1}{m}\sum \limits _{j = 1}^m {\max \{ {\varvec{r}_j},0\} } - \frac{1}{{mp}}\sum \limits _{j = 1}^m {\log (1 + {e^{ - p{|\varvec{r}_j |}}})} \nonumber \\\ge & {} - {c} - \frac{1}{{mp}}\sum \limits _{j = 1}^m {\log (1 + {e^{ - p(1+M)}})} \nonumber \\\ge & {} - {c} - \frac{{\log 2}}{p}\nonumber \\\ge & {} - {c} - \frac{1}{p}. \end{aligned}$$
(A4)

To prove that the objective function in Eq. (A2) is convex, it suffices to show that \({\nabla ^2}{f_1}(\varvec{\alpha } ) \succeq 0\) for every \(\varvec{\alpha }\). It is obvious that \(2{\lambda _2}\nabla g(\varvec{\alpha } )Q\nabla {g^\top }(\varvec{\alpha } ) + K\succeq 0\), so it remains to prove that \(\sum \nolimits _{i = 1}^m {({\lambda _1} + 2{\lambda _2}{\varvec{\delta } _i}){\nabla ^2}} {g_i}(\varvec{\alpha } ) \succeq 0\). Since each \({\nabla ^2}{g_i}(\varvec{\alpha })\succeq 0\), it is enough to show that \(\varvec{\mu }_i:=\lambda _1+2\lambda _2 \varvec{\delta } _i\ge 0\) for all \(i\). By (A4), \(\varvec{\delta }_i \ge -(c+\frac{1}{p})\), hence \(\varvec{\mu }_i\ge \lambda _1-2\lambda _2 (c + \frac{1}{p})\ge 0\) whenever \(\frac{{{\lambda _1}}}{{{\lambda _2}}} \ge 2(c + \frac{1}{p})\). That is, when \(\frac{{{\lambda _1}}}{{{\lambda _2}}} \ge 2(c + \frac{1}{p})\), we have \({\nabla ^2}{f_1}(\varvec{\alpha } ) \succeq 0\) for every \(\varvec{\alpha }\in R^m\). Therefore, the objective function \(f_1(\varvec{\alpha })\) in (24) is convex. \(\square\)
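
As a quick numerical sanity check of the lower bound (A4), the following hedged sketch evaluates \(\varvec{\delta }_i\) for the smoothed losses and confirms that it stays above \(-(c+\frac{1}{p})\); the random scores \(\varvec{r}_i\) and the choice of \(c\) as the mean of \(\max \{ \varvec{r}_i,0\}\) are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(1)
p = 10.0
r = rng.normal(size=50)                                   # illustrative residuals r_i

# smoothed loss g_i and its centered version delta_i = (Q g)_i
g = np.maximum(r, 0.0) + np.log1p(np.exp(-p * np.abs(r))) / p
delta = g - g.mean()

c = np.maximum(r, 0.0).mean()                             # one admissible value of c
print(delta.min(), -(c + 1.0 / p))                        # delta_i stays above -(c + 1/p)
assert delta.min() >= -(c + 1.0 / p) - 1e-12
```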

It is obvious that the objective function in Eq. (25) is convex, since \({\varphi _p}(r)\) and \({({\varphi _p}(r))^2}\) are convex functions. When \(\frac{{{\lambda _1}}}{{{\lambda _2}}} \ge 2(c + \frac{1}{p})\), the solution of problem (24) obtained by Algorithm 1 is globally optimal by Theorem 2. In fact, we have rarely encountered non-convergence in a large number of experiments. Of course, one can also adopt a simple rule for selecting hyperparameters that satisfy the condition given above.
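
As one illustration of such a rule, the sketch below keeps only hyperparameter pairs that satisfy the sufficient convexity condition of Theorem 2; the grid values and the constants c and p are assumptions for illustration, not values from the paper.

```python
from itertools import product

def satisfies_convexity_condition(lam1, lam2, c, p):
    """Check the sufficient condition lam1 / lam2 >= 2 * (c + 1/p) of Theorem 2."""
    return lam1 / lam2 >= 2.0 * (c + 1.0 / p)

# Illustrative constants and grids (assumed, not from the paper)
c, p = 1.0, 10.0
lam1_grid = [0.1, 1.0, 10.0, 100.0]
lam2_grid = [0.01, 0.1, 1.0]

admissible = [(l1, l2) for l1, l2 in product(lam1_grid, lam2_grid)
              if satisfies_convexity_condition(l1, l2, c, p)]
print(admissible)   # only these pairs guarantee that objective (24) is convex
```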


Cite this article

Fu, C., Zhou, S., Zhang, J. et al. Risk-Averse support vector classifier machine via moments penalization. Int. J. Mach. Learn. & Cyber. 13, 3341–3358 (2022). https://doi.org/10.1007/s13042-022-01598-4
