
Extreme vector machine for fast training on large data


Abstract

Different types of loss functions are often adopted in SVMs or their variants to meet practical requirements, so scaling the corresponding SVMs up to large datasets is becoming increasingly important in practice. In this paper, the extreme vector machine (EVM) is proposed to realize fast training of SVMs with different yet typical loss functions on large datasets. EVM begins with a fast approximation of the convex hull of the training data in the feature space, expressed in terms of extreme vectors, and then completes the corresponding SVM optimization over the extreme vector set. When the hinge loss is adopted, EVM coincides with the approximate extreme points support vector machine (AESVM) for classification. When the squared hinge loss, the least squares loss and the Huber loss are adopted, EVM yields three versions, namely L2-EVM, LS-EVM and Hub-EVM, for classification or regression. In contrast to its most closely related machine, AESVM, EVM retains AESVM's theoretical advantages while being applicable to a wide variety of loss functions. Compared with the other state-of-the-art fast SVM training algorithms, CVM and FastKDE, EVM relaxes the limitation to least squares loss functions, and experimentally exhibits superiority in training time, robustness and number of support vectors.
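To make the two-stage idea concrete, the following is a minimal sketch, not the authors' implementation: the greedy farthest-point selector merely stands in for the paper's extreme-vector algorithm, and scikit-learn's SVC stands in for the loss-specific EVM solvers.

```python
# Hedged sketch of the two-stage EVM idea from the abstract:
# (1) approximate the convex hull of the training data by a small set of
#     "extreme" vectors, (2) train an SVM with the chosen loss on that set.
import numpy as np
from sklearn.svm import SVC

def farthest_point_subset(X, m):
    """Greedily pick m well-spread points as a crude convex-hull proxy
    (a stand-in for the paper's extreme-vector selection)."""
    idx = [int(np.argmax(np.linalg.norm(X - X.mean(axis=0), axis=1)))]
    for _ in range(m - 1):
        d = np.min(np.linalg.norm(X[:, None, :] - X[idx], axis=2), axis=1)
        idx.append(int(np.argmax(d)))        # farthest from current subset
    return np.asarray(idx)

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 10))
y = np.sign(X[:, 0] + 0.3 * rng.normal(size=2000))

sel = farthest_point_subset(X, 100)                  # stage 1: reduce the data
clf = SVC(kernel="rbf", C=1.0).fit(X[sel], y[sel])   # stage 2: train on subset
print("accuracy on the full set:", clf.score(X, y))
```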




Acknowledgements

This work was supported in part by the Hong Kong Polytechnic University under Grant G-UA3W, by the National Natural Science Foundation of China under Grants 61572236, 61702225 and 61806026, and by the Natural Science Foundation of Jiangsu Province under Grants BK20161268 and BK20180956.

Author information

Corresponding author

Correspondence to Xiaoqing Gu.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix 1

1.1 Proof of Theorem 1

1. For the L2-SVM:

Let \({L_{L2 - 2}}({\mathbf{w}},b,{{\mathbf{X}}^*})=(C/2N)\sum\nolimits_{{t=1}}^{M} {l({\mathbf{w}},b,{{\mathbf{z}}_t})} \sum\nolimits_{{i=1}}^{N} {{r_{i,t}}}\) and \({L_{L2 - 3}}({\mathbf{w}},b,{{\mathbf{X}}^*})=(C/2N)\sum\nolimits_{{i=1}}^{N} {l({\mathbf{w}},b,{{\mathbf{u}}_i})}\), where \({{\mathbf{u}}_i}=\sum\nolimits_{{t=1}}^{M} {{r_{i,t}}{{\mathbf{z}}_t}}\). From the precondition that the labels coincide within each subset (\(y_i = y_t\)), we have

$$\begin{aligned} L_{L2-3}(\mathbf{w},b,\mathbf{X}^{*}) & = (C/2N)\sum\nolimits_{i=1}^{N} \left\{ \max \left\{ 0,\;1 - y_{i}\left(\mathbf{w}^{T}\sum\nolimits_{t=1}^{M} r_{i,t}\mathbf{z}_{t} + b\right) \right\} \right\}^{2} \\ & = (C/2N)\sum\nolimits_{i=1}^{N} \left\{ \max \left\{ 0,\;\sum\nolimits_{t=1}^{M} r_{i,t}\left[1 - y_{t}(\mathbf{w}^{T}\mathbf{z}_{t} + b)\right] \right\} \right\}^{2}, \\ \end{aligned}$$

where the second equality uses \(\sum\nolimits_{{t=1}}^{M} {{r_{i,t}}}=1\) together with the label precondition \(y_i = y_t\).

According to the subadditivity \(\max (0,\;A+B) \leq \max (0,\;A)+\max (0,\;B)\), which holds because \(A+B \leq \max (0,A)+\max (0,B)\) and the right-hand side is nonnegative, we obtain

$$\begin{aligned} L_{L2-3}(\mathbf{w},b,\mathbf{X}^{*}) & \le (C/2N)\sum\nolimits_{i=1}^{N}\sum\nolimits_{t=1}^{M} \left\{ \max \left\{ 0,\;r_{i,t}\left[1 - y_{t}(\mathbf{w}^{T}\mathbf{z}_{t} + b)\right] \right\} \right\}^{2} \\ & \le (C/2N)\sum\nolimits_{t=1}^{M} \left\{ \max \left\{ 0,\;1 - y_{t}(\mathbf{w}^{T}\mathbf{z}_{t} + b) \right\} \right\}^{2}\sum\nolimits_{i=1}^{N} r_{i,t} = L_{L2-2}(\mathbf{w},b,\mathbf{X}^{*}), \\ \end{aligned}$$

where the second inequality uses \(r_{i,t}^{2} \leq r_{i,t}\) (since \(0 \leq r_{i,t} \leq 1\)).

Adding \((1/2){\left\| {\mathbf{w}} \right\|^2}\) to both sides of the inequality above, we get \({F_{L2 - 3}}({\mathbf{w}},b) \leq {F_{L2 - 2}}({\mathbf{w}},b)\).
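As a quick numerical sanity check of the inequality just proved (a sketch, not part of the proof: all labels are set to \(+1\) so that the within-subset label precondition holds, and each row of \(r\) holds convex weights):

```python
# Quick numerical sanity check of L_{L2-3} <= L_{L2-2} (a sketch, not part
# of the proof).  All labels equal +1 so the precondition y_i = y_t holds;
# each row of r holds convex weights summing to one.
import numpy as np

rng = np.random.default_rng(1)
N, M, d, C = 50, 8, 5, 1.0
Z = rng.normal(size=(M, d))                     # extreme vectors z_t
r = rng.random((N, M))
r /= r.sum(axis=1, keepdims=True)               # convex combination weights
w, b, y = rng.normal(size=d), 0.1, 1.0

sq_hinge = lambda s: np.maximum(0.0, 1.0 - y * s) ** 2
L2_2 = (C / (2 * N)) * np.sum(sq_hinge(Z @ w + b) * r.sum(axis=0))
L2_3 = (C / (2 * N)) * np.sum(sq_hinge((r @ Z) @ w + b))  # u_i = sum_t r_it z_t
assert L2_3 <= L2_2 + 1e-12
print(f"{L2_3:.4f} <= {L2_2:.4f}")
```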

2. For the LS-SVM:

Let \({L_{LS - 2}}{\text{(}}{\mathbf{w}},b,{{\mathbf{X}}^*})=(C/2N)\sum\nolimits_{{t=1}}^{M} {l({\mathbf{w}},b,{{\mathbf{z}}_t})} \sum\nolimits_{{i=1}}^{N} {{r_{i,t}}}\) and \({L_{LS - 3}}{\text{(}}{\mathbf{w}},b,{{\mathbf{X}}^*})=(C/2N)\sum\nolimits_{{i=1}}^{N} {l({\mathbf{w}},b,{{\mathbf{u}}_i})}\). We have

$$\begin{aligned} L_{LS-3}(\mathbf{w},b,\mathbf{X}^{*}) & = (C/2N)\sum\nolimits_{i=1}^{N} \left( y_{i} - \left( \mathbf{w}^{T}\sum\nolimits_{t=1}^{M} r_{i,t}\mathbf{z}_{t} + b \right) \right)^{2} \\ & = (C/2N)\sum\nolimits_{i=1}^{N} \left( \sum\nolimits_{t=1}^{M} r_{i,t}\left( y_{t} - \mathbf{w}^{T}\mathbf{z}_{t} - b \right) \right)^{2}. \\ \end{aligned}$$

According to Jensen’s inequality, if \({\lambda _1},{\lambda _2}, \ldots ,{\lambda _n}\) are nonnegative real numbers such that \({\lambda _1}+{\lambda _2}+ \cdots +{\lambda _n}=1\) and \(\varphi ( \cdot )\) is a real convex function, then \(\varphi ({\lambda _1}{x_1}+{\lambda _2}{x_2}+ \cdots +{\lambda _n}{x_n}) \leq {\lambda _1}\varphi ({x_1})+{\lambda _2}\varphi ({x_2})+ \cdots +{\lambda _n}\varphi ({x_n})\) for any \({x_1}, \ldots ,{x_n}\). So we get

$$\begin{aligned} L_{LS-3}(\mathbf{w},b,\mathbf{X}^{*}) & \leq (C/2N)\sum\nolimits_{i=1}^{N} \left( \sum\nolimits_{t=1}^{M} r_{i,t}\left( y_{t} - \mathbf{w}^{T}\mathbf{z}_{t} - b \right)^{2} \right) \\ & = (C/2N)\sum\nolimits_{t=1}^{M} \left( y_{t} - \mathbf{w}^{T}\mathbf{z}_{t} - b \right)^{2}\sum\nolimits_{i=1}^{N} r_{i,t} = L_{LS-2}(\mathbf{w},b,\mathbf{X}^{*}). \\ \end{aligned}$$

Adding \((1/2){\left\| {\mathbf{w}} \right\|^2}\) to both sides of the inequality above, we get \({F_{LS - 3}}({\mathbf{w}},b) \leq {F_{LS - 2}}({\mathbf{w}},b)\).
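For a concrete instance of the Jensen step used above, take \(\varphi (x)={x^2}\), \(n=2\) and \({\lambda _1}={\lambda _2}=1/2\):

$$\varphi \left( {\frac{{{x_1}+{x_2}}}{2}} \right)=\frac{{{{({x_1}+{x_2})}^2}}}{4} \leq \frac{{x_1^2+x_2^2}}{2}=\frac{{\varphi ({x_1})+\varphi ({x_2})}}{2},$$

which holds because \({({x_1} - {x_2})^2} \geq 0\); the display above applies exactly this bound row-wise, with the weights \({r_{i,t}}\) playing the role of the \({\lambda _j}\).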

3. For the Hub-SVM:

Let \({L_{Hub - 2app}}{\text{(}}{\mathbf{w}},{{\mathbf{X}}^*})=(C/2N)\sum\nolimits_{{t=1}}^{M} {{l_H}({\mathbf{w}},{{\mathbf{z}}_t})} \sum\nolimits_{{i=1}}^{N} {{r_{i,t}}},\) and

$$\begin{aligned} {L_{Hub - 3app}}{\text{(}}{\mathbf{w}},{{\mathbf{X}}^*}) & =(C/2N)\sum\nolimits_{{i=1}}^{N} {{l_H}({\mathbf{w}},{{\mathbf{u}}_i})} \\ & =(hC/N)\sum\nolimits_{{t=1}}^{M} {(\sqrt {1+{{(1+h - (C/2N){y_t}{{\mathbf{w}}^T}{{\mathbf{z}}_t})}^2}/(4{h^2})} - 1)} \sum\nolimits_{{i=1}}^{N} {{r_{i,t}}} \\ & ={L_{Hub - 2app}}{\text{(}}{\mathbf{w}},{{\mathbf{X}}^*}). \\ \end{aligned}$$

Adding \((1/2){\left\| {\mathbf{w}} \right\|^2}\) to both sides of the equality above gives \({F_{Hub - 3app}}({\mathbf{w}})={F_{Hub - 2app}}({\mathbf{w}})\).
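For reference, the per-sample smoothed Huber-type loss can be read off from the display above; a minimal sketch, with \(u\) standing for the scaled margin \((C/2N){y_t}{{\mathbf{w}}^T}{{\mathbf{z}}_t}\) and \(h>0\) the smoothing parameter:

```python
# Smoothed Huber-type loss read off from the display above (a sketch);
# u is the scaled margin (C/2N) * y_t * w^T z_t, h > 0 the smoothing parameter.
import numpy as np

def smoothed_huber(u, h):
    """h * (sqrt(1 + (1 + h - u)**2 / (4*h**2)) - 1)."""
    return h * (np.sqrt(1.0 + (1.0 + h - u) ** 2 / (4.0 * h ** 2)) - 1.0)

# The loss vanishes at u = 1 + h and grows roughly linearly (slope 1/2)
# for large violations -- the usual pseudo-Huber behaviour.
print(smoothed_huber(np.array([1.5, 1.0, 0.0, -5.0]), h=0.5))
```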

Appendix 2

2.1 Proof of Theorem 2

1. For the L2-SVM:

Let \({L_{L2 - 1}}({\mathbf{w}},b,{\mathbf{X}})=(C/2N)\sum\nolimits_{{i=1}}^{N} {l({\mathbf{w}},b,{{\mathbf{z}}_i})}\) denote the average squared hinge loss minimized in (7), and let \({L_{L2 - 3}}({\mathbf{w}},b,{{\mathbf{X}}^*})\) be as defined in Theorem 1. Then we have

$$\begin{aligned} {L_{L2 - 1}}{\text{(}}{\mathbf{w}},b,{\mathbf{X}}) & =(C/2N)\sum\nolimits_{{i=1}}^{N} {{{\{ max\{ 0,1 - {y_i}({{\mathbf{w}}^T}{{\mathbf{z}}_i}+b)\} \} }^2}} \\ & \leq (C/2N)\sum\nolimits_{{i=1}}^{N} {\{ max\{ 0,} \sum\nolimits_{{t=1}}^{M} {{r_{i,t}}} (1 - {y_t}({{\mathbf{w}}^T}{{\mathbf{z}}_t}+b))\} {\} ^2}+(C/2N)\sum\nolimits_{{i=1}}^{N} {max{{\{ 0, - {y_i}{{\mathbf{w}}^T}{\tau _i}\} }^2}} \\ & \quad +(C/N)\sum\nolimits_{{t=1}}^{M} {max\{ 0,\;(1 - {y_t}({{\mathbf{w}}^T}{{\mathbf{z}}_t}+b))\} } \sum\nolimits_{{i=1}}^{N} {{r_{i,t}}} max\{ 0, - {y_i}{{\mathbf{w}}^T}{\tau _i}\} \\ \end{aligned}$$

Letting \({\Delta _1}={\max _{1 \leq t \leq M}}\left( \max \{ 0,\;1 - {y_t}({{\mathbf{w}}^T}{{\mathbf{z}}_t}+b)\} \right)\), we further have

$${L_{L2 - 1}}{\text{(}}{\mathbf{w}},b,{\mathbf{X}}) \leq {L_{L2 - 3}}{\text{(}}{\mathbf{w}},b,{{\mathbf{X}}^*})+(C/2N)\sum\nolimits_{{i=1}}^{N} {{{\{ max\{ 0, - {y_i}{{\mathbf{w}}^T}{\tau _i}\} \} }^2}} +(CM{\Delta _1}/N)\sum\nolimits_{{i=1}}^{N} {max\{ 0, - {y_i}{{\mathbf{w}}^T}{\tau _i}\} } .$$

Adding \((1/2){\left\| {\mathbf{w}} \right\|^2}\) to both sides of the inequality above proves the theorem for this case.
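The bound can likewise be checked numerically (again a sketch, not part of the proof; here \({{\mathbf{x}}_i}={{\mathbf{u}}_i}+{\tau _i}\) with small residuals \({\tau _i}\)):

```python
# Numerical illustration of the bound just proved (a sketch, not part of
# the proof): with x_i = u_i + tau_i and a common label y = +1, the
# full-data loss L_{L2-1} stays within the two correction terms of L_{L2-3}.
import numpy as np

rng = np.random.default_rng(2)
N, M, d, C = 50, 8, 5, 1.0
Z = rng.normal(size=(M, d))
r = rng.random((N, M))
r /= r.sum(axis=1, keepdims=True)
tau = 0.01 * rng.normal(size=(N, d))            # small residuals x_i - u_i
X = r @ Z + tau
w, b, y = rng.normal(size=d), 0.1, 1.0

hinge = lambda s: np.maximum(0.0, 1.0 - y * s)
L2_1 = (C / (2 * N)) * np.sum(hinge(X @ w + b) ** 2)
L2_3 = (C / (2 * N)) * np.sum(hinge((r @ Z) @ w + b) ** 2)
delta1 = np.max(hinge(Z @ w + b))               # the quantity Delta_1 above
corr = np.maximum(0.0, -y * (tau @ w))          # max{0, -y_i w^T tau_i}
rhs = (L2_3 + (C / (2 * N)) * np.sum(corr ** 2)
       + (C * M * delta1 / N) * np.sum(corr))
assert L2_1 <= rhs + 1e-12
print(f"{L2_1:.4f} <= {rhs:.4f}")
```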

2. For the LS-SVM:

We have \({L_{LS - 1}}{\text{(}}{\mathbf{w}},b,{\mathbf{X}})=(C/2N)\sum\nolimits_{{i=1}}^{N} {{{({y_i} - ({{\mathbf{w}}^T}{{\mathbf{z}}_i}+b))}^2}}\).

Let \({\Delta _2}={\max _{1 \leq t \leq M}}\left| {{y_t} - ({{\mathbf{w}}^T}{{\mathbf{z}}_t}+b)} \right|\) denote the maximum absolute residual over the extreme vectors in the dataset X. Then we have

$$\begin{aligned} L_{LS-1}(\mathbf{w},b,\mathbf{X}) & \leq (C/2N)\sum\nolimits_{i=1}^{N} \left( \sum\nolimits_{t=1}^{M} r_{i,t}\left( y_{t} - \mathbf{w}^{T}\mathbf{z}_{t} - b \right) \right)^{2} + (CM\Delta_{2}/N)\sum\nolimits_{i=1}^{N} \mathbf{w}^{T}\tau_{i} + (C/2N)\sum\nolimits_{i=1}^{N} \left( \mathbf{w}^{T}\tau_{i} \right)^{2} \\ & = L_{LS-3}(\mathbf{w},b,\mathbf{X}^{*}) + (CM\Delta_{2}/N)\sum\nolimits_{i=1}^{N} \mathbf{w}^{T}\tau_{i} + (C/2N)\sum\nolimits_{i=1}^{N} \left( \mathbf{w}^{T}\tau_{i} \right)^{2}. \\ \end{aligned}$$

Adding \((1/2){\left\| {\mathbf{w}} \right\|^2}\) to both sides of the inequality above and noting that M ≪ N, the theorem is proved for this case.

3. For the Hub-SVM:

We have

$$\begin{aligned} {L_{Hub - 1app}}({\mathbf{w}},{\mathbf{X}}) & =(hC/N)\sum\nolimits_{{i=1}}^{N} {(\sqrt {1+{{(1+h - yf({\mathbf{x}}))}^2}/(4{h^2})} - 1)} \\ & \leq (hC/N)\sum\nolimits_{{i=1}}^{N} {\{ (\sqrt {1+(1+h - (C/2N){y_i}{{\mathbf{w}}^T}(\sum\nolimits_{{t=1}}^{M} {{r_{i,t}}{{\mathbf{z}}_t}){)^2}} /(4{h^2})} - 1)} \\ & \quad +(C/Nh)\left| {{{\mathbf{w}}^T}{\tau _i}} \right|+\sqrt {(C/N)\left| {(1+h - (C/2N){y_i}{{\mathbf{w}}^T}(\sum\nolimits_{{t=1}}^{M} {{r_{i,t}}{{\mathbf{z}}_t}))({y_i}{{\mathbf{w}}^T}{\tau _i})} } \right|/(4{h^2})} \} \\ & ={L_{Hub - 3app}}({\mathbf{w}},{{\mathbf{X}}^*})+{(C/2N)^2}\sum\nolimits_{{i=1}}^{N} {\left| {{{\mathbf{w}}^T}{\tau _i}} \right|} +{(C/2N)^{\frac{3}{2}}}\sum\nolimits_{{t=1}}^{M} {\sqrt {2\left| {1+h - (C/2N){y_t}{{\mathbf{w}}^T}{{\mathbf{z}}_t}} \right|} } \sum\nolimits_{{i=1}}^{N} {{r_{i,t}}\sqrt {\left| {{{\mathbf{w}}^T}{\tau _i}} \right|} } . \\ \end{aligned}$$

Meanwhile, we can get:

$$\begin{aligned} {L_{Hub - 1app}}({\mathbf{w}},{\mathbf{X}}) & =\;(hC/N)\sum\nolimits_{{i=1}}^{N} {(\sqrt {1+(1+h - (C/2N){y_i}{{\mathbf{w}}^T}{{(\sum\nolimits_{{t=1}}^{M} {{r_{i,t}}{{\mathbf{z}}_t}+{\tau _i})} )}^2}/(4{h^2})} - 1)} \hfill \\ & \geq (hC/N)\sum\nolimits_{{i=1}}^{N} {\{ (\sqrt {1+(1+h - (C/2N){y_i}{{\mathbf{w}}^T}(\sum\nolimits_{{t=1}}^{M} {{r_{i,t}}{{\mathbf{z}}_t}){)^2}} /(4{h^2})} - 1)} \hfill \\ &\quad - (C/Nh)\left| {{{\mathbf{w}}^T}{\tau _i}} \right| - \sqrt {(C/N)\left| {(1+h - (C/2N){y_i}{{\mathbf{w}}^T}(\sum\nolimits_{{t=1}}^{M} {{r_{i,t}}{{\mathbf{z}}_t}))({y_i}{{\mathbf{w}}^T}{\tau _i})} } \right|/(4{h^2})} \} \hfill \\ & ={L_{Hub - 3app}}({\mathbf{w}},{{\mathbf{X}}^*}) - {(C/2N)^2}\sum\nolimits_{{i=1}}^{N} {\left| {{{\mathbf{w}}^T}{\tau _i}} \right|} - {(C/2N)^{\frac{3}{2}}}\sum\nolimits_{{t=1}}^{M} {\sqrt {2\left| {1+h - (C/2N){y_t}{{\mathbf{w}}^T}{{\mathbf{z}}_t}} \right|} } \sum\nolimits_{{i=1}}^{N} {{r_{i,t}}\sqrt {\left| {{{\mathbf{w}}^T}{\tau _i}} \right|} } . \hfill \\ \end{aligned}$$

Appendix 3

3.1 Proof of Corollary 1

1. For the L2-EVM, \({F_{L2 - 1}}({\mathbf{w}}_{1}^{*},\;b_{1}^{*}) - {F_{L2 - 2}}({\mathbf{w}}_{2}^{*},\;b_{2}^{*}) \leq {C^2}\varepsilon /2+CM{\Delta _1}\sqrt {C\varepsilon }\).

$$\begin{aligned} F_{{L2 - 1}} ({\mathbf{w}}_{2}^{*} ,\;b_{2}^{*} ) - F_{{L2 - 3}} ({\mathbf{w}}_{2}^{*} ,\;b_{2}^{*} ) & \le (C/2N)\sum\nolimits_{{i = 1}}^{N} {\{ max\{ 0, - y_{i} {\mathbf{w}}_{2}^{{*T}} \tau _{i} \} \} ^{2} + (CM\Delta _{1} /N)\sum\nolimits_{{i = 1}}^{N} {max\{ 0, - y_{i} {\mathbf{w}}_{2}^{{*T}} \tau _{i} \} } } \\ & \le (C/2N)\left( {\sum\nolimits_{{i = 1}}^{N} {\left\| {{\mathbf{w}}_{2}^{*} } \right\|^{2} } \left\| {\tau _{i} } \right\|^{2} } \right) + (CM\Delta _{1} /N)\left( {\sum\nolimits_{{i = 1}}^{N} {\left\| {{\mathbf{w}}_{2}^{*} } \right\|} \left\| {\tau _{i} } \right\|} \right) \\ & \le C^{2} \varepsilon /2 + CM\Delta _{1} \sqrt {C\varepsilon } . \\ \end{aligned}$$
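In the display above, the middle step applies the Cauchy–Schwarz inequality, and the last step presumably combines the extreme-vector approximation guarantee with a norm bound on the optimal solution (a reading; both facts come from the main text rather than this appendix):

$$\max \{ 0,\; - {y_i}{\mathbf{w}}_{2}^{*T}{\tau _i}\} \leq \left\| {{\mathbf{w}}_{2}^{*}} \right\|\left\| {{\tau _i}} \right\| \leq \sqrt C \cdot \sqrt \varepsilon ,$$

where \({\left\| {{\mathbf{w}}_{2}^{*}} \right\|^2} \leq C\) follows from comparing the optimal objective with its value at \({\mathbf{w}}={\mathbf{0}}\), and \({\left\| {{\tau _i}} \right\|^2} \leq \varepsilon\) is the approximation guarantee.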

Since \(({\mathbf{w}}_{1}^{*},\;b_{1}^{*})\) is the solution of (7), \({F_{L2 - 1}}({\mathbf{w}}_{1}^{*},\;b_{1}^{*}) \leq {F_{L2 - 1}}({\mathbf{w}}_{2}^{*},\;b_{2}^{*})\).

Using Theorem 1, we obtain

$$\begin{aligned} F_{{L2 - 1}} ({\mathbf{w}}_{1}^{*} ,\;b_{1}^{*} ) - F_{{L2 - 2}} ({\mathbf{w}}_{2}^{*} ,\;b_{2}^{*} ) & \le F_{{L2 - 1}} ({\mathbf{w}}_{1}^{*} ,\;b_{1}^{*} ) - F_{{L2 - 3}} ({\mathbf{w}}_{2}^{*} ,\;b_{2}^{*} ) \\ & \le F_{{L2 - 1}} ({\mathbf{w}}_{2}^{*} ,\;b_{2}^{*} ) - F_{{L2 - 3}} ({\mathbf{w}}_{2}^{*} ,\;b_{2}^{*} ) \\ & \le C^{2} \varepsilon /2 + CM\Delta _{1} \sqrt {C\varepsilon } . \\ \end{aligned}$$
2. For the LS-EVM, \({F_{LS - 1}}({\mathbf{w}}_{1}^{*},\;b_{1}^{*}) - {F_{LS - 2}}({\mathbf{w}}_{2}^{*},\;b_{2}^{*}) \leq CM{\Delta _2}\Omega \sqrt \varepsilon +C{\Omega ^2}\varepsilon /2\).

Since \({F_{LS - 1}}({\mathbf{w}}_{2}^{*},\;b_{2}^{*}) - {F_{LS - 3}}({\mathbf{w}}_{2}^{*},\;b_{2}^{*}) \leq (CM{\Delta _2}/N)\sum\nolimits_{{i=1}}^{N} {{\mathbf{w}}_{2}^{*T}{\tau _i}} +(C/2N)\sum\nolimits_{{i=1}}^{N} {{{({\mathbf{w}}_{2}^{*T}{\tau _i})}^2}} \leq CM{\Delta _2}\Omega \sqrt \varepsilon +C{\Omega ^2}\varepsilon /2\), we have

$$\begin{gathered} {F_{LS - 1}}({\mathbf{w}}_{1}^{*},\;b_{1}^{*}) - {F_{LS - 2}}({\mathbf{w}}_{2}^{*},\;b_{2}^{*}) \leq {F_{LS - 1}}({\mathbf{w}}_{1}^{*},\;b_{1}^{*}) - {F_{LS - 3}}({\mathbf{w}}_{2}^{*},\;b_{2}^{*}) \hfill \\ \leq {F_{LS - 1}}({\mathbf{w}}_{2}^{*},\;b_{2}^{*}) - {F_{LS - 3}}({\mathbf{w}}_{2}^{*},\;b_{2}^{*}) \leq CM{\Delta _2}\Omega \sqrt \varepsilon +C{\Omega ^2}\varepsilon /2. \hfill \\ \end{gathered}$$
3. For the Hub-EVM,

$$\begin{gathered} - ({C^2}/4N)\sqrt {C\varepsilon (1+h)/2} - ({\Delta _3}MC\sqrt {CN} /2N){(C\varepsilon (1+h)/2)^{1/4}} \leq {F_{Hub - 1app}}({\mathbf{w}}_{1}^{*}) - {F_{Hub - 2app}}({\mathbf{w}}_{2}^{*}) \hfill \\ \leq ({C^2}/4N)\sqrt {C\varepsilon (1+h)/2} +({\Delta _3}MC\sqrt {CN} /2N){(C\varepsilon (1+h)/2)^{1/4}}, \hfill \\ \end{gathered}$$

where \({\Delta _3}= {\max _{1 \leq t \leq M}}\sqrt {\left| {1+h - (C/2N){y_t}{{\mathbf{w}}^T}{{\mathbf{z}}_t}} \right|}\).

According to Theorems 1 and 2, we get

$$\begin{aligned} {F_{Hub - 1app}}({\mathbf{w}}_{1}^{*}) - {F_{Hub - 2app}}({\mathbf{w}}_{2}^{*})&={F_{Hub - 1app}}({\mathbf{w}}_{1}^{*}) - {F_{Hub - 3app}}({\mathbf{w}}_{2}^{*}) \hfill \\ &\leq {F_{Hub - 1app}}({\mathbf{w}}_{2}^{*}) - {F_{Hub - 3app}}({\mathbf{w}}_{2}^{*}) \hfill \\ & \leq {(C/2N)^2}\sum\nolimits_{{i=1}}^{N} {\left| {{\mathbf{w}}{{_{2}^{*}}^T}{\tau _i}} \right|} {\text{+}}(C/2N)\sum\nolimits_{{t=1}}^{M} {\sqrt {(C/N)\left| {1+h - (C/2N){y_t}{\mathbf{w}}{{_{2}^{*}}^T}{{\mathbf{z}}_t}} \right|} } \sum\nolimits_{{i=1}}^{N} {{r_{i,t}}} \sqrt {\left| {{\mathbf{w}}{{_{2}^{*}}^T}{\tau _i}} \right|} . \hfill \\ \end{aligned}$$

With \({\Delta _3}\) as defined above and using \(\left\| {{{\mathbf{w}}^*}} \right\| \leq \sqrt {C(1+h)/2}\), we immediately have

$$\begin{gathered} {F_{Hub - 1app}}({\mathbf{w}}_{1}^{*}) - {F_{Hub - 2app}}({\mathbf{w}}_{2}^{*}) \leq ({C^2}/4N)\sqrt {C\varepsilon (1+h)/2} +({\Delta _3}MC\sqrt {CN} /2N){(C\varepsilon (1+h)/2)^{1/4}}, \hfill \\ {F_{Hub - 1app}}({\mathbf{w}}_{1}^{*}) - {F_{Hub - 2app}}({\mathbf{w}}_{2}^{*}) \geq - ({C^2}/4N)\sqrt {C\varepsilon (1+h)/2} - ({\Delta _3}MC\sqrt {CN} /2N){(C\varepsilon (1+h)/2)^{1/4}}. \hfill \\ \end{gathered}$$

Appendix 4

4.1 Proof of Corollary 2

Based on Theorem 1, we know that \({F_{Hub - 3app}}({\mathbf{w}}_{3}^{*}) \leq {F_{Hub - 3app}}({\mathbf{w}}_{2}^{*})={F_{Hub - 2app}}({\mathbf{w}}_{2}^{*})\). Meanwhile, \({F_{Hub - 3app}}({\mathbf{w}}_{3}^{*})={F_{Hub - 2app}}({\mathbf{w}}_{3}^{*}) \geq {F_{Hub - 2app}}({\mathbf{w}}_{2}^{*})\). Hence, \({F_{Hub - 3app}}({\mathbf{w}}_{3}^{*})={F_{Hub - 3app}}({\mathbf{w}}_{2}^{*})\).

From these results, we get \({F_{Hub - 3app}}({\mathbf{w}}_{2}^{*}) - {F_{Hub - 3app}}({\mathbf{w}}_{1}^{*}) \leq 0\). From Theorem 2, we have the following two inequalities:

$$\begin{aligned}& - {{\text{(}}C/N{\text{)}}^2}\sum\nolimits_{{i=1}}^{N} {\left| {{\mathbf{w}}{{_{2}^{*}}^T}{\tau _i}} \right|} - {\text{(}}C/N{\text{)}}\sum\nolimits_{{t=1}}^{M} {\sqrt {{\text{(}}2C/N{\text{)}}\left| {1+h - {\text{(}}C/N{\text{)}}{y_t}{\mathbf{w}}{{_{2}^{*}}^T}{{\mathbf{z}}_t}} \right|} }\\ & \quad\sum\nolimits_{{i=1}}^{N} {{r_{i,t}}} \sqrt {\left| {{\mathbf{w}}{{_{2}^{*}}^T}{\tau _i}} \right|} \leq {F_{Hub - {\text{1}}app}}{\text{(}}{\mathbf{w}}_{1}^{*}{\text{)}} - {F_{Hub - 3app}}{\text{(}}{\mathbf{w}}_{1}^{*}{\text{),}}\end{aligned}$$

and

$$\begin{aligned}& {F_{Hub - {\text{1}}app}}{\text{(}}{\mathbf{w}}_{2}^{*}{\text{)}} - {F_{Hub - 3app}}{\text{(}}{\mathbf{w}}_{2}^{*}{\text{)}} \leq {{\text{(}}C/N{\text{)}}^2}\sum\nolimits_{{i=1}}^{N} {\left| {{\mathbf{w}}{{_{2}^{*}}^T}{\tau _i}} \right|} {\text{+(}}C/N{\text{)}}\\ & \quad \sum\nolimits_{{t=1}}^{M} {\sqrt {{\text{(}}2C/N{\text{)}}\left| {1+h - {\text{(}}C/N{\text{)}}{y_t}{\mathbf{w}}{{_{2}^{*}}^T}{{\mathbf{z}}_t}} \right|} } \sum\nolimits_{{i=1}}^{N} {{r_{i,t}}} \sqrt {\left| {{\mathbf{w}}{{_{2}^{*}}^T}{\tau _i}} \right|}. \end{aligned}$$

Adding these two inequalities and using \(\left\| {{{\mathbf{w}}^*}} \right\| \leq \sqrt {C(1+h)/2}\), we get

$$\begin{gathered} {F_{Hub - 1app}}({\mathbf{w}}_{2}^{*}) - {F_{Hub - 1app}}({\mathbf{w}}_{1}^{*}) \leq {\text{2(}}C/N{{\text{)}}^2}\sum\nolimits_{{i=1}}^{N} {\left| {{\mathbf{w}}{{_{2}^{*}}^T}{\tau _i}} \right|} \hfill \\ \quad +2 (C/N{{\text{)}}^{\frac{3}{2}}}\sum\nolimits_{{t=1}}^{M} {\sqrt {2\left| {1+h - {\text{(}}C/N{\text{)}}{y_t}{\mathbf{w}}{{_{2}^{*}}^T}{{\mathbf{z}}_t}} \right|} } \sum\nolimits_{{i=1}}^{N} {{r_{i,t}}} \sqrt {\left| {{\mathbf{w}}{{_{2}^{*}}^T}{\tau _i}} \right|} \hfill \\ \leq 2{C^2}\sqrt {C\varepsilon (1+h)} {\text{/}}N{\text{+2}}{\Delta _3}MC\sqrt {2CN} {(C\varepsilon (1+h))^{1/4}}/N. \hfill \\ \end{gathered}$$


Cite this article

Gu, X., Chung, F.L. & Wang, S. Extreme vector machine for fast training on large data. Int. J. Mach. Learn. & Cyber. 11, 33–53 (2020). https://doi.org/10.1007/s13042-019-00936-3
