Abstract
Different types of loss functions are often adopted in SVMs or their variants to meet practical requirements, and scaling up the corresponding SVMs to large datasets is becoming increasingly important in practice. In this paper, the extreme vector machine (EVM) is proposed to realize fast training of SVMs with different yet typical loss functions on large datasets. EVM begins with a fast approximation of the convex hull of the training data in the feature space, expressed by extreme vectors, and then completes the corresponding SVM optimization over the extreme vector set. When the hinge loss function is adopted, EVM coincides with the approximate extreme points support vector machine (AESVM) for classification. When the square hinge loss, least squares loss and Huber loss functions are adopted, EVM yields three versions, namely L2-EVM, LS-EVM and Hub-EVM, respectively, for classification or regression. In contrast to the most closely related machine, AESVM, EVM retains its theoretical advantage while being applicable to a wide variety of loss functions to meet practical requirements. Compared with the other state-of-the-art fast SVM training algorithms, CVM and FastKDE, EVM relaxes the restriction to least squares loss functions, and experimentally exhibits its superiority in training time, robustness and number of support vectors.
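At a high level, EVM trains in two stages: it first selects a small set of extreme vectors approximating the convex hull of the data in the kernel-induced feature space, and then runs the SVM optimization over that subset only. The sketch below illustrates the spirit of the first stage with a generic greedy farthest-point heuristic under an RBF kernel; it is an illustrative stand-in, not the paper's actual extreme-vector selection procedure, and all function names and parameters are hypothetical.

```python
import math
import random

def rbf(x, y, gamma=0.5):
    # RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def feature_dist2(x, y, gamma=0.5):
    # Squared distance in the RBF feature space:
    # ||phi(x) - phi(y)||^2 = k(x,x) + k(y,y) - 2 k(x,y) = 2 - 2 k(x,y)
    return 2.0 - 2.0 * rbf(x, y, gamma)

def select_extreme_vectors(X, m):
    # Greedy farthest-point traversal: repeatedly add the point farthest
    # (in feature space) from the current representative set. This tends
    # to pick points near the boundary of the data's convex hull.
    chosen = [0]
    while len(chosen) < m:
        best, best_d = None, -1.0
        for i in range(len(X)):
            if i in chosen:
                continue
            d = min(feature_dist2(X[i], X[j]) for j in chosen)
            if d > best_d:
                best, best_d = i, d
        chosen.append(best)
    return chosen

random.seed(0)
X = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(200)]
ev = select_extreme_vectors(X, 10)
print(len(ev))  # -> 10; the SVM would then be trained on X[ev] only
```

The payoff is that the subsequent SVM solve scales with the size of the extreme vector set rather than with the full training set.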
References
Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20:273–297
Vapnik V (1995) The nature of statistical learning theory. Springer, Berlin
Tahira M, Khan A (2016) Protein subcellular localization of fluorescence microscopy images: employing new statistical and Texton based image features and SVM based ensemble classification. Inf Sci 345(6):65–80
Li YJ, Leng QK, Fu YZ (2017) Cross kernel distance minimization for designing support vector machines. Int J Mach Learn Cybernet 8(5):1585–1593
Hu L, Lu SX, Wang XZ (2013) A new and informative active learning approach for support vector machine. Inf Sci 244(9):142–160
Bang S, Kang J, Jhun M, Kim E (2017) Hierarchically penalized support vector machine with grouped variables. Int J Mach Learn Cybernet 8(4):1211–1221
Reshma K, Pal A (2017) Tree based multi-category Laplacian TWSVM for content based image retrieval. Int J Mach Learn Cybernet 8(4):1197–1210
Muhammad T, Shubham K (2017) A regularization on Lagrangian twin support vector regression. Int J Mach Learn Cybernet 8(3):807–821
Williams C, Seeger M (2000) Using the Nyström method to speed up kernel machines. In: Proceedings of the 13th international conference on neural information processing systems, pp 661–667
Lin C (2007) On the convergence of multiplicative update algorithms for nonnegative matrix factorization. IEEE Trans Neural Netw 18(6):1589–1595
Rahimi A, Recht B (2007) Random features for large-scale kernel machines. In: International conference on neural information processing systems. Curran Associates Inc., pp 1177–1184
Halko N, Martinsson PG, Tropp JA (2011) Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions. SIAM Rev 53(2):217–288
Keerthi S, Shevade S, Bhattacharyya C, Murthy K (2001) Improvements to Platt’s SMO algorithm for SVM classifier design. Neural Comput 13(3):637–649
Peng XJ, Kong LY, Chen DJ (2017) A structural information-based twin-hypersphere support vector machine classifier. Int J Mach Learn Cybernet 8(1):295–308
Joachims T (1999) Making large-scale support vector machine learning practical. Advances in kernel methods. MIT Press, Cambridge, pp 169–184
Wang D, Qiao H, Zhang B, Wang M (2013) Online support vector machine based on convex hull vertices selection. IEEE Trans Neural Netw Learn Syst 24(4):593–609
Gu XQ, Chung FL, Wang ST (2018) Fast convex-hull vector machine for training on large-scale ncRNA data classification tasks. Knowl Based Syst 151(1):149–164
Osuna E, Castro OD (2002) Convex hull in feature space for support vector machines. In: Proceedings of advances in artificial intelligence, pp 411–419
Tsang I, Kwok J, Cheung P (2005) Core vector machines: fast SVM training on very large data sets. J Mach Learn Res 6:363–392
Tsang I, Kwok J, Zurada J (2006) Generalized core vector machines. IEEE Trans Neural Netw 17(5):1126–1140
Tsang I, Kocsor A, Kwok J (2007) Simpler core vector machines with enclosing balls. In: Proceedings of the 24th international conference on machine learning, pp 911–918
Wang ST, Wang J, Chung F (2014) Kernel density estimation, kernel methods, and fast learning in large data sets. IEEE Trans Cybernet 44(1):1–20
Nandan M, Khargonekar PP, Talathi SS (2014) Fast SVM training using approximate extreme points. J Mach Learn Res 15:59–98
Huang CQ, Chung FL, Wang ST (2016) Multi-view L2-SVM and its multi-view core vector machine. Neural Netw 75(3):110–125
Suykens J, Gestel T, Brabanter J, Moor B, Vandewalle J (2002) Least squares support vector machines. World Scientific Pub, Singapore
Xue H, Chen S, Yang Q (2009) Discriminatively regularized least-squares classification. Pattern Recogn 42(1):93–104
Karasuyama M, Takeuchi I (2010) Nonlinear regularization path for the modified Huber loss support vector machines. In: Proceedings of international joint conference on neural networks, pp 1–8
Cherkassky V, Ma Y (2004) Practical selection of SVM parameters and noise estimation for SVM regression. Neural Netw 17(1):113–126
Chau A, Li X, Yu W (2013) Large data sets classification using convex–concave hull and support vector machine. Soft Comput 17(5):793–804
Theodoridis S, Mavroforakis M (2007) Reduced convex hulls: a geometric approach to support vector machines. IEEE Signal Process Mag 24(3):119–122
Blum M, Floyd RW, Pratt V, Rivest RL, Tarjan RE (1973) Time bounds for selection. J Comput Syst Sci 7(8):448–461
Tax D, Duin R (1999) Support vector domain description. Pattern Recogn Lett 20(11):1191–1199
Chapelle O (2007) Training a support vector machine in the primal. Neural Comput 19(5):1155–1178
Charbonnier P, Blanc-Feraud L, Aubert G, Barlaud M (1997) Deterministic edge-preserving regularization in computed imaging. IEEE Trans Image Proc 6(2):298–311
Hartley R, Zisserman A (2003) Multiple view geometry in computer vision, 2nd edn. Cambridge University Press, Cambridge
Ye J, Xiong T (2007) SVM versus least squares SVM. In: Proceedings of the 7th international conference on artificial intelligence and statistics, pp 644–651
Lin C. LIBSVM data. http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/. Accessed 28 Feb 2017
Alcalá-Fdez J, Fernandez A, Luengo J, Derrac J, García S, Sánchez L, Herrera F (2011) KEEL data-mining software tool: data set repository, integration of algorithms and experimental analysis framework. J Mult Valued Logic Soft Comput 17(2):255–287
Gao S, Tsang IW, Chia LT (2013) Sparse representation with kernels. IEEE Trans Image Process 22(2):423–434
Acknowledgements
This work was supported in part by the Hong Kong Polytechnic University under Grant G-UA3W, by the National Natural Science Foundation of China under Grant nos. 61572236, 61702225 and 61806026, by the Natural Science Foundation of Jiangsu Province under Grant BK20161268 and BK20180956.
Appendices
Appendix 1
1.1 Proof of Theorem 1
1. For the L2-SVM:

Let \(L_{L2-2}(\mathbf{w},b,\mathbf{X}^*)=(C/2N)\sum\nolimits_{t=1}^{M} l(\mathbf{w},b,\mathbf{z}_t)\sum\nolimits_{i=1}^{N} r_{i,t}\) and \(L_{L2-3}(\mathbf{w},b,\mathbf{X}^*)=(C/2N)\sum\nolimits_{i=1}^{N} l(\mathbf{w},b,\mathbf{u}_i)\), where \(\mathbf{u}_i=\sum\nolimits_{t=1}^{M} r_{i,t}\mathbf{z}_t\). From the precondition that \(y_i=y_j\) in each subset, we have
According to \(max(0,\;A+B) \leq max(0,\;A)+max(0,\;B)\),
Adding \((1/2){\left\| {\mathbf{w}} \right\|^2}\) to both sides of the inequality above, we get \({F_{L2-3}}{\text{(}}{\mathbf{w}},{\text{ }}b{\text{)}} \leq {F_{L2-2}}{\text{(}}{\mathbf{w}},{\text{ }}b{\text{)}}\).
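As a quick numerical spot-check (not part of the proof), the elementary inequality invoked in this step, \(\max(0, A+B) \leq \max(0, A) + \max(0, B)\), can be verified on random inputs:

```python
import random

# Spot-check the subadditivity of t -> max(0, t), which the proof applies
# to the hinge terms: max(0, a + b) <= max(0, a) + max(0, b).
random.seed(1)
for _ in range(10000):
    a = random.uniform(-5.0, 5.0)
    b = random.uniform(-5.0, 5.0)
    assert max(0.0, a + b) <= max(0.0, a) + max(0.0, b) + 1e-12
print("subadditivity holds on all samples")
```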
2. For the LS-SVM:

Let \(L_{LS-2}(\mathbf{w},b,\mathbf{X}^*)=(C/2N)\sum\nolimits_{t=1}^{M} l(\mathbf{w},b,\mathbf{z}_t)\sum\nolimits_{i=1}^{N} r_{i,t}\) and \(L_{LS-3}(\mathbf{w},b,\mathbf{X}^*)=(C/2N)\sum\nolimits_{i=1}^{N} l(\mathbf{w},b,\mathbf{u}_i)\). We have
According to Jensen’s inequality, if \(\lambda_1,\lambda_2,\ldots,\lambda_n\) are nonnegative real numbers such that \(\lambda_1+\lambda_2+\cdots+\lambda_n=1\) and \(\varphi(\cdot)\) is a real convex function, then \(\varphi(\lambda_1 x_1+\lambda_2 x_2+\cdots+\lambda_n x_n) \leq \lambda_1\varphi(x_1)+\lambda_2\varphi(x_2)+\cdots+\lambda_n\varphi(x_n)\) for any \(x_1,\ldots,x_n\). So, we can get

Adding \((1/2)\left\|\mathbf{w}\right\|^2\) to both sides of the inequality above, we get \(F_{LS-3}(\mathbf{w},\;b) \leq F_{LS-2}(\mathbf{w},\;b)\).
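A short numerical spot-check (again, illustrative only) of Jensen's inequality for the convex function \(\varphi(x)=x^2\) arising from the least squares loss:

```python
import random

# Verify phi(sum_i lam_i * x_i) <= sum_i lam_i * phi(x_i) for the convex
# function phi(x) = x^2, with nonnegative weights lam_i summing to 1.
def phi(x):
    return x * x

random.seed(2)
for _ in range(1000):
    n = random.randint(2, 6)
    raw = [random.random() for _ in range(n)]
    lam = [r / sum(raw) for r in raw]        # weights on the simplex
    xs = [random.uniform(-10.0, 10.0) for _ in range(n)]
    lhs = phi(sum(l * x for l, x in zip(lam, xs)))
    rhs = sum(l * phi(x) for l, x in zip(lam, xs))
    assert lhs <= rhs + 1e-9
print("Jensen's inequality holds on all samples")
```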
3. For the Hub-SVM:

Let \(L_{Hub-2app}(\mathbf{w},\mathbf{X}^*)=(C/2N)\sum\nolimits_{t=1}^{M} l_H(\mathbf{w},\mathbf{z}_t)\sum\nolimits_{i=1}^{N} r_{i,t}\) and \(L_{Hub-3app}(\mathbf{w},\mathbf{X}^*)=(C/2N)\sum\nolimits_{i=1}^{N} l_H(\mathbf{w},\mathbf{u}_i)\).
Adding \((1/2)\left\|\mathbf{w}\right\|^2\) to both sides of the inequality above, we get \(F_{Hub-3app}(\mathbf{w})=F_{Hub-2app}(\mathbf{w})\).
Appendix 2
2.1 Proof of Theorem 2
1. For the L2-SVM:

Let \(L_{L2-1}(\mathbf{w},b,\mathbf{X})=(C/2N)\sum\nolimits_{i=1}^{N} l(\mathbf{w},b,\mathbf{z}_i)\) denote the average square hinge loss minimized in (7), and let \(L_{L2-3}(\mathbf{w},b,\mathbf{X}^*)\) be as defined in Theorem 1. Then, we have
Setting \(\Delta_1=\max\nolimits_{1 \leq t \leq M} \max\{0,\;1-y_t(\mathbf{w}^T\mathbf{z}_t+b)\}\), we further have
Adding \((1/2){\left\| {\mathbf{w}} \right\|^2}\) to both sides of the inequality above, then this theorem is proved.
2. For the LS-SVM:

We have \(L_{LS-1}(\mathbf{w},b,\mathbf{X})=(C/2N)\sum\nolimits_{i=1}^{N} (y_i-(\mathbf{w}^T\mathbf{z}_i+b))^2\).
Assume that the maximum absolute value of \(y_t-(\mathbf{w}^T\mathbf{z}_t+b)\) over the dataset \(\mathbf{X}\) is \(\Delta_2=\max\nolimits_{1 \leq t \leq M} \left|y_t-(\mathbf{w}^T\mathbf{z}_t+b)\right|\). Then, we have
Adding \((1/2)\left\|\mathbf{w}\right\|^2\) to both sides of the inequality above and noting that M ≪ N, this theorem is proved.
3. For the Hub-SVM:

We have
Meanwhile, we can get:
Appendix 3
3.1 Proof of Corollary 1
1. For the L2-EVM, \(F_{L2-1}(\mathbf{w}_1^*,\;b_1^*)-F_{L2-2}(\mathbf{w}_2^*,\;b_2^*) \leq C^2\varepsilon/2+CM\Delta_1\sqrt{C\varepsilon}\).
Since \((\mathbf{w}_1^*,\;b_1^*)\) is the solution of (7), \(F_{L2-1}(\mathbf{w}_1^*,\;b_1^*) \leq F_{L2-1}(\mathbf{w}_2^*,\;b_2^*)\).
Using Theorem 1, we obtain
2. For the LS-EVM, \(F_{LS-1}(\mathbf{w}_1^*,\;b_1^*)-F_{LS-2}(\mathbf{w}_2^*,\;b_2^*) \leq CM\Delta_2\Omega\sqrt{\varepsilon}+C\Omega^2\varepsilon/2\).
Since \(F_{LS-1}(\mathbf{w}_2^*,\;b_2^*)-F_{LS-3}(\mathbf{w}_2^*,\;b_2^*) \leq (CM/N)\mathbf{w}_2^{*T}\Delta_2\sum\nolimits_{i=1}^{N}\tau_i+(C/2N)\sum\nolimits_{i=1}^{N}(\mathbf{w}_2^{*T}\tau_i)^2 \leq CM\Delta_2\Omega\sqrt{\varepsilon}+C\Omega^2\varepsilon/2\), we have
3. For the Hub-EVM,

$$-(C^2/4N)\sqrt{C\varepsilon(1+h)/2}-(\Delta_3 MC\sqrt{CN}/2N)\left(C\varepsilon(1+h)/2\right)^{1/4} \leq F_{Hub-1app}(\mathbf{w}_1^*)-F_{Hub-2app}(\mathbf{w}_2^*) \leq (C^2/4N)\sqrt{C\varepsilon(1+h)/2}+(\Delta_3 MC\sqrt{CN}/2N)\left(C\varepsilon(1+h)/2\right)^{1/4},$$

where \(\Delta_3=\max\nolimits_{1 \leq t \leq M}\sqrt{\left|1+h-(C/2N)y_t\mathbf{w}^T\mathbf{z}_t\right|}\).
According to Theorems 1 and 2, we get
Defining \(\Delta_3=\max\nolimits_{1 \leq t \leq M}\sqrt{\left|1+h-(C/2N)y_t\mathbf{w}_2^{*T}\mathbf{z}_t\right|}\) and using \(\left\|\mathbf{w}^*\right\| \leq \sqrt{C(1+h)/2}\), we immediately have
Appendix 4
4.1 Proof of Corollary 2
Based on Theorem 1, we know that \(F_{Hub-3app}(\mathbf{w}_3^*) \leq F_{Hub-3app}(\mathbf{w}_2^*)=F_{Hub-2app}(\mathbf{w}_2^*)\). Meanwhile, \(F_{Hub-3app}(\mathbf{w}_3^*)=F_{Hub-2app}(\mathbf{w}_3^*) \geq F_{Hub-2app}(\mathbf{w}_2^*)\). Hence, we have \(F_{Hub-3app}(\mathbf{w}_3^*)=F_{Hub-3app}(\mathbf{w}_2^*)\).
From these results, we get \(F_{Hub-3app}(\mathbf{w}_2^*)-F_{Hub-3app}(\mathbf{w}_1^*) \leq 0\). From Theorem 2, we have the following inequalities
and
Adding these two inequalities and using \(\left\|\mathbf{w}^*\right\| \leq \sqrt{C(1+h)/2}\), we get
Cite this article
Gu, X., Chung, Fl. & Wang, S. Extreme vector machine for fast training on large data. Int. J. Mach. Learn. & Cyber. 11, 33–53 (2020). https://doi.org/10.1007/s13042-019-00936-3