Abstract
Canonical support vector machines (SVMs) are based on a single kernel. Recent publications have shown that using multiple kernels instead of a single one can enhance the interpretability of the decision function and improve classification accuracy. However, most existing approaches reformulate multiple kernel learning (MKL) as a saddle-point optimization problem and concentrate on solving the dual. In this paper, we show that the MKL problem can be reformulated as a biconvex optimization problem and can also be solved in the primal. Whereas the saddle-point method still lacks convergence results, our proposed method exhibits strong optimization convergence properties. To solve the MKL problem, we propose a two-stage algorithm that alternately optimizes the canonical SVM and the kernel weights. Since standard Newton and gradient methods are too time-consuming, we employ a truncated-Newton method to optimize the canonical SVM: the Hessian matrix need not be stored explicitly, and the Newton direction can be computed with a few preconditioned conjugate gradient steps on the Hessian operator equation. The resulting algorithm is shown to be more efficient than current primal approaches in this MKL setting. Furthermore, we use Nesterov's optimal gradient method to optimize the kernel weights. One remarkable advantage of solving in the primal is that it achieves a much faster convergence rate than solving in the dual and does not require a two-stage algorithm, even for the single-kernel LapSVM. By introducing the Laplacian regularizer, we also extend our primal method to the semi-supervised scenario. Extensive experiments on UCI benchmarks show that the proposed algorithm converges rapidly and achieves competitive accuracy.
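The two-stage alternation the abstract describes can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the authors' implementation: it uses the squared hinge loss, plain (unpreconditioned) conjugate gradient in place of PCG, and a simple projected-gradient step on the kernel weights in place of Nesterov's optimal gradient method; all function names are our own.

```python
import numpy as np

def conjugate_gradient(Hv, b, tol=1e-12, max_iter=200):
    """Solve H x = b given only the Hessian-vector product Hv(v)."""
    x = np.zeros_like(b)
    r = b.copy()            # residual b - H x, with x = 0
    p = r.copy()
    rs0 = rs = r @ r
    for _ in range(max_iter):
        Hp = Hv(p)
        step = rs / (p @ Hp)
        x += step * p
        r -= step * Hp
        rs_new = r @ r
        if rs_new < tol * rs0:   # relative residual stopping rule
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def train_svm_primal(K, y, C, newton_iters=20):
    """Primal SVM with squared hinge loss, truncated-Newton style:
    the Newton direction comes from CG applied to the Hessian
    operator, so the Hessian is never formed explicitly."""
    alpha = np.zeros(len(y))
    for _ in range(newton_iters):
        f = K @ alpha
        sv = ((1.0 - y * f) > 0).astype(float)   # active (margin-violating) points
        grad = K @ alpha + 2.0 * C * K @ (sv * (f - y))
        if np.linalg.norm(grad) < 1e-8:
            break
        Hv = lambda v: K @ v + 2.0 * C * K @ (sv * (K @ v))
        alpha += conjugate_gradient(Hv, -grad)
    return alpha

def project_simplex(v):
    """Euclidean projection onto the probability simplex (Duchi et al., 2008)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - (css - 1.0) / idx > 0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def mkl_primal(kernels, y, C=1.0, outer_iters=20, lr=0.01):
    """Two-stage alternation: fit alpha on the combined kernel, then take a
    projected-gradient step on the kernel weights d, kept on the simplex."""
    M = len(kernels)
    d = np.full(M, 1.0 / M)
    for _ in range(outer_iters):
        K = sum(dm * Km for dm, Km in zip(d, kernels))
        alpha = train_svm_primal(K, y, C)
        f = K @ alpha
        sv = ((1.0 - y * f) > 0).astype(float)
        # d/dd_m of 0.5*a'K(d)a + C*sum sv*(f-y)^2, holding alpha fixed
        grad_d = np.array([
            0.5 * alpha @ (Km @ alpha)
            + 2.0 * C * (sv * (f - y)) @ (Km @ alpha)
            for Km in kernels
        ])
        d = project_simplex(d - lr * grad_d)
    return alpha, d

# Toy demo: two Gaussian blobs, linear + RBF base kernels.
rng = np.random.default_rng(0)
n = 40
X = np.vstack([rng.normal(-1.5, 0.6, (n, 2)), rng.normal(1.5, 0.6, (n, 2))])
y = np.concatenate([-np.ones(n), np.ones(n)])
Klin = X @ X.T
Krbf = np.exp(-((X[:, None, :] - X[None, :, :]) ** 2).sum(-1) / 2.0)
alpha, d = mkl_primal([Klin, Krbf], y, C=1.0)
acc = (np.sign((d[0] * Klin + d[1] * Krbf) @ alpha) == y).mean()
```

For the squared hinge loss with a fixed active set, the inner problem is quadratic, so one (truncated) Newton step solves it up to the CG tolerance; iterating handles the changing active set, which is the structure exploited by finite-Newton primal SVM solvers.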
Notes
Following Argyriou, we use the MATLAB optimizer 'minconf' to obtain the affine transformation and tangent distances.
References
Argyriou A, Herbster M, Pontil M (2005) Combining graph Laplacians for semi-supervised learning. In: NIPS, pp 67–74
Argyriou A, Micchelli CA, Pontil M (2005) Learning convex combinations of continuously parameterized basic kernels. In: COLT, pp 338–352
Bach FR (2008) Consistency of the group lasso and multiple kernel learning. J Mach Learn Res 9:1179–1225
Bach FR, Lanckriet GRG, Jordan MI (2004) Multiple kernel learning, conic duality, and the SMO algorithm. In: ICML, vol 69
Bertsekas D (1999) Nonlinear programming. Athena Scientific, Belmont
Bo L, Wang L, Jiao L (2007) Recursive finite Newton algorithm for support vector regression in the primal. Neural Comput 19(4):1082–1096
Chapelle O (2007) Training a support vector machine in the primal. Neural Comput 19(5)
Cortes C, Mohri M, Rostamizadeh A (2009) L2 regularization for learning kernels. In: UAI, pp 109–116
Duchi J, Shwartz SS, Singer Y, Chandra T (2008) Efficient projections onto the L1-ball for learning in high dimensions. In: ICML, pp 272–279
Gorski J, Pfeuffer F, Klamroth K (2007) Biconvex sets and optimization with biconvex functions: a survey and extensions. Math Methods Oper Res 66(3):373–407
Grippo L, Sciandrone M (2000) On the convergence of the block nonlinear Gauss-Seidel method under convex constraints. Oper Res Lett 26(3):127–136
Keerthi SS, Chapelle O, Decoste D (2006) Building support vector machines with reduced classifier complexity. J Mach Learn Res 7:1493–1515
Kelley CT (1999) Iterative methods for optimization. Frontiers in applied mathematics. SIAM, Philadelphia
Kloft M, Brefeld U, Sonnenburg S, Laskov P, Müller K-R, Zien A (2009) Efficient and accurate lp-norm multiple kernel learning. In: NIPS, pp 997–1005
Lanckriet GRG, Cristianini N, Bartlett P, Ghaoui LE, Jordan MI (2004) Learning the kernel matrix with semidefinite programming. J Mach Learn Res 5:27–72
Melacci S, Belkin M (2011) Laplacian support vector machines trained in the primal. J Mach Learn Res 12:1149–1184
Nesterov YE (2003) Introductory lectures on convex optimization: a basic course, volume 87 of applied optimization. Kluwer, Boston
Rakotomamonjy A, Bach FR, Canu S, Grandvalet Y (2008) SimpleMKL. J Mach Learn Res 9:2491–2521
Schölkopf B, Smola AJ (2001) Learning with kernels: support vector machines, regularization, optimization, and beyond (adaptive computation and machine learning). The MIT Press, Cambridge
Shawe-Taylor J, Cristianini N (2004) Kernel methods for pattern analysis. Cambridge University Press, Cambridge
Sonnenburg S, Rätsch G, Schäfer C, Schölkopf B (2006) Large scale multiple kernel learning. J Mach Learn Res 7:1531–1565
Varma M, Babu BR (2009) More generality in efficient multiple kernel learning. In: ICML, pp 1065–1072
Vishwanathan SVN, Sun Z, Theera-Ampornpunt N, Varma M (2010) Multiple kernel learning and the SMO algorithm. In: NIPS
Zien A, Ong CS (2007) Multiclass multiple kernel learning. In: ICML, pp 1191–1198
Acknowledgments
This work is supported by NSF-China (61070033, 61100148), NSF-Guangdong (9251009001000005, S2011040004804) and the Open Project of the Key Laboratory of Symbolic Computation and Knowledge Engineering of the Chinese Ministry of Education (93K-17-2009-K04).
Cite this article
Hao, Z., Yuan, G., Yang, X. et al. A primal method for multiple kernel learning. Neural Comput & Applic 23, 975–987 (2013). https://doi.org/10.1007/s00521-012-1022-2