Abstract
Canonical support vector machines (SVMs) are based on a single kernel. Recent publications have shown that using multiple kernels instead of a single one can enhance the interpretability of the decision function and improve classification accuracy. However, most existing approaches reformulate multiple kernel learning (MKL) as a saddle-point optimization problem and concentrate on solving the dual. In this paper, we show that the MKL problem can be reformulated as a biconvex optimization problem and can also be solved in the primal. Whereas the saddle-point method still lacks convergence results, our proposed method exhibits strong optimization convergence properties. To solve the MKL problem, we propose a two-stage algorithm that alternately optimizes the canonical SVM and the kernel weights. Since standard Newton and gradient methods are too time-consuming, we employ a truncated-Newton method to optimize the canonical SVM: the Hessian matrix need not be stored explicitly, and the Newton direction can be computed with a few preconditioned conjugate gradient steps on the Hessian operator equation. The resulting algorithm is shown to be more efficient than current primal approaches in this MKL setting. Furthermore, we use Nesterov's optimal gradient method to optimize the kernel weights. One remarkable advantage of solving in the primal is that it achieves a much faster convergence rate than solving in the dual and does not require a two-stage algorithm, even for the single-kernel LapSVM. By introducing the Laplacian regularizer, we also extend our primal method to the semi-supervised scenario. Extensive experiments on UCI benchmarks show that the proposed algorithm converges rapidly and achieves competitive accuracy.
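The two-stage alternation the abstract describes can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the authors' implementation: it uses the squared hinge loss, plain (unpreconditioned) conjugate gradient in place of PCG, and a simple projected-gradient step on the kernel weights in place of Nesterov's optimal gradient method; all function names are our own.

```python
import numpy as np

def conjugate_gradient(Hv, b, tol=1e-12, max_iter=200):
    """Solve H x = b given only the Hessian-vector product Hv(v)."""
    x = np.zeros_like(b)
    r = b.copy()            # residual b - H x, with x = 0
    p = r.copy()
    rs0 = rs = r @ r
    for _ in range(max_iter):
        Hp = Hv(p)
        step = rs / (p @ Hp)
        x += step * p
        r -= step * Hp
        rs_new = r @ r
        if rs_new < tol * rs0:   # relative residual stopping rule
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def train_svm_primal(K, y, C, newton_iters=20):
    """Primal SVM with squared hinge loss, truncated-Newton style:
    the Newton direction comes from CG applied to the Hessian
    operator, so the Hessian is never formed explicitly."""
    alpha = np.zeros(len(y))
    for _ in range(newton_iters):
        f = K @ alpha
        sv = ((1.0 - y * f) > 0).astype(float)   # active (margin-violating) points
        grad = K @ alpha + 2.0 * C * K @ (sv * (f - y))
        if np.linalg.norm(grad) < 1e-8:
            break
        Hv = lambda v: K @ v + 2.0 * C * K @ (sv * (K @ v))
        alpha += conjugate_gradient(Hv, -grad)
    return alpha

def project_simplex(v):
    """Euclidean projection onto the probability simplex (Duchi et al., 2008)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - (css - 1.0) / idx > 0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def mkl_primal(kernels, y, C=1.0, outer_iters=20, lr=0.01):
    """Two-stage alternation: fit alpha on the combined kernel, then take a
    projected-gradient step on the kernel weights d, kept on the simplex."""
    M = len(kernels)
    d = np.full(M, 1.0 / M)
    for _ in range(outer_iters):
        K = sum(dm * Km for dm, Km in zip(d, kernels))
        alpha = train_svm_primal(K, y, C)
        f = K @ alpha
        sv = ((1.0 - y * f) > 0).astype(float)
        # d/dd_m of 0.5*a'K(d)a + C*sum sv*(f-y)^2, holding alpha fixed
        grad_d = np.array([
            0.5 * alpha @ (Km @ alpha)
            + 2.0 * C * (sv * (f - y)) @ (Km @ alpha)
            for Km in kernels
        ])
        d = project_simplex(d - lr * grad_d)
    return alpha, d

# Toy demo: two Gaussian blobs, linear + RBF base kernels.
rng = np.random.default_rng(0)
n = 40
X = np.vstack([rng.normal(-1.5, 0.6, (n, 2)), rng.normal(1.5, 0.6, (n, 2))])
y = np.concatenate([-np.ones(n), np.ones(n)])
Klin = X @ X.T
Krbf = np.exp(-((X[:, None, :] - X[None, :, :]) ** 2).sum(-1) / 2.0)
alpha, d = mkl_primal([Klin, Krbf], y, C=1.0)
acc = (np.sign((d[0] * Klin + d[1] * Krbf) @ alpha) == y).mean()
```

For the squared hinge loss with a fixed active set, the inner problem is quadratic, so one (truncated) Newton step solves it up to the CG tolerance; iterating handles the changing active set, which is the structure exploited by finite-Newton primal SVM solvers.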
Notes
Following Argyriou, we use the MATLAB optimizer 'minconf' to obtain the affine transformation and tangent distances.
References
Argyriou A, Herbster M, Pontil M (2005) Combining graph Laplacians for semi-supervised learning. In: NIPS, pp 67–74
Argyriou A, Micchelli CA, Pontil M (2005) Learning convex combinations of continuously parameterized basic kernels. In: COLT, pp 338–352
Bach FR (2008) Consistency of the group lasso and multiple kernel learning. J Mach Learn Res 9:1179–1225
Bach FR, Lanckriet GRG, Jordan MI (2004) Multiple kernel learning, conic duality, and the SMO algorithm. In: ICML, vol 69
Bertsekas D (1999) Nonlinear programming. Athena Scientific, Belmont
Bo L, Wang L, Jiao L (2007) Recursive finite Newton algorithm for support vector regression in the primal. Neural Comput 19(4):1082–1096
Chapelle O (2007) Training a support vector machine in the primal. Neural Comput 19(5)
Cortes C, Mohri M, Rostamizadeh A (2009) L2 regularization for learning kernels. In: UAI, pp 109–116
Duchi J, Shwartz SS, Singer Y, Chandra T (2008) Efficient projections onto the L1-ball for learning in high dimensions. In: ICML, pp 272–279
Gorski J, Pfeuffer F, Klamroth K (2007) Biconvex sets and optimization with biconvex functions: a survey and extensions. Math Methods Oper Res 66(3):373–407
Grippo L, Sciandrone M (2000) On the convergence of the block nonlinear Gauss-Seidel method under convex constraints. Oper Res Lett 26(3):127–136
Keerthi SS, Chapelle O, Decoste D (2006) Building support vector machines with reduced classifier complexity. J Mach Learn Res 7:1493–1515
Kelley CT (1999) Iterative methods for optimization. Frontiers in applied mathematics. SIAM, Philadelphia
Kloft M, Brefeld U, Sonnenburg S, Laskov P, Müller K-R, Zien A (2009) Efficient and accurate lp-norm multiple kernel learning. In: NIPS, pp 997–1005
Lanckriet GRG, Cristianini N, Bartlett P, Ghaoui LE, Jordan MI (2004) Learning the kernel matrix with semidefinite programming. J Mach Learn Res 5:27–72
Melacci S, Belkin M (2011) Laplacian support vector machines trained in the primal. J Mach Learn Res 12:1149–1184
Nesterov YE (2003) Introductory lectures on convex optimization: a basic course, volume 87 of applied optimization. Kluwer, Boston
Rakotomamonjy A, Bach FR, Canu S, Grandvalet Y (2008) SimpleMKL. J Mach Learn Res 9:2491–2521
Schölkopf B, Smola AJ (2001) Learning with kernels: support vector machines, regularization, optimization, and beyond (adaptive computation and machine learning). The MIT Press, Cambridge
Shawe-Taylor J, Cristianini N (2004) Kernel methods for pattern analysis. Cambridge University Press, Cambridge
Sonnenburg S, Rätsch G, Schäfer C, Schölkopf B (2006) Large scale multiple kernel learning. J Mach Learn Res 7:1531–1565
Varma M, Babu BR (2009) More generality in efficient multiple kernel learning. In: ICML, pp 1065–1072
Vishwanathan SVN, Sun Z, Theera-Ampornpunt N, Varma M (2010) Multiple kernel learning and the SMO algorithm. In: NIPS
Zien A, Ong CS (2007) Multiclass multiple kernel learning. In: ICML, pp 1191–1198
Acknowledgments
This work is supported by NSF-China (61070033, 61100148), NSF-Guangdong (9251009001000005, S2011040004804) and the Open Project of the Key Laboratory of Symbolic Computation and Knowledge Engineering of the Chinese Ministry of Education (93K-17-2009-K04).
Cite this article
Hao, Z., Yuan, G., Yang, X. et al. A primal method for multiple kernel learning. Neural Comput & Applic 23, 975–987 (2013). https://doi.org/10.1007/s00521-012-1022-2