A primal method for multiple kernel learning

  • Original Article
  • Neural Computing and Applications

Abstract

Canonical support vector machines (SVMs) are based on a single kernel; recent publications have shown that using multiple kernels instead of a single one can enhance the interpretability of the decision function and improve classification accuracy. However, most existing approaches reformulate multiple kernel learning (MKL) as a saddle-point optimization problem and concentrate on solving the dual. In this paper, we show that the MKL problem can be reformulated as a biconvex optimization problem and solved in the primal. While the saddle-point method still lacks convergence results, our proposed method exhibits strong optimization convergence properties. To solve the MKL problem, we propose a two-stage algorithm that alternately optimizes the canonical SVMs and the kernel weights. Since standard Newton and gradient methods are too time-consuming, we employ a truncated-Newton method to optimize the canonical SVMs: the Hessian matrix need not be stored explicitly, and the Newton direction can be computed with several preconditioned conjugate gradient steps on the Hessian operator equation. The resulting algorithm is shown to be more efficient than current primal approaches in this MKL setting. Furthermore, we use Nesterov's optimal gradient method to optimize the kernel weights. One remarkable advantage of solving in the primal is that it achieves a much faster convergence rate than solving in the dual and does not require a two-stage algorithm even for the single-kernel LapSVM. By introducing the Laplacian regularizer, we also extend our primal method to the semi-supervised scenario. Extensive experiments on UCI benchmarks show that the proposed algorithm converges rapidly and achieves competitive accuracy.
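
To make the two-stage scheme concrete, the following Python/NumPy sketch illustrates one plausible reading of the alternating structure described above; it is not the authors' implementation. Stage one solves a Chapelle-style primal squared-hinge SVM on the combined kernel K(d) = Σ_m d_m K_m by a truncated-Newton method whose direction comes from a few matrix-free conjugate gradient steps (the paper uses preconditioned CG); stage two takes a projected-gradient step on the kernel weights over the simplex, where the paper instead applies Nesterov's optimal gradient method and the projection follows Duchi et al. [9]. All function names, step sizes, and iteration counts here are our own assumptions.

```python
# Illustrative sketch of the two-stage alternating MKL scheme; NOT the
# authors' code. Stage 1: primal squared-hinge SVM on K(d) = sum_m d_m K_m
# via truncated Newton with matrix-free CG. Stage 2: projected-gradient
# update of the kernel weights d over the simplex.
import numpy as np


def combined_kernel(kernels, d):
    """K(d) = sum_m d_m K_m for a list of precomputed Gram matrices."""
    return sum(dm * Km for dm, Km in zip(d, kernels))


def cg(hess_vec, b, n_steps=20, tol=1e-8):
    """Plain CG for H x = b, with H available only through hess_vec(v).
    Truncation: stop after n_steps iterations, so H is never stored."""
    x = np.zeros_like(b)
    r = b.copy()
    p = r.copy()
    rs = r @ r
    for _ in range(n_steps):
        Hp = hess_vec(p)
        alpha = rs / (p @ Hp)
        x += alpha * p
        r -= alpha * Hp
        rs_new = r @ r
        if rs_new < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x


def train_primal_svm(K, y, lam, n_newton=10):
    """Chapelle-style primal SVM with squared hinge loss:
    minimize  lam * b'Kb + sum_i max(0, 1 - y_i (Kb)_i)^2  over b."""
    beta = np.zeros(len(y))
    for _ in range(n_newton):
        f = K @ beta
        sv = (y * f < 1).astype(float)          # margin-violating points
        grad = 2 * lam * f + 2 * (K @ (sv * (f - y)))
        if np.linalg.norm(grad) < 1e-6:
            break
        # Hessian-vector product of 2*lam*K + 2*K*I_sv*K, never formed.
        hv = lambda v: 2 * lam * (K @ v) + 2 * (K @ (sv * (K @ v)))
        beta -= cg(hv, grad)                    # truncated Newton step
    return beta


def project_simplex(v):
    """Euclidean projection onto {d : d >= 0, sum(d) = 1} (Duchi et al. [9])."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > css - 1)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)


def train_mkl_primal(kernels, y, lam=1.0, n_outer=50, lr=0.01):
    """Alternate the SVM solve and a projected-gradient step on the weights."""
    d = np.full(len(kernels), 1.0 / len(kernels))
    for _ in range(n_outer):
        K = combined_kernel(kernels, d)
        beta = train_primal_svm(K, y, lam)
        f = K @ beta
        sv = (y * f < 1).astype(float)
        # Partial derivative of the objective w.r.t. d_m at the current beta.
        g = np.array([lam * beta @ (Km @ beta)
                      + 2 * sv @ ((f - y) * (Km @ beta)) for Km in kernels])
        d = project_simplex(d - lr * g)
    return d, beta
```

Note the biconvex structure the alternation exploits: for fixed d the objective is convex in beta (K(d) is positive semidefinite and the squared hinge is convex), and for fixed beta it is convex in d, so each stage solves a convex subproblem.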

Notes

  1. http://asi.insa-rouen.fr/enseignants/~arakotom

  2. http://research.microsoft.com/en-us/um/people/manik/

  3. http://ttic.uchicago.edu/~argyriou/code/index.html

  4. http://www.dii.unisi.it/~melacci/lapsvmp/

  5. Following Argyriou, we use the MATLAB optimizer ‘minconf’ to obtain the affine transformation and tangent distances.

References

  1. Argyriou A, Herbster M, Pontil M (2005) Combining graph Laplacians for semi-supervised learning. In: NIPS, pp 67–74

  2. Argyriou A, Micchelli CA, Pontil M (2005) Learning convex combinations of continuously parameterized basic kernels. In: COLT, pp 338–352

  3. Bach FR (2008) Consistency of the group lasso and multiple kernel learning. J Mach Learn Res 9:1179–1225

  4. Bach FR, Lanckriet GRG, Jordan MI (2004) Multiple kernel learning, conic duality, and the SMO algorithm. In: ICML, vol 69

  5. Bertsekas DP (1999) Nonlinear programming, 2nd edn. Athena Scientific, Belmont

  6. Bo L, Wang L, Jiao L (2007) Recursive finite Newton algorithm for support vector regression in the primal. Neural Comput 19(4):1082–1096

  7. Chapelle O (2007) Training a support vector machine in the primal. Neural Comput 19(5):1155–1178

  8. Cortes C, Mohri M, Rostamizadeh A (2009) L2 regularization for learning kernels. In: UAI, pp 109–116

  9. Duchi J, Shalev-Shwartz S, Singer Y, Chandra T (2008) Efficient projections onto the L1-ball for learning in high dimensions. In: ICML, pp 272–279

  10. Gorski J, Pfeuffer F, Klamroth K (2007) Biconvex sets and optimization with biconvex functions: a survey and extensions. Math Methods Oper Res 66(3):373–407

  11. Grippo L, Sciandrone M (2000) On the convergence of the block nonlinear Gauss-Seidel method under convex constraints. Oper Res Lett 26(3):127–136

  12. Keerthi SS, Chapelle O, Decoste D (2006) Building support vector machines with reduced classifier complexity. J Mach Learn Res 7:1493–1515

  13. Kelley CT (1999) Iterative methods for optimization. Frontiers in applied mathematics. SIAM, Philadelphia

  14. Kloft M, Brefeld U, Sonnenburg S, Laskov P, Müller K-R, Zien A (2009) Efficient and accurate Lp-Norm multiple kernel learning. In: NIPS, pp 997–1005

  15. Lanckriet GRG, Cristianini N, Bartlett P, Ghaoui LE, Jordan MI (2004) Learning the kernel matrix with semidefinite programming. J Mach Learn Res 5:27–72

  16. Melacci S, Belkin M (2011) Laplacian support vector machines trained in the primal. J Mach Learn Res 12:1149–1184

  17. Nesterov YE (2003) Introductory lectures on convex optimization: a basic course, volume 87 of applied optimization. Kluwer, Boston

  18. Rakotomamonjy A, Bach FR, Canu S, Grandvalet Y (2008) SimpleMKL. J Mach Learn Res 9:2491–2521

  19. Schölkopf B, Smola AJ (2001) Learning with kernels: support vector machines, regularization, optimization, and beyond (adaptive computation and machine learning). The MIT Press, Cambridge

  20. Shawe-Taylor J, Cristianini N (2004) Kernel methods for pattern analysis. Cambridge University Press, Cambridge

  21. Sonnenburg S, Rätsch G, Schäfer C, Schölkopf B (2006) Large scale multiple kernel learning. J Mach Learn Res 7:1531–1565

  22. Varma M, Babu BR (2009) More generality in efficient multiple kernel learning. In: ICML, pp 1065–1072

  23. Vishwanathan SVN, Sun Z, Theera-Ampornpunt N, Varma M (2010) Multiple kernel learning and the SMO algorithm. In: NIPS

  24. Zien A, Ong CS (2007) Multiclass multiple kernel learning. In: ICML, pp 1191–1198

Acknowledgments

This work is supported by NSF-China (61070033, 61100148), NSF-Guangdong (9251009001000005, S2011040004804) and the Open Project of the Key Laboratory of Symbolic Computation and Knowledge Engineering of the Chinese Ministry of Education (93K-17-2009-K04).

Author information

Correspondence to Ganzhao Yuan.

About this article

Cite this article

Hao, Z., Yuan, G., Yang, X. et al. A primal method for multiple kernel learning. Neural Comput & Applic 23, 975–987 (2013). https://doi.org/10.1007/s00521-012-1022-2
