Accelerated max-margin multiple kernel learning

Abstract

Kernel machines such as the Support Vector Machine (SVM) have shown strong performance in pattern classification, mainly because the kernel function lets them exploit potentially nonlinear affinity structure in the data. Selecting an appropriate kernel function, or equivalently, learning the kernel parameters accurately, therefore has a crucial impact on the classification performance of kernel machines. In this paper, we consider the problem of learning a kernel matrix in a binary classification setup, where the hypothesis kernel family is represented as a convex hull of fixed basis kernels. While many existing approaches involve computationally intensive quadratic or semi-definite optimization, we propose novel kernel learning algorithms based on large-margin estimation of Parzen window classifiers. The optimization is cast as instances of linear programming, which significantly reduces the complexity of kernel learning compared to existing methods, while our large-margin formulation provides tight upper bounds on the generalization error. We empirically demonstrate that the new kernel learning methods maintain or improve the accuracy of existing classification algorithms while significantly reducing the learning time on many real datasets, in both supervised and semi-supervised settings.
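
To make the formulation concrete, the following is a minimal sketch of the kind of linear program the abstract describes: convex weights over fixed basis Gram matrices are chosen to maximize the minimum margin of a Parzen window classifier. This is an illustration under assumptions (a hard-margin objective, the hypothetical function name learn_kernel_weights, and SciPy's generic LP solver), not the paper's exact formulation:

```python
import numpy as np
from scipy.optimize import linprog

def learn_kernel_weights(basis_kernels, y):
    """Sketch: choose convex weights beta over fixed basis kernels so that
    a Parzen window classifier's minimum margin rho is maximized (an LP).

    basis_kernels : (M, n, n) array of precomputed basis Gram matrices.
    y             : (n,) labels in {-1, +1}.
    """
    M, n, _ = basis_kernels.shape
    pos, neg = (y == 1), (y == -1)
    # Per-basis Parzen window score of each training point:
    #   F[i, m] = mean_{j:+} k_m(x_i, x_j) - mean_{j:-} k_m(x_i, x_j)
    F = np.stack([K[:, pos].mean(axis=1) - K[:, neg].mean(axis=1)
                  for K in basis_kernels], axis=1)            # shape (n, M)
    # Variables z = [beta_1 .. beta_M, rho]; maximize rho subject to
    #   y_i * (F @ beta)_i >= rho,  sum(beta) = 1,  beta >= 0.
    c = np.r_[np.zeros(M), -1.0]                              # linprog minimizes -rho
    A_ub = np.hstack([-(y[:, None] * F), np.ones((n, 1))])    # rho - y_i f(x_i) <= 0
    b_ub = np.zeros(n)
    A_eq = np.r_[np.ones(M), 0.0].reshape(1, -1)              # sum(beta) = 1
    b_eq = [1.0]
    bounds = [(0, None)] * M + [(None, None)]                 # beta >= 0, rho free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:M], res.x[-1]                               # weights, achieved margin
```

Because the Parzen window score is linear in the weights, every constraint above is linear, which is why the problem stays an LP rather than the QP or SDP of standard MKL formulations; a soft-margin variant would only add per-example slack variables, which keeps the program linear.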

Notes

  1. The procedure of turning a matrix into a vector by concatenating its columns from left to right (a short illustration appears after these notes).

  2. Refer to Theorem 17 of [18] for further details.

  3. The sign of f(x) determines the class label, while its magnitude indicates how confident the prediction is.

  4. Here we assume that the kernel functions are properly normalized to have probability mass 1 (see the sketch after these notes).

  5. http://asi.insa-rouen.fr/enseignants/~arakotom/code/mklindex.html.

  6. http://www.cse.cuhk.edu.hk/~zlxu/toolbox/level_mkl.html.

  7. http://www.vision.ee.ethz.ch/~pgehler/.

  8. Strictly speaking, for a fair comparison one should contrast with the learning time of the SMM run by, for instance, the SDPT3 solver. SimpleMKL is much faster than the SDPT3-solved SMM (e.g., for the Sonar dataset in Table 2, the SDPT3 solver for SMM recorded 105.72 secs on the coarse set and 490.13 secs on the fine set). Even against this faster SimpleMKL baseline, we will demonstrate that our algorithms are significantly faster.

  9. We also evaluated the performance using the Parzen window classifier (PWC); its classification performance appears slightly below, but is not statistically different from, that of the SVM classifier. The SVM classifier, on the other hand, leads to a more compact and computationally efficient representation. Although we used a simple PWC model in our kernel learning framework, the classification performance based on the learned kernels does not degrade as long as the same classifier family is used (see the prediction sketch after these notes).

  10. We have similar results for the coarse sets.
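
Note 1 describes column-wise vectorization, which is simply a Fortran-order flatten in NumPy; a toy illustration:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
# vec(A): stack the columns of A from left to right.
vec_A = A.flatten(order='F')   # -> array([1, 3, 2, 4])
```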
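
Note 4's normalization means each kernel integrates to one, like a probability density, so that Parzen window averages behave like density estimates. An illustrative isotropic Gaussian (the bandwidth h is an assumed parameter, not a value from the paper):

```python
import numpy as np

def normalized_gaussian_kernel(x, xi, h):
    """Gaussian kernel scaled to integrate to 1 over R^d, so that Parzen
    window averages behave like density estimates (illustrative only)."""
    d = x.size
    const = (2.0 * np.pi * h ** 2) ** (d / 2.0)        # normalizing constant
    return np.exp(-np.sum((x - xi) ** 2) / (2.0 * h ** 2)) / const
```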
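
Note 9 compares the Parzen window classifier (PWC) against the SVM on the learned kernels. The following hypothetical sketch shows how learned weights would drive PWC predictions; the function name and array layout are assumptions, not the paper's API:

```python
import numpy as np

def parzen_window_predict(beta, cross_kernels, y_train):
    """Predict with the learned convex kernel combination.

    beta          : (M,) learned convex weights.
    cross_kernels : (M, n_test, n_train) basis kernels between test and train.
    y_train       : (n_train,) labels in {-1, +1}.
    """
    K = np.tensordot(beta, cross_kernels, axes=1)      # combined (n_test, n_train)
    pos, neg = (y_train == 1), (y_train == -1)
    f = K[:, pos].mean(axis=1) - K[:, neg].mean(axis=1)  # class-conditional averages
    return np.sign(f), np.abs(f)                       # label and confidence (note 3)
```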

References

  1. Andrews S, Tsochantaridis I, Hofmann T (2003) Support vector machines for multiple-instance learning. In: Neural information processing systems

  2. Asuncion A, Newman D (2007) UCI machine learning repository

  3. Babich GA, Camps OI (1996) Weighted Parzen windows for pattern classification. IEEE Trans Pattern Anal Mach Intell 18(5):567–570

  4. Bach F (2008) Exploring large feature spaces with hierarchical multiple kernel learning. In: Neural information processing systems, pp 105–112

  5. Bach F, Lanckriet GRG, Jordan MI (2004) Multiple kernel learning, conic duality, and the SMO algorithm. In: International conference on machine learning

  6. Belkin M, Niyogi P, Sindhwani V (2006) Manifold regularization: a geometric framework for learning from labeled and unlabeled examples. J Mach Learn Res 7:2399–2434

  7. Bi J, Fung G, Dundar M, Rao B (2005) Semi-supervised mixture of kernels via LPBoost methods. In: International conference on data mining

  8. Bi J, Zhang T, Bennett KP (2004) Column-generation boosting methods for mixture of kernels. In: SIGKDD

  9. Cañete A, Constanzo J, Salinas L (2008) Kernel price pattern trading. Appl Intell 29(2):152–156

  10. Cristianini N, Shawe-Taylor J, Elisseeff A (2001) On kernel-target alignment. In: Neural information processing systems

  11. Demiriz A, Bennett KP, Shawe-Taylor J (2002) Linear programming boosting via column generation. Mach Learn 46(1–3):225–254

  12. Dioşan L, Rogozan A, Pecuchet JP (2010) Improving classification performance of support vector machine by genetically optimising kernel shape and hyper-parameters. Appl Intell 1–15. doi:10.1007/s10489-010-0260-1

  13. Duda RO, Hart PE, Stork DG (2001) Pattern classification. Wiley, New York

  14. Fumera G, Roli F (2005) A theoretical and experimental analysis of linear combiners for multiple classifier systems. IEEE Trans Pattern Anal Mach Intell 27(6):942–956

  15. Gehler P, Nowozin S (2009) On feature combination for multiclass object classification. In: International conference on computer vision

  16. Kondor RI, Lafferty J (2002) Diffusion kernels on graphs and other discrete structures. In: International conference on machine learning

  17. Kwak N, Choi CH (2002) Input feature selection by mutual information based on Parzen window. IEEE Trans Pattern Anal Mach Intell 24(12):1667–1671

  18. Lanckriet GRG, Cristianini N, Bartlett P, Ghaoui LE, Jordan MI (2004) Learning the kernel matrix with semidefinite programming. J Mach Learn Res 5:27–72

  19. Lee LH, Wan CH, Rajkumar R, Isa D (2011) An enhanced support vector machine classification framework by using Euclidean distance function for text document categorization. Appl Intell. doi:10.1007/s10489-011-0314-z

  20. Nanni L, Lumini A (2005) Ensemble of Parzen window classifiers for on-line signature verification. Neurocomputing 68:217–224

  21. Rakotomamonjy A, Bach F, Canu S, Grandvalet Y (2008) SimpleMKL. J Mach Learn Res 9:2491–2521

  22. Schapire RE, Freund Y, Bartlett P, Lee WS (1998) Boosting the margin: a new explanation for the effectiveness of voting methods. Ann Stat 26(5):1651–1686

  23. Schölkopf B, Herbrich R, Smola A (2001) A generalized representer theorem. In: Computational learning theory. Lecture notes in computer science, vol 2111, pp 416–426

  24. Schölkopf B, Smola A (2002) Learning with kernels. MIT Press, Cambridge

  25. Smola A, Kondor R (2003) Kernels and regularization on graphs. In: Annual conference on learning theory (COLT)

  26. Sonnenburg S, Rätsch G, Schäfer C, Schölkopf B (2006) Large scale multiple kernel learning. J Mach Learn Res 7:1531–1565

  27. Tsivtsivadze E, Pahikkala T, Boberg J, Salakoski T (2009) Locality kernels for sequential data and their applications to parse ranking. Appl Intell 31(1):81–88

  28. Tütüncü RH, Toh KC, Todd MJ (2003) Solving semidefinite-quadratic-linear programs using SDPT3. Math Program, Ser A 95:189–217

  29. Varma M, Babu BR (2009) More generality in efficient multiple kernel learning. In: International conference on machine learning

  30. Wang J, Lu H, Plataniotis K, Lu J (2009) Gaussian kernel optimization for pattern classification. Pattern Recognit 42(7):1237–1247

  31. Williams CKI, Rasmussen CE (1996) Gaussian processes for regression. In: Neural information processing systems

  32. Xu Z, Jin R, King I, Lyu MR (2008) An extended level method for efficient multiple kernel learning. In: Neural information processing systems, pp 1825–1832

  33. Yeung DY, Chow C (2002) Parzen-window network intrusion detectors. In: Proceedings of the sixteenth international conference on pattern recognition, pp 385–388

  34. Zhang D, Chen S, Zhou ZH (2006) Learning the kernel parameters in kernel minimum distance classifier. Pattern Recognit 39(1):133–135

  35. Zhu X, Ghahramani Z, Lafferty J (2003) Semi-supervised learning using Gaussian fields and harmonic functions. In: International conference on machine learning

  36. Zhu X, Kandola J, Ghahramani Z, Lafferty J (2004) Nonparametric transforms of graph kernels for semi-supervised learning. In: Neural information processing systems

Author information

Correspondence to Minyoung Kim.

Cite this article

Kim, M. Accelerated max-margin multiple kernel learning. Appl Intell 38, 45–57 (2013). https://doi.org/10.1007/s10489-012-0356-x
