
A survey of the state of the art in learning the kernels

  • Survey Paper
  • Published in: Knowledge and Information Systems

Abstract

In recent years, the machine learning community has witnessed tremendous growth in the development of kernel-based learning algorithms. However, the performance of this class of algorithms depends heavily on the choice of the kernel function. A kernel function implicitly represents the inner product between a pair of points of a dataset in a higher-dimensional space. This inner product amounts to a similarity between points and provides the foundation for nonlinear analysis in kernel-based learning algorithms. The most important challenge in kernel-based learning is therefore the selection of an appropriate kernel for a given dataset. To address this problem, algorithms that learn the kernel have recently been proposed; these methods formulate a learning problem whose solution is an optimal kernel for a given dataset. In this paper, we present an overview of these algorithms and compare the various approaches to finding an optimal kernel. Furthermore, we identify a set of pivotal issues that lead to the efficient design of such algorithms.
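As a brief illustration of the role of the kernel function (a sketch added here, not part of the original article): the Gaussian (RBF) kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2)) implicitly corresponds to an inner product between feature-space images of x and y, and its bandwidth sigma is exactly the kind of kernel parameter that kernel-learning algorithms aim to select from data. The following minimal Python example (NumPy only; the dataset and the value of sigma are hypothetical) computes such a kernel matrix:

import numpy as np

def rbf_kernel_matrix(X, sigma=1.0):
    """Gram matrix with entries K[i, j] = exp(-||x_i - x_j||^2 / (2 * sigma^2))."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

# Hypothetical two-dimensional dataset; sigma is the kernel parameter that
# kernel-learning methods try to choose in a data-driven way.
X = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
K = rbf_kernel_matrix(X, sigma=1.0)
print(K)  # entries near 1 indicate similar points, entries near 0 dissimilar ones

The methods surveyed in the paper replace such a fixed, hand-picked choice (of sigma, or of the kernel family itself) with an optimization over kernels, for instance over combinations of several candidate base kernels.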



Author information

Corresponding author

Correspondence to M. Ehsan Abbasnejad.


About this article

Cite this article

Abbasnejad, M.E., Ramachandram, D. & Mandava, R. A survey of the state of the art in learning the kernels. Knowl Inf Syst 31, 193–221 (2012). https://doi.org/10.1007/s10115-011-0404-6

