Abstract
The success of kernel methods depends heavily on the choice of kernel. Both designing a kernel and learning one from data require evaluation measures that assess the quality of the kernel. In recent years, kernel alignment, which measures the degree of agreement between a kernel and a learning task, has been widely used for kernel selection because it is effective and computationally inexpensive. In this paper, we present an overview of research progress on kernel alignment and its applications. We introduce the basic idea of kernel alignment and its theoretical properties, as well as its extensions and improvements for specific learning problems. Typical applications, including kernel parameter tuning, multiple kernel learning, spectral kernel learning, and feature selection and extraction, are reviewed in the context of the classification framework. The relationship between kernel alignment and other evaluation measures is also explored. Finally, concluding remarks and future directions are presented.
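As a rough illustration of the measure described above, the sketch below computes the standard empirical kernel-target alignment for binary labels in {-1, +1}, that is, the Frobenius inner product of the kernel matrix K with the ideal kernel yy^T, normalized by both Frobenius norms. It then uses that score to pick an RBF width from a candidate grid, a simple instance of kernel parameter tuning. The function names, the RBF kernel, and the grid are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def kernel_target_alignment(K, y):
    """Empirical alignment between kernel matrix K and the target yy^T.

    Assumes binary labels y in {-1, +1}.
    """
    K = np.asarray(K, dtype=float)
    y = np.asarray(y, dtype=float)
    # Frobenius inner product <K, yy^T>_F equals y^T K y.
    numerator = y @ K @ y
    # ||yy^T||_F equals y^T y (which is n for +/-1 labels).
    denominator = np.linalg.norm(K, "fro") * (y @ y)
    return numerator / denominator

def rbf_kernel(X, gamma):
    """Gaussian (RBF) kernel matrix exp(-gamma * ||x_i - x_j||^2)."""
    X = np.asarray(X, dtype=float)
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))

def select_gamma(X, y, gammas):
    """Return the candidate gamma whose RBF kernel has the highest alignment."""
    scores = [kernel_target_alignment(rbf_kernel(X, g), y) for g in gammas]
    return gammas[int(np.argmax(scores))]
```

A call such as `select_gamma(X, y, [0.01, 0.1, 1.0, 10.0])` scores each candidate using only the kernel matrix and the labels, without training a classifier, which is the source of the low computational cost noted above.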
Wang, T., Zhao, D. & Tian, S. An overview of kernel alignment and its applications. Artif Intell Rev 43, 179–192 (2015). https://doi.org/10.1007/s10462-012-9369-4
DOI: https://doi.org/10.1007/s10462-012-9369-4