Localized algorithms for multiple kernel learning
Highlights
- Introduces a localized multiple kernel learning framework for kernel-based algorithms.
- Generalizes the model to different gating models, kernel functions, and applications.
- Reports the results of extensive simulations on multiple real-world data sets.
- Identifies the relevant parts of images, acting as a saliency detector.
- Has inherent regularization that avoids overfitting by using only the required number of kernels.
Introduction
Support vector machine (SVM) is a discriminative classifier based on the theory of structural risk minimization [33]. Given a sample of $N$ independent and identically distributed training instances $\{(\boldsymbol{x}_i, y_i)\}_{i=1}^N$, where $\boldsymbol{x}_i \in \mathbb{R}^D$ and $y_i \in \{-1, +1\}$ is its class label, SVM finds the linear discriminant with the maximum margin in the feature space induced by the mapping function $\Phi(\cdot)$. The discriminant function is
\[
f(\boldsymbol{x}) = \langle \boldsymbol{w}, \Phi(\boldsymbol{x}) \rangle + b
\]
whose parameters can be learned by solving the following quadratic optimization problem:
\[
\begin{aligned}
\min \quad & \tfrac{1}{2} \|\boldsymbol{w}\|_2^2 + C \sum_{i=1}^N \xi_i \\
\text{w.r.t.} \quad & \boldsymbol{w} \in \mathbb{R}^S,\ \boldsymbol{\xi} \in \mathbb{R}_+^N,\ b \in \mathbb{R} \\
\text{s.t.} \quad & y_i \big( \langle \boldsymbol{w}, \Phi(\boldsymbol{x}_i) \rangle + b \big) \geq 1 - \xi_i \quad \forall i
\end{aligned}
\]
where $\boldsymbol{w}$ is the vector of weight coefficients, $S$ is the dimensionality of the feature space obtained by $\Phi(\cdot)$, $C$ is a predefined positive trade-off parameter between model simplicity and classification error, $\boldsymbol{\xi}$ is the vector of slack variables, and $b$ is the bias term of the separating hyperplane. Instead of solving this optimization problem directly, the Lagrangian dual function enables us to obtain the following dual formulation:
\[
\begin{aligned}
\max \quad & \sum_{i=1}^N \alpha_i - \tfrac{1}{2} \sum_{i=1}^N \sum_{j=1}^N \alpha_i \alpha_j y_i y_j k(\boldsymbol{x}_i, \boldsymbol{x}_j) \\
\text{w.r.t.} \quad & \boldsymbol{\alpha} \in [0, C]^N \\
\text{s.t.} \quad & \sum_{i=1}^N \alpha_i y_i = 0
\end{aligned}
\]
where $\boldsymbol{\alpha}$ is the vector of dual variables corresponding to each separation constraint, and the kernel matrix obtained from $k(\boldsymbol{x}_i, \boldsymbol{x}_j) = \langle \Phi(\boldsymbol{x}_i), \Phi(\boldsymbol{x}_j) \rangle$ is positive semidefinite. Solving this, we get $\boldsymbol{w} = \sum_{i=1}^N \alpha_i y_i \Phi(\boldsymbol{x}_i)$ and the discriminant function can be written as
\[
f(\boldsymbol{x}) = \sum_{i=1}^N \alpha_i y_i k(\boldsymbol{x}_i, \boldsymbol{x}) + b.
\]
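As an illustration of the dual formulation above, the quadratic program can be solved for a toy problem with a generic constrained optimizer. The sketch below uses NumPy and SciPy's SLSQP solver (not the paper's MATLAB/MOSEK setup); the data, kernel, and the value of C are arbitrary choices for the example.

```python
import numpy as np
from scipy.optimize import minimize

def svm_dual_fit(K, y, C=10.0):
    """Solve the SVM dual: min 0.5 a^T (yy^T * K) a - 1^T a, s.t. a^T y = 0, 0 <= a <= C."""
    n = len(y)
    Q = np.outer(y, y) * K
    res = minimize(
        fun=lambda a: 0.5 * a @ Q @ a - a.sum(),
        x0=np.zeros(n),
        jac=lambda a: Q @ a - np.ones(n),
        bounds=[(0.0, C)] * n,
        constraints=[{"type": "eq", "fun": lambda a: a @ y, "jac": lambda a: y}],
        method="SLSQP",
    )
    alpha = res.x
    # bias from free support vectors (0 < alpha_i < C): b = y_i - sum_j alpha_j y_j k(x_j, x_i)
    free = (alpha > 1e-6) & (alpha < C - 1e-6)
    b = np.mean(y[free] - (alpha * y) @ K[:, free])
    return alpha, b

# toy linearly separable problem with a linear kernel
X = np.array([[-2.0, -1.0], [-1.0, -2.0], [1.0, 2.0], [2.0, 1.0]])
y = np.array([-1.0, -1.0, 1.0, 1.0])
K = X @ X.T
alpha, b = svm_dual_fit(K, y)
f = (alpha * y) @ K + b  # discriminant values f(x_i) on the training set
```

On separable data like this, the recovered discriminant classifies all training instances correctly and the equality constraint $\sum_i \alpha_i y_i = 0$ holds at the optimum.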
There are several kernel functions successfully used in the literature, such as the linear kernel ($k_L$), the polynomial kernel ($k_P$), and the Gaussian kernel ($k_G$):
\[
\begin{aligned}
k_L(\boldsymbol{x}_i, \boldsymbol{x}_j) &= \langle \boldsymbol{x}_i, \boldsymbol{x}_j \rangle \\
k_P(\boldsymbol{x}_i, \boldsymbol{x}_j) &= \big( \langle \boldsymbol{x}_i, \boldsymbol{x}_j \rangle + 1 \big)^q, \quad q \in \mathbb{N} \\
k_G(\boldsymbol{x}_i, \boldsymbol{x}_j) &= \exp\!\big( -\|\boldsymbol{x}_i - \boldsymbol{x}_j\|_2^2 / s^2 \big), \quad s \in \mathbb{R}_{++}
\end{aligned}
\]
There are also kernel functions proposed for particular applications, such as natural language processing [24] and bioinformatics [31].
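These three standard kernels can be computed directly from the data matrix; a minimal NumPy sketch (illustrative code, not from the paper, with arbitrary parameter values $q = 2$ and $s = 1$):

```python
import numpy as np

def linear_kernel(X1, X2):
    return X1 @ X2.T

def polynomial_kernel(X1, X2, q=2):
    return (X1 @ X2.T + 1.0) ** q

def gaussian_kernel(X1, X2, s=1.0):
    # squared distances via the expansion ||a - b||^2 = ||a||^2 + ||b||^2 - 2<a, b>
    sq = (X1**2).sum(1)[:, None] + (X2**2).sum(1)[None, :] - 2.0 * X1 @ X2.T
    return np.exp(-sq / s**2)

X = np.random.RandomState(0).randn(5, 3)
KL, KP, KG = linear_kernel(X, X), polynomial_kernel(X, X), gaussian_kernel(X, X)
```

All three produce symmetric positive semidefinite kernel matrices; the Gaussian kernel additionally has unit diagonal, since $k_G(\boldsymbol{x}, \boldsymbol{x}) = 1$.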
Selecting the kernel function and its parameters (e.g., $q$ or $s$) is an important issue in training. Generally, a cross-validation procedure is used to choose the best performing kernel function among a set of kernel functions on a separate validation set different from the training set. In recent years, multiple kernel learning (MKL) methods have been proposed, where we use multiple kernels instead of selecting one specific kernel function and its corresponding parameters:
\[
k_\eta(\boldsymbol{x}_i, \boldsymbol{x}_j) = f_\eta\big( \{ k_m(\boldsymbol{x}_i^m, \boldsymbol{x}_j^m) \}_{m=1}^P \big)
\tag{1}
\]
where the combination function $f_\eta(\cdot)$ can be a linear or a nonlinear function of the input kernels. Kernel functions, $\{ k_m(\cdot, \cdot) \}_{m=1}^P$, take $P$ feature representations (not necessarily different) of data instances, where $\boldsymbol{x}_i = \{ \boldsymbol{x}_i^m \}_{m=1}^P$, $\boldsymbol{x}_i^m \in \mathbb{R}^{D_m}$, and $D_m$ is the dimensionality of the corresponding feature representation.
The reasoning is similar to combining different classifiers: Instead of choosing a single kernel function and putting all our eggs in the same basket, it is better to have a set and let an algorithm do the picking or combination. There can be two uses of MKL: (i) Different kernels correspond to different notions of similarity and instead of trying to find which works best, a learning method does the picking for us, or may use a combination of them. Using a specific kernel may be a source of bias, and in allowing a learner to choose among a set of kernels, a better solution can be found. (ii) Different kernels may be using inputs coming from different representations possibly from different sources or modalities. Since these are different representations, they have different measures of similarity corresponding to different kernels. In such a case, combining kernels is one possible way to combine multiple information sources.
Since their original conception, there has been significant work on the theory and application of multiple kernel learning. Fixed rules use the combination function in (1) as a fixed function of the kernels, without any training. Once we calculate the combined kernel, we train a single kernel machine using this kernel. For example, we can obtain a valid kernel by taking the summation or multiplication of two kernels as follows [10]:
\[
\begin{aligned}
k_\eta(\boldsymbol{x}_i, \boldsymbol{x}_j) &= k_1(\boldsymbol{x}_i^1, \boldsymbol{x}_j^1) + k_2(\boldsymbol{x}_i^2, \boldsymbol{x}_j^2) \\
k_\eta(\boldsymbol{x}_i, \boldsymbol{x}_j) &= k_1(\boldsymbol{x}_i^1, \boldsymbol{x}_j^1)\, k_2(\boldsymbol{x}_i^2, \boldsymbol{x}_j^2)
\end{aligned}
\]
The summation rule has been applied successfully in computational biology [27] and optical digit recognition [25] to combine two or more kernels obtained from different representations.
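The two fixed rules can be sketched in NumPy as follows (illustrative code, not the paper's implementation; the two feature representations are random toy data). Both rules preserve positive semidefiniteness, the product rule by the Schur product theorem:

```python
import numpy as np

rng = np.random.RandomState(1)
X1 = rng.randn(6, 4)  # first feature representation of six instances
X2 = rng.randn(6, 2)  # second feature representation of the same instances

K1 = X1 @ X1.T                                       # linear kernel on representation 1
d = ((X2[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
K2 = np.exp(-d)                                      # Gaussian kernel on representation 2

K_sum = K1 + K2    # summation rule
K_prod = K1 * K2   # multiplication rule (element-wise/Hadamard product)
# either combined kernel can now be passed to a single kernel machine
```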
Instead of using a fixed combination function, we can have a function parameterized by a set of parameters $\boldsymbol{\eta}$ and then use a learning procedure to optimize $\boldsymbol{\eta}$ as well. The simplest case is to parameterize the sum rule as a weighted sum:
\[
k_\eta(\boldsymbol{x}_i, \boldsymbol{x}_j) = \sum_{m=1}^P \eta_m k_m(\boldsymbol{x}_i^m, \boldsymbol{x}_j^m)
\]
with $\boldsymbol{\eta} \in \mathbb{R}^P$. Different versions of this approach differ in the way they restrict the kernel weights [22], [4], [29], [19]. For example, we can use arbitrary weights (i.e., a linear combination), nonnegative kernel weights (i.e., a conic combination), or weights on a simplex (i.e., a convex combination). A linear combination may be restrictive, and nonlinear combinations are also possible [23], [13], [8]; our proposed approach is of this type and we will discuss these in more detail later.
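The weighted sum with weights on a simplex (the convex-combination case) can be sketched as follows; the kernels and the particular weight vector are arbitrary choices for illustration, not values from the paper:

```python
import numpy as np

def combine_kernels(kernels, eta):
    # k_eta = sum_m eta_m * K_m; eta nonnegative and summing to 1 gives a convex combination
    return sum(e * K for e, K in zip(eta, kernels))

rng = np.random.RandomState(2)
X = rng.randn(8, 3)
kernels = [
    X @ X.T,                                           # linear
    (X @ X.T + 1.0) ** 2,                              # polynomial, q = 2
    np.exp(-((X[:, None] - X[None]) ** 2).sum(-1)),    # Gaussian, s = 1
]

eta = np.array([0.2, 0.3, 0.5])  # weights on the simplex
K_eta = combine_kernels(kernels, eta)
```

Since every $K_m$ is positive semidefinite and every $\eta_m \geq 0$, the combined matrix remains a valid kernel.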
We can learn the kernel combination weights using a quality measure that gives performance estimates for the kernel matrices calculated on training data. This corresponds to a function that assigns a weight to each kernel function. The quality measure used for determining the kernel weights could be "kernel alignment" [21], [22] or another similarity measure such as the Kullback–Leibler divergence [36]. Another possibility, inspired by ensemble and boosting methods, is to iteratively update the combined kernel by adding a new kernel as training continues [5], [9]. In a trained combiner parameterized by $\boldsymbol{\eta}$, if we assume $\boldsymbol{\eta}$ to contain random variables with a prior, we can use a Bayesian approach. For the case of a weighted sum, we can, for example, have a prior on the kernel weights [11], [12], [28]. A recent survey of multiple kernel learning algorithms is given in [18].
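As one concrete instance of such a quality measure, kernel alignment between two kernel matrices is defined as $A(K_1, K_2) = \langle K_1, K_2 \rangle_F / (\|K_1\|_F \|K_2\|_F)$. The sketch below (illustrative code; the normalization of alignment scores into weights is one simple heuristic, not a method from the paper) scores each candidate kernel against the ideal target kernel $\boldsymbol{y}\boldsymbol{y}^\top$:

```python
import numpy as np

def alignment(K1, K2):
    # A(K1, K2) = <K1, K2>_F / (||K1||_F ||K2||_F)
    return (K1 * K2).sum() / (np.linalg.norm(K1) * np.linalg.norm(K2))

rng = np.random.RandomState(3)
X = rng.randn(10, 4)
y = np.sign(X[:, 0])       # toy labels correlated with the first feature
Kyy = np.outer(y, y)       # ideal target kernel y y^T

kernels = [X @ X.T, np.exp(-((X[:, None] - X[None]) ** 2).sum(-1))]
scores = np.array([alignment(K, Kyy) for K in kernels])
eta = scores / scores.sum()  # normalize alignment scores into kernel weights
```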
This paper is organized as follows: We formulate our proposed nonlinear combination method localized MKL (LMKL) with detailed mathematical derivations in Section 2. We give our experimental results in Section 3 where we compare LMKL with MKL and single kernel SVM. In Section 4, we discuss the key properties of our proposed method together with related work in the literature. We conclude in Section 5.
Section snippets
Localized multiple kernel learning
Using a fixed unweighted or weighted sum assigns the same weight to a kernel over the whole input space. Assigning different weights to a kernel in different regions of the input space may produce a better classifier. If the data has an underlying local structure, different similarity measures may be suited to different regions. We propose to divide the input space into regions using a gating function and to assign combination weights to kernels in a data-dependent way [13]; in the neural network
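The key quantity in this framework is the locally combined kernel, $k_\eta(\boldsymbol{x}_i, \boldsymbol{x}_j) = \sum_m \eta_m(\boldsymbol{x}_i)\, k_m(\boldsymbol{x}_i, \boldsymbol{x}_j)\, \eta_m(\boldsymbol{x}_j)$, where $\eta_m(\boldsymbol{x})$ is a gating model. A minimal NumPy sketch, assuming a softmax gating model $\eta_m(\boldsymbol{x}) = \exp(\langle \boldsymbol{v}_m, \boldsymbol{x} \rangle + v_{m0}) / \sum_h \exp(\langle \boldsymbol{v}_h, \boldsymbol{x} \rangle + v_{h0})$ with random, untrained gating parameters in place of the paper's alternating optimization procedure:

```python
import numpy as np

def softmax_gating(X, V, v0):
    # eta_m(x) = exp(<v_m, x> + v_m0) / sum_h exp(<v_h, x> + v_h0)
    Z = X @ V.T + v0
    Z -= Z.max(axis=1, keepdims=True)        # numerical stability
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)  # (N, P); each row lies on the simplex

def locally_combined_kernel(kernels, eta):
    # k_eta(x_i, x_j) = sum_m eta_m(x_i) k_m(x_i, x_j) eta_m(x_j)
    return sum(np.outer(eta[:, m], eta[:, m]) * K for m, K in enumerate(kernels))

rng = np.random.RandomState(4)
X = rng.randn(12, 3)
kernels = [X @ X.T, np.exp(-((X[:, None] - X[None]) ** 2).sum(-1))]
P = len(kernels)
V, v0 = rng.randn(P, 3), rng.randn(P)        # untrained gating parameters (illustration only)
eta = softmax_gating(X, V, v0)
K_eta = locally_combined_kernel(kernels, eta)
```

Because each term is the Hadamard product of two positive semidefinite matrices, the locally combined kernel is itself a valid kernel for any gating parameters.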
Experiments
In this section, we report the empirical performance of LMKL for classification and regression problems on several data sets and compare LMKL with SVM, SVR, and MKL (using the linear formulation of [4]). We use our own implementations of SVM, SVR, MKL, and LMKL written in MATLAB, and the resulting optimization problems for all these methods are solved using the MOSEK optimization software [26].
Unless otherwise stated, our experimental methodology
Discussion
We discuss the key properties of the proposed method and compare it with similar MKL methods in the literature.
Conclusions
This work introduces a localized multiple kernel learning framework for kernel-based algorithms. The proposed algorithm has two main ingredients: (i) a gating model that assigns weights to kernels for a data instance, (ii) a kernel-based learning algorithm with the locally combined kernel. The training of these two components is coupled and the parameters of both components are optimized together using a two-step alternating optimization procedure. We derive the learning algorithm for three
Acknowledgments
This work was supported by the Turkish Academy of Sciences in the framework of the Young Scientist Award Program under EA-TÜBA-GEBİP/2001-1-1, the Boğaziçi University Scientific Research Project 07HA101, and the Scientific and Technological Research Council of Turkey (TÜBİTAK) under Grant EEEAG 107E222. The work of M. Gönen was supported by the Ph.D. scholarship (2211) from TÜBİTAK.
References (36)
- et al., Supervised learning of local projection kernels, Neurocomputing (2010)
- E. Alpaydın, Selective attention for handwritten digit recognition, in: Advances in Neural Information Processing...
- et al., Combined 5×2 cv F test for comparing supervised classification learning algorithms, Neural Computation (1999)
- et al., Local linear perceptrons for classification, IEEE Transactions on Neural Networks (1996)
- F.R. Bach, G.R.G. Lanckriet, M.I. Jordan, Multiple kernel learning, conic duality, and the SMO algorithm, in:...
- K.P. Bennett, M. Momma, M.J. Embrechts, MARK: a boosting algorithm for heterogeneous kernel models, in: Proceedings of...
- et al., Choosing multiple parameters for support vector machines, Machine Learning (2002)
- M. Christoudias, R. Urtasun, T. Darrell, Bayesian Localized Multiple Kernel Learning, Technical Report...
- C. Cortes, M. Mohri, A. Rostamizadeh, Learning non-linear combinations of kernels, in: Advances in Neural Information...
- K. Crammer, J. Keshet, Y. Singer, Kernel design using boosting, in: Advances in Neural Information Processing Systems...
- An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods
- Multiple kernel learning algorithms, Journal of Machine Learning Research
Mehmet Gönen received the B.Sc. degree in industrial engineering, the M.Sc. and the Ph.D. degrees in computer engineering from Boğaziçi University, İstanbul, Turkey, in 2003, 2005, and 2010, respectively.
He was a Teaching Assistant at the Department of Computer Engineering, Boğaziçi University. He is currently doing his postdoctoral work at the Department of Information and Computer Science, Aalto University School of Science, Espoo, Finland. His research interests include support vector machines, kernel methods, Bayesian methods, optimization for machine learning, dimensionality reduction, information retrieval, and computational biology applications.
Ethem Alpaydın received his B.Sc. from the Department of Computer Engineering of Boğaziçi University in 1987 and the degree of Docteur es Sciences from Ecole Polytechnique Fédérale de Lausanne in 1990.
He did his postdoctoral work at the International Computer Science Institute, Berkeley, in 1991 and afterwards was appointed as Assistant Professor at the Department of Computer Engineering of Boğaziçi University. He was promoted to Associate Professor in 1996 and Professor in 2002 in the same department. As visiting researcher, he worked at the Department of Brain and Cognitive Sciences of MIT in 1994, the International Computer Science Institute, Berkeley, in 1997 and IDIAP, Switzerland, in 1998. He was awarded a Fulbright Senior scholarship in 1997 and received the Research Excellence Award from the Boğaziçi University Foundation in 1998 (junior level) and in 2008 (senior level), the Young Scientist Award from the Turkish Academy of Sciences in 2001 and the Scientific Encouragement Award from the Scientific and Technological Research Council of Turkey in 2002. His book Introduction to Machine Learning was published by The MIT Press in October 2004. Its German edition was published in 2008, its Chinese edition in 2009, its second edition in 2010, and its Turkish edition in 2011. He is a senior member of the IEEE, an editorial board member of The Computer Journal (Oxford University Press) and an associate editor of Pattern Recognition (Elsevier).