A new scheme to learn a kernel in regularization networks
Introduction
Kernel-based methods have become very popular in the machine learning community in recent years. Roughly speaking, a kernel-based method constructs a nonlinear map from the data set into a Hilbert space (the feature space) and then builds a linear algorithm in the feature space to implement a nonlinear counterpart on the data set. Kernel-based methods are very successful in many machine learning problems, such as regression, classification and dimensionality reduction. However, the results depend heavily on the kernel, so how to select a ‘good’ kernel is a critical issue for all kernel-based methods. Some progress has been made in this direction; in general, the existing works can be divided into three classes according to the task. Rayleigh coefficients play a key role in classification problems (see [7], [8], [12], [20]); for regression, we recommend Micchelli and Pontil's works (see [2], [12]); and for dimensionality reduction, the celebrated work is [17]. In these works, convex optimization techniques, especially semidefinite programming, are the basic tools (see [4], [15]). Finally, we mention that the statistical generalization analysis of the kernel learning problem is also important: Ying and Campbell recently developed a novel generalization bound for learning the kernel and, in particular, established satisfactory excess generalization bounds and misclassification error rates for learning Gaussian kernels (see [19]).
In this paper, we develop a scheme to learn an optimal kernel from the convex combinations of finitely many given kernels in regularization networks. Before presenting our method, we review some basic notation.
For two given data sets $X$ and $Y$, the goal is to learn a map from $X$ to $Y$ based on the finite training data $\{(x_i, y_i)\}_{i=1}^{m} \subset X \times Y$. In what follows, we restrict $X \subseteq \mathbb{R}^d$ and $Y \subseteq \mathbb{R}$. A kernel $K$ defined on $X$ is a symmetric function from $X \times X$ to $\mathbb{R}$ such that, for any finite set $\{x_1, \ldots, x_m\} \subset X$, the Gram matrix (kernel matrix) $G_K = \big(K(x_i, x_j)\big)_{i,j=1}^{m}$ of order $m$ is positive semidefinite; if the kernel matrix $G_K$ is positive definite, we call $K$ a positive definite kernel. Moreover, for a given kernel $K$ there exists a unique reproducing kernel Hilbert space $H_K$ associated with $K$. The inner product of $H_K$, denoted $\langle \cdot, \cdot \rangle_K$, satisfies the reproducing property $\langle f, K(x, \cdot) \rangle_K = f(x)$ for any $f \in H_K$ and $x \in X$, and we use $\| \cdot \|_K$ to denote the norm of $H_K$. For more details on kernels and reproducing kernel Hilbert spaces, see [1], [13].
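To make the Gram-matrix definition concrete, here is a minimal sketch in Python with NumPy (the language, the Gaussian kernel, and the sample inputs are illustrative choices, not taken from the paper) that builds a kernel matrix and checks its positive semidefiniteness numerically through its least eigenvalue.

```python
import numpy as np

def gaussian_kernel(x, z, sigma=1.0):
    """Gaussian kernel K(x, z) = exp(-||x - z||^2 / (2 * sigma^2)) (illustrative choice)."""
    return np.exp(-np.sum((x - z) ** 2) / (2.0 * sigma ** 2))

def gram_matrix(kernel, xs):
    """Gram (kernel) matrix G_K with entries K(x_i, x_j)."""
    m = len(xs)
    return np.array([[kernel(xs[i], xs[j]) for j in range(m)] for i in range(m)])

# Illustrative inputs x_1, ..., x_m in R^d (here m = 5 points in R^2).
rng = np.random.default_rng(0)
xs = rng.standard_normal((5, 2))

G = gram_matrix(gaussian_kernel, xs)
eigenvalues = np.linalg.eigvalsh(G)            # G is symmetric
print("least eigenvalue:", eigenvalues.min())  # >= 0 up to round-off, so G is PSD
```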
Classical regularization network theory formulates the regression problem as the variational problem of finding a function $f$ that minimizes the functional
$$\frac{1}{m}\sum_{i=1}^{m}\big(f(x_i) - y_i\big)^2 + \mu \|f\|_K^2, \qquad (1.1)$$
where $\mu > 0$ is the regularization parameter. It is well known (see [5], [6], [13]) that if $f_K$ is a minimizer of (1.1), it has the form
$$f_K(x) = \sum_{i=1}^{m} c_i K(x, x_i), \qquad (1.2)$$
for some real vector $c = (c_1, \ldots, c_m)^T$ determined by $\mu$ and the training data. This classical regularization network theory rests on an essential assumption: the target function from $X$ to $Y$ lies in, or can be well approximated by, an element of $H_K$. See [11] for the approximation ability of $H_K$.
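As a concrete illustration of how (1.1) and (1.2) are used in computation, the sketch below (Python/NumPy, not part of the paper; it assumes the squared loss with a 1/m weight and writes the regularization parameter as mu) obtains the coefficient vector from the linear system (G_K + mu·m·I)c = y implied by the representer form.

```python
import numpy as np

def fit_regularization_network(G, y, mu):
    """Representer coefficients for (1.1): solve (G + mu * m * I) c = y.

    G  : (m, m) Gram matrix of the chosen kernel on the training inputs
    y  : (m,) vector of training outputs
    mu : regularization parameter (assumed > 0)
    """
    m = G.shape[0]
    return np.linalg.solve(G + mu * m * np.eye(m), y)

def predict(kernel, xs_train, c, x_new):
    """Evaluate f_K(x_new) = sum_i c_i K(x_new, x_i), as in (1.2)."""
    return sum(ci * kernel(x_new, xi) for ci, xi in zip(c, xs_train))
```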
In this paper, a new scheme is proposed to learn an optimal kernel by solving the variational problem (1.3), in which the minimization is taken over $\mathcal{K}$, the set of convex combinations of finitely many given kernels. Our method is motivated by Micchelli and Pontil's work [10], whose formulation is stated as problem (1.4). The relation between these two models will be discussed in Section 3; as we will see, under certain conditions the problem (1.3) can be approximated by a semidefinite programming problem which coincides with (1.4).
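Because the displayed problems (1.3) and (1.4) are not reproduced above, the sketch below only illustrates the general idea behind learning a convex combination of p given kernels: for each candidate weight vector on the simplex it forms the combined Gram matrix, evaluates the inner regularization-network objective, and keeps the best weights. The coarse grid search and the squared-loss objective are illustrative assumptions, not the method analyzed in this paper.

```python
import itertools
import numpy as np

def regularized_objective(G, y, mu):
    """Inner problem value (1/m)||Gc - y||^2 + mu * c^T G c at the
    representer solution of (G + mu * m * I) c = y."""
    m = G.shape[0]
    c = np.linalg.solve(G + mu * m * np.eye(m), y)
    residual = G @ c - y
    return residual @ residual / m + mu * (c @ G @ c)

def learn_convex_combination(grams, y, mu, steps=10):
    """Coarse grid search over convex weights (lambda_1, ..., lambda_p), sum = 1.

    grams : list of p Gram matrices G_1, ..., G_p of the candidate kernels
    """
    p = len(grams)
    best_value, best_weights = np.inf, None
    for ticks in itertools.product(range(steps + 1), repeat=p):
        if sum(ticks) != steps:
            continue                      # keep only weights summing to 1
        lam = np.array(ticks, dtype=float) / steps
        G = sum(l * Gj for l, Gj in zip(lam, grams))
        value = regularized_objective(G, y, mu)
        if value < best_value:
            best_value, best_weights = value, lam
    return best_weights
```

For more than a few candidate kernels such enumeration quickly becomes impractical; the convexity result of Section 2 and the semidefinite programming approximation of Section 3 are what make the problem tractable in general.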
This paper is organized as follows: in Section 2, we address the basic issues of the optimization problem (1.3), namely the existence of a solution and the convexity of the problem; in Section 3, the relation between our model and MP's model is discussed; and we summarize our work in Section 4.
Section snippets
Learning an optimal kernel
In this section, we show that a solution of problem (1.3) exists and that the optimization problem is convex. For the sake of simplicity, we introduce notation for the relevant sets of kernel matrices, and we sometimes restrict attention to the subset whose elements are positive definite matrices.
Let $K_1, \ldots, K_p$ be $p$ given kernels, obtained mainly from prior information about the problem at hand. Generally speaking, the choice of these kernels…
Further discussion on the variational problem
If the given kernels are positive definite, then under a mild condition on the regularization parameter $\mu$ the problem (2.3) can be approximated by a semidefinite programming problem. First we present an important theorem.

Theorem 3.1. Let $K_1, \ldots, K_p$ be given positive definite kernels, and let $G_j$ be the kernel matrix associated with $K_j$ with respect to the inputs $x_1, \ldots, x_m$. By the definition of a positive definite kernel, $G_1, \ldots, G_p$ are symmetric positive definite matrices; we use $\sigma_j$ to denote the least eigenvalue of $G_j$, so that $\sigma_j > 0$. For any…
Conclusion
In this paper, we study a very important issue: kernel selection for kernel-based methods. We propose a new scheme to learn a kernel function in regularization networks and analyze the theoretical issues of the corresponding variational problem. Moreover, we discuss the relation between our model and MP's, which helps us understand MP's model better. Some problems, however, remain widely open. The first is how to choose the candidate kernels for the combination; as far as we know,…
Acknowledgment
The authors would like to thank the anonymous reviewers for their constructive suggestions and comments, which greatly improved the paper.
References (20)
[1] N. Aronszajn, Theory of reproducing kernels, Transactions of the American Mathematical Society 68 (1950) 337–404.
[2] A. Argyriou, C.A. Micchelli, M. Pontil, Learning convex combinations of continuously parameterized basic kernels, in: …
[3] A. Argyriou, C.A. Micchelli, M. Pontil, Y. Ying, A spectral regularization framework for multi-task structure learning, in: …
[4] S. Boyd, L. Vandenberghe, Convex Optimization, Cambridge University Press, 2004.
[5] F. Cucker, D.X. Zhou, Learning Theory: An Approximation Theory Viewpoint, Cambridge University Press, 2007.
[6] T. Evgeniou, M. Pontil, T. Poggio, Regularization networks and support vector machines, Advances in Computational Mathematics 13 (2000) 1–50.
[7] S.J. Kim, A. Magnani, S. Boyd, Optimal kernel selection in kernel Fisher discriminant analysis, in: Proceedings of the …
[8] G.R.G. Lanckriet, N. Cristianini, P. Bartlett, L. El Ghaoui, M.I. Jordan, Learning the kernel matrix with semidefinite programming, Journal of Machine Learning Research 5 (2004) 27–72.
[9] (1966).
[10] C.A. Micchelli, M. Pontil, Learning the kernel function via regularization, Journal of Machine Learning Research 6 (2005) 1099–1125.
Jie Chen received his B.Sc. degree in Information and Computational Science in 2005 and his Ph.D. degree in Computational Mathematics in 2010, both from Sun Yat-Sen University, Guangzhou, China. Since July 2010 he has worked in the Department of Mathematics, Yibin University, Yibin, China. His research interests are in the areas of multiscale computing, fast singularity-preserving algorithms for linear and nonlinear integral equations, machine learning, and adaptive algorithms.
Fei Ma received his B.Sc. degree in Computer Science and Technology from Jilin University and his M.Sc. degree in Information Computation from Sun Yat-sen University, both in China. He is now with Xinhu Futures Co. Ltd., Shanghai, China.
Jian Chen received his Ph.D. degree in Computational Mathematics from Zhongshan University, Guangzhou, China, in 2010. Since July 2010 he has worked in the Department of Mathematics, Foshan University, Foshan, China. His research interests are in the areas of multiscale computing, fast algorithms for nonlinear integral equations and differential equations, and model reduction.