Multi-kernel classification machine with reduced complexity
Introduction
Support Vector Machine (SVM) [2] was proposed as an effective classification technique [3] and serves as a basic learning framework for a series of kernel-based algorithms [15], [10], [27], [13], [25], [6], [40]. SVM is superior in solving linearly separable problems and performs effectively on small-scale samples. Real-world data, however, are generally not linearly separable. To handle such data, SVM is equipped with the kernel technique, which opened a new direction for SVM-based algorithms. SVM-based learning maps the input data x into a feature space through a mapping Φ [4]. A linear algorithm applied in the feature space then plays the same role as a nonlinear algorithm in the original input space. Generally, the mapping Φ takes two forms. The first is the Empirical Kernel Mapping (EKM), in which Φ is represented explicitly [3], [4], [14]: EKM explicitly constructs the form of Φ and then maps a sample into the feature space with this Φ. The second is the Implicit Kernel Mapping (IKM), in which Φ is represented only implicitly through a kernel function [5], [36]: IKM uses the kernel function to compute the kernel matrix of the samples. EKM has been demonstrated to be superior to IKM in terms of maintaining the geometrical measurement between the input data and the feature space [3]. However, EKM must explicitly construct the mapping Φ, which incurs higher time and space complexities. Compared with EKM, IKM is much easier to carry out, since it maps the input data x into the feature space with a chosen kernel function by way of the inner product between each pair of points. The geometrical structure of the mapped data in the feature space is then totally determined by the form of the used kernel function [4].
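As a minimal illustration of the two mappings, the sketch below (assuming an RBF kernel; the function names and data are ours, not from the paper) forms the IKM Gram matrix directly from the kernel function, and builds an empirical kernel map from the eigendecomposition of that matrix so that inner products of the mapped samples reproduce the Gram matrix:

```python
import numpy as np

def rbf_kernel(X, Y, gamma=0.5):
    # Gram matrix of the RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

# IKM: only the n x n Gram matrix K is ever formed, via the inner
# products induced by the kernel function.
X = np.random.RandomState(0).randn(6, 3)
K = rbf_kernel(X, X)

# EKM: an explicit finite-dimensional map built from the eigendecomposition
# K = P diag(lam) P^T; inner products of the mapped samples reproduce K
# on the retained eigen-directions.
lam, P = np.linalg.eigh(K)
keep = lam > 1e-10                          # drop numerically zero directions
Phi = K @ P[:, keep] / np.sqrt(lam[keep])   # row i is the image of x_i

assert np.allclose(Phi @ Phi.T, K, atol=1e-8)
```

The trade-off the text describes is visible here: EKM pays an extra eigendecomposition and stores an explicit n x r feature matrix, while IKM only ever touches K.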
Meanwhile, SVM-based learning can be sorted into Single Kernel Learning (SKL) [2] and Multiple Kernel Learning (MKL) [6] according to the number of kernels used in the learning process. SKL selects one optimized kernel from the candidate kernels for the final learning, while MKL fuses multiple kernels in the learning process. There are therefore naturally at least four combinations: SKL with IKM, SKL with EKM, MKL with IKM, and MKL with EKM, the last also denoted as MEKL. Traditional kernel-based algorithms generally study SKL with either IKM or EKM [8], [9], [4], [36]. Nevertheless, SKL is not always effective in practical cases with heterogeneous, irregular, or structurally complicated samples [1]. To overcome these disadvantages of SKL, researchers developed MKL [6], [7], which simultaneously adopts a group of optimally combined kernels. In fact, it is known that: (1) SKL uses only one kernel over the whole input space, and when the data set has a complex structure, a single kernel cannot wholly reflect the data structure; (2) SKL fails to combine different kernels into a joint learning machine, even though different kernels play different discriminant roles in the design of the learning machine. MKL addresses these disadvantages by using multiple kernels in one learning machine and assigning feasible kernel weights to them, so that complex structures of the data can be taken into account [6], [7]. One classical form of MKL adopts the convex combination of multiple candidate kernels [6]. Our previous work, the Multi-Views Kernelized Modification of Ho–Kashyap algorithm with Squared approximation of the misclassification errors (MultiV-KMHKS) [15], falls into the MKL learning framework.
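The classical convex-combination form can be sketched as follows. This is a generic illustration (the helper name and weight values are ours), not the weight-learning procedure of any particular MKL algorithm:

```python
import numpy as np

def combine_kernels(kernel_mats, betas):
    """Fuse candidate Gram matrices by a convex combination:
    K = sum_m beta_m * K_m with beta_m >= 0 and sum_m beta_m = 1."""
    betas = np.asarray(betas, dtype=float)
    assert np.all(betas >= 0) and np.isclose(betas.sum(), 1.0)
    return sum(b * K for b, K in zip(betas, kernel_mats))

# Example: fuse a linear and an RBF Gram matrix of the same samples.
rng = np.random.RandomState(0)
X = rng.randn(8, 4)
K_lin = X @ X.T
K_rbf = np.exp(-0.5 * ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
K = combine_kernels([K_lin, K_rbf], [0.3, 0.7])
```

Any convex combination of positive semi-definite Gram matrices is again positive semi-definite, so the fused K remains a valid kernel matrix.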
In MultiV-KMHKS, we first adopt each candidate kernel to map the input data. We then design multiple sub-classifiers, one in each kernel-mapped space, with one certain SKL algorithm, namely the Kernelized Modification of the Ho–Kashyap algorithm with Squared approximation of the misclassification errors (KMHKS) [10]. Finally, we combine the different sub-classifiers into one fused learning process named MultiV-KMHKS [15]. However, both the computational complexity of finding the solution and the space complexity of storing the kernel matrices of MultiV-KMHKS are large. To overcome these disadvantages, in this paper we develop an efficient multi-kernel classification machine based on the Nyström approximation technique, which efficiently reduces both the computational and space complexities of MKL. In practice, we first apply the Nyström approximation technique to obtain a group of approximation matrices for the M candidate n × n kernel matrices K_i, where M is the number of kernels and n is the number of training samples. The approximation matrix K̃_i can be generated by choosing a subset of m columns from the original K_i [11], [12], [15]; a sketch of this step is given after the list below. We then compute the corresponding coefficients of these K̃_i and thus obtain the optimized combination matrix K̃. Following that, we apply the generated matrix K̃ to the learning framework of KMHKS. The whole learning process is named Nyström approximation matrix with Multiple KMHKSs (NMKMHKS). To validate the feasibility and efficiency of the proposed NMKMHKS, we design different experimental strategies, including classification performance and training time comparisons on multiple real-world data sets, a discussion of the regularization parameter, a convergence analysis, a Kernel Alignment (KA) [29] analysis, and a generalization risk analysis in terms of the Rademacher complexity. Finally, we highlight the advantages of the proposed NMKMHKS as follows:
- Compared with the original MultiV-KMHKS [15], the proposed NMKMHKS significantly reduces the computational complexity of finding the solution: NMKMHKS needs only a complexity comparable to that of the single-kernel KMHKS [10], whereas MultiV-KMHKS needs about M times the complexity of KMHKS. Furthermore, we demonstrate that, compared with MultiV-KMHKS, NMKMHKS has a lower Rademacher complexity and thus obtains a superior generalization performance.
- Compared with KMHKS, NMKMHKS improves the classification performance while keeping a comparable computational complexity, since NMKMHKS can adopt multiple kernels simultaneously.
- A superior recognition performance of NMKMHKS is obtained when there is a strong correlation between the multiple adopted kernels, which gives guiding advice for choosing kernels.
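As promised above, the Nyström step used to build each K̃_i can be sketched as follows: sample m columns of the n × n Gram matrix K and reconstruct K ≈ C W⁺ Cᵀ, where C is the n × m block of sampled columns and W the m × m intersection block. This is a generic illustration of the standard technique [11], [12]; the uniform sampling scheme and function name are our own choices:

```python
import numpy as np

def nystrom_approx(K, m, rng=None):
    """Standard Nystrom approximation of an n x n PSD Gram matrix K.

    Uniformly samples m columns and returns the rank-<=m reconstruction
    K_tilde = C @ pinv(W) @ C.T, where C = K[:, idx] and W = K[idx][:, idx].
    """
    rng = rng or np.random.RandomState(0)
    n = K.shape[0]
    idx = rng.choice(n, size=m, replace=False)  # sampled column indices
    C = K[:, idx]                               # n x m block of sampled columns
    W = K[np.ix_(idx, idx)]                     # m x m intersection block
    return C @ np.linalg.pinv(W) @ C.T
```

Only C and W need to be stored, so the space cost drops from O(n^2) per kernel to O(nm) with m much smaller than n, which is the source of the complexity reduction claimed above.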
The rest of this paper is organized as follows. Section 2 reviews the related work on MKL and the origin of KMHKS. Section 3 gives the architecture of the proposed NMKMHKS. Section 4 presents multiple kinds of experiments on real-world data sets that demonstrate both the effectiveness and efficiency of NMKMHKS. Finally, conclusions are given in Section 5.
Multiple Kernel Learning (MKL)
It has been demonstrated that MKL has a significant advantage over SKL in handling heterogeneous, irregular, and structurally complicated data [6]. Since MKL adopts an ensemble of kernel functions rather than a single kernel to construct learning machines, the integration in MKL generally goes through three stages: the data-integrated process in the first period, the kernel-matrices-integrated one in the middle period, and the classifiers-integrated one in the final period.
Proposed multi-kernel classification machine (NMKMHKS)
In this section we give the architecture of the proposed NMKMHKS. The process of NMKMHKS consists of the following steps. First, we generate the M candidate kernel matrices K_i with the given kernel functions for the given data set, and we centralize and normalize these candidate kernel matrices. Second, we apply the Nyström approximation to each K_i to get its approximation matrix K̃_i, built from m sampled columns with m ≪ n. Third, we calculate the coefficient of each K̃_i.
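The first steps of this pipeline can be sketched as below, reusing the nystrom_approx helper from the Introduction. The centering and unit-diagonal normalization shown are the standard formulas for kernel matrices, and the uniform coefficients at the end are placeholders for the learned ones; the candidate kernels and all names here are our illustrative choices:

```python
import numpy as np

def center_kernel(K):
    """Center the Gram matrix in feature space:
    K_c = K - 1_n K - K 1_n + 1_n K 1_n, with 1_n the n x n matrix of 1/n."""
    n = K.shape[0]
    one = np.full((n, n), 1.0 / n)
    return K - one @ K - K @ one + one @ K @ one

def normalize_kernel(K):
    """Scale the Gram matrix to unit diagonal: K_ij / sqrt(K_ii K_jj)."""
    d = np.sqrt(np.clip(np.diag(K), 1e-12, None))
    return K / np.outer(d, d)

def rbf(X, gamma):
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

rng = np.random.RandomState(0)
X = rng.randn(100, 5)                        # n = 100 training samples

# M = 3 candidate RBF Gram matrices with different widths.
candidates = [rbf(X, g) for g in (0.1, 0.5, 2.0)]

# Steps 1-2: preprocess each K_i, then Nystrom-approximate it (m << n).
K_tildes = [nystrom_approx(normalize_kernel(center_kernel(K)), m=20)
            for K in candidates]

# Step 3 (placeholder coefficients): fuse the K_tilde_i into one matrix.
coeffs = np.full(len(K_tildes), 1.0 / len(K_tildes))
K_tilde = sum(c * Kt for c, Kt in zip(coeffs, K_tildes))
```

The fused K̃ then enters the KMHKS learning framework in place of a single full kernel matrix.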
Experiments
To validate the effectiveness of the proposed NMKMHKS, we design our experiments from multiple views on data sets from the UCI machine learning repository [28], namely real-world data sets. First, we give the experimental setting for all implemented algorithms. The compared algorithms are NMKMHKS, KMHKS, MultiV-KMHKS, and some other state-of-the-art MKL algorithms, including MKDA with Semi-Definite Programming (SDP) [27], MKDA with Semi-Infinite Programming (SIP) [13], the SVM-based MKL of [25], and MVNA-KMHKS, among others.
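The Kernel Alignment analysis used in our experimental design measures the similarity between two Gram matrices. A minimal version of the standard alignment score [29] is sketched below; the function name is ours:

```python
import numpy as np

def kernel_alignment(K1, K2):
    """Kernel Alignment A(K1, K2) = <K1, K2>_F / sqrt(<K1, K1>_F <K2, K2>_F),
    where <., .>_F is the Frobenius inner product. For PSD matrices the
    score lies in [0, 1]; larger values mean more similar kernels."""
    num = np.sum(K1 * K2)
    den = np.sqrt(np.sum(K1 * K1) * np.sum(K2 * K2))
    return num / den
```

In our setting, the alignment between candidate kernel matrices quantifies the correlation between the adopted kernels, which connects to the observation above that NMKMHKS performs best when the adopted kernels are strongly correlated.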
Conclusions
In this paper, we propose a multi-kernel classification machine based on the Nyström approximation, named NMKMHKS. The procedure of NMKMHKS is as follows: (1) use the Nyström approximation method to process each candidate kernel matrix K_i so as to obtain its corresponding Nyström approximation matrix K̃_i; (2) fuse the multiple K̃_i into the ensemble matrix K̃; and (3) introduce K̃ into the KMHKS learning framework. We demonstrate both the effectiveness and efficiency of the proposed NMKMHKS.
Acknowledgments
This work was partially supported by Natural Science Foundations of China under Grant Nos. 61272198 and 21176077, Innovation Program of Shanghai Municipal Education Commission under Grant No. 14ZZ054, the Fundamental Research Funds for the Central Universities, Shanghai Key Laboratory of Intelligent Information Processing of China under Grant No. IIPL-2012-003, Nature Science Foundation of Shanghai Province of China under Grant No. 11ZR1409600, and Shanghai Pujiang Program under Grant No.
References (40)
- et al., Multi-view kernel machine on single-view data, Neurocomputing (2009)
- et al., A novel multi-view classifier based on Nyström approximation, Expert Syst. Appl. (2011)
- Ho–Kashyap classifier with generalization control, Pattern Recogn. Lett. (2003)
- et al., Regularized multi-view machine based on response surface technique, Neurocomputing (2012)
- et al., Matrix-pattern-oriented Ho–Kashyap classifier with regularization learning, Pattern Recogn. (2007)
- et al., Learning the kernel matrix with semidefinite programming, J. Mach. Learn. Res. (2004)
- Statistical Learning Theory (1998)
- et al., Input space versus feature space in kernel-based methods, IEEE Trans. Neural Netw. (1999)
- et al., Optimizing the kernel in the empirical feature space, IEEE Trans. Neural Netw. (2005)
- et al., Kernel Methods for Pattern Analysis (2004)
- Large scale multiple kernel learning, J. Mach. Learn. Res.
- A general and efficient multiple kernel learning algorithm, Neural Inform. Process. Syst.
- Generalized discriminant analysis using a kernel approach, Neural Comput.
- Kernel Ho–Kashyap classifier with generalization control, Int. J. Appl. Math. Comput. Sci.
- Using the Nyström method to speed up kernel machines, Neural Inform. Process. Syst.
- MultiK-MHKS: a novel multiple kernel learning algorithm, IEEE Trans. Pattern Anal. Mach. Intell.