Knowledge-Based Systems, Volume 65, July 2014, Pages 83-95

Multi-kernel classification machine with reduced complexity

https://doi.org/10.1016/j.knosys.2014.04.012

Abstract

Multiple Kernel Learning (MKL) has been demonstrated to improve classification performance effectively, but it incurs a large complexity in some large-scale cases. In this paper, we aim to reduce both the time and space complexities of MKL, and thus propose an efficient multi-kernel classification machine based on the Nyström approximation. Firstly, we generate different kernel matrices Kp for the given data. Secondly, we apply the Nyström approximation technique to each Kp so as to obtain its corresponding approximation matrix K̃p. Thirdly, we fuse the multiple generated K̃p's into the final ensemble matrix G with a certain heuristic rule. Finally, we select the Kernelized Modification of Ho–Kashyap algorithm with Squared approximation of the misclassification errors (KMHKS) as the incorporated paradigm and apply G to KMHKS. In doing so, we propose a multi-kernel classification machine with reduced complexity named Nyström approximation matrix with Multiple KMHKSs (NMKMHKS). The experimental results here validate both the effectiveness and efficiency of the proposed NMKMHKS. The contributions of NMKMHKS are that: (1) compared with the existing MKL, NMKMHKS reduces the computational complexity of finding the solution from O(Mn³) to O(Mnm²), where M is the number of kernels, n is the number of training samples, and m is the number of columns selected from Kp; meanwhile, NMKMHKS reduces the space complexity of storing the kernel matrices from O(Mn²) to O(n²); (2) compared with the original KMHKS, NMKMHKS improves the classification performance while keeping a comparable space complexity; (3) NMKMHKS achieves better recognition when the multiple used K̃p's are strongly correlated; and (4) NMKMHKS has a tighter generalization risk bound in terms of the Rademacher complexity analysis.

Introduction

Support Vector Machine (SVM) [2] was proposed to offer an effective classification technique [3]. SVM serves as a basic learning framework for a series of kernel-based algorithms [15], [10], [27], [13], [25], [6], [40]. SVM is well suited to linearly separable problems and performs effectively on small-scale samples. In real-world cases, however, data are generally not linearly separable. To handle such data, SVM is equipped with the kernel technique, which offers a new direction for developing SVM-based algorithms. SVM-based learning maps the input data x into a feature space F, ϕ: x → ϕ(x) [4]. A linear separable algorithm in the feature space then plays a similar role as a nonlinear algorithm in the original input space. Generally, ϕ(x) takes two forms. The first is named Empirical Kernel Mapping (EKM), which is explicitly represented as ϕe(x) [3], [4], [14]. EKM explicitly gives the form of ϕe(x) and then maps a sample into the feature space with this ϕe(x). The second is named Implicit Kernel Mapping (IKM), which is implicitly represented as ϕi(x) [5], [36]. IKM uses the kernel function K(x,y) = ϕi(x)·ϕi(y) to compute the kernel matrix for the samples. EKM has been demonstrated to be superior to IKM in terms of maintaining the geometrical measurement between the input data and the feature space [3], but EKM needs to explicitly give the form of the mapping ϕ(x), which causes higher time and space complexities. Compared with EKM, IKM is much easier to carry out since it maps the input data x into a feature space with a special kernel function by way of the inner product between each pair of points. The geometrical structure of the mapped data in the feature space is totally determined by the form of the used kernel function [4].
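As an illustration of IKM, the following minimal sketch (Python/NumPy) builds the kernel matrix directly from pairwise kernel evaluations K(x, y) = ϕi(x)·ϕi(y) without ever forming ϕi(x) explicitly; an RBF kernel and placeholder data are assumed here purely for illustration and are not taken from the paper.

import numpy as np

def rbf_kernel_matrix(X, gamma=1.0):
    """Gram matrix K with K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq_norms = np.sum(X ** 2, axis=1)                              # ||x_i||^2
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))              # n x n kernel matrix

X = np.random.randn(100, 5)              # 100 samples, 5 features (toy data)
K = rbf_kernel_matrix(X, gamma=0.5)      # kernel matrix used by IKM-based learning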

Meanwhile, SVM-based learning can be sorted into Single Kernel Learning (SKL) [2] and Multiple Kernel Learning (MKL) [6] according to the number of kernels used in the learning process. SKL selects one optimized kernel from the candidate kernels for the final learning, while MKL can fuse multiple kernels in the learning process. Therefore, in our opinion, there are naturally at least four combinations: SKL with IKM, SKL with EKM, MKL with IKM, and MKL with EKM, also denoted as MEKL. In the traditional kernel-based algorithms, SKL with either IKM or EKM is generally studied [8], [9], [4], [36]. Nevertheless, SKL is not always effective in some practical cases with heterogeneous, irregular, or structurally complicated samples [1]. In order to overcome these disadvantages of SKL, researchers developed MKL [6], [7], which simultaneously adopts an optimized combination of a group of kernels. In fact, it is known that: (1) SKL uses only one kernel over the whole input space; when the data set has a complex structure, a single kernel cannot wholly reflect the data structure; (2) SKL fails to combine different kernels into a joint learning machine, even though different kernels play different discriminant roles in the design of the learning machine. To solve these disadvantages of SKL, MKL uses multiple kernels in one learning machine and assigns feasible kernel weights to these kernels. By MKL, complex structures of data can be considered [6], [7]. One classical form of MKL is the convex combination of multiple candidate kernels [6]. Our previous work, the Multi-Views Kernelized Modification of Ho–Kashyap algorithm with Squared approximation of the misclassification errors (MultiV-KMHKS) [15], falls into the MKL learning framework.
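The classical convex-combination form of MKL mentioned above can be sketched as follows; the uniform weights and the toy linear/polynomial kernels are purely illustrative, since an actual MKL method learns the weights jointly with the classifier.

import numpy as np

def combine_kernels(kernels, beta=None):
    """Convex combination G = sum_p beta_p * K_p with beta_p >= 0 and sum_p beta_p = 1."""
    M = len(kernels)
    if beta is None:
        beta = np.full(M, 1.0 / M)        # uniform weights, purely illustrative
    assert np.all(beta >= 0) and np.isclose(beta.sum(), 1.0)
    return sum(b * K for b, K in zip(beta, kernels))

X = np.random.randn(100, 5)
K_lin = X @ X.T                           # linear kernel
K_poly = (X @ X.T + 1.0) ** 2             # polynomial kernel of degree 2
G = combine_kernels([K_lin, K_poly])      # fused n x n kernel matrix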

In MultiV-KMHKS, we first adopted each candidate kernel to map the input data. Following that, we designed multiple sub-classifiers, one in each corresponding kernel-mapped space, with a certain SKL algorithm. The used SKL algorithm is the Kernelized Modification of Ho–Kashyap algorithm with Squared approximation of the misclassification errors (KMHKS) [10]. Finally, we combined the different sub-classifiers into one fused learning process named MultiV-KMHKS [15]. However, both the computational complexity of finding the solution and the space complexity of storing the kernel matrices of MultiV-KMHKS are large. In order to overcome these disadvantages, in this paper we develop an efficient multi-kernel classification machine based on the Nyström approximation technique. In doing so, we can efficiently reduce both the computational and space complexities of MKL. In practice, we firstly apply the Nyström approximation technique to gain a group of approximation matrices K̃p ∈ R^(n×n) for the M candidate kernel matrices Kp ∈ R^(n×n), p = 1, …, M, where M is the number of kernels and n is the number of training samples. The approximation matrix K̃p can be generated by choosing a subset of m columns from the original Kp [11], [12], [15] (see the sketch after the following list). Then we compute the corresponding coefficients of these K̃p's and thus gain the optimized combination matrix G. Following that, we apply the generated matrix G to the learning framework of KMHKS. The whole learning process is named Nyström approximation matrix with Multiple KMHKSs (NMKMHKS). In order to validate the feasibility and efficiency of the proposed NMKMHKS, we design different experimental strategies including classification performance and training time comparisons on multiple real-world data sets, a discussion of the regularization parameter, a convergence analysis, a Kernel Alignment (KA) [29] analysis, and a generalization risk analysis in terms of the Rademacher complexity. Finally, we highlight the advantages of the proposed NMKMHKS as follows:

  • Compared with the original MultiV-KMHKS [15], the proposed NMKMHKS significantly reduces the computational complexity of finding the solution from O(Mn³) to O(Mnm²), because NMKMHKS needs only a complexity comparable to that of the single kernel learning KMHKS [10], while MultiV-KMHKS needs about M times the complexity of KMHKS. Furthermore, we demonstrate that, compared with MultiV-KMHKS, NMKMHKS has a lower Rademacher complexity and thus obtains a superior generalization performance.

  • Compared with KMHKS, NMKMHKS improves the classification performance, since it can adopt multiple kernels simultaneously, while keeping a comparable computational complexity.

  • NMKMHKS achieves superior recognition when the multiple used K̃p's are strongly correlated, which provides guidance for choosing kernels.
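For concreteness, the column-sampling step referred to above can be sketched as the standard Nyström approximation K̃ = C W^+ C^T built from m sampled columns; uniform column sampling and a pseudo-inverse are assumed here for illustration, and the sampling scheme actually used in NMKMHKS may differ.

import numpy as np

def nystrom_approx(K, m, rng=np.random.default_rng(0)):
    """Nystrom approximation K_tilde = C W^+ C^T from m sampled columns.

    K : (n, n) symmetric kernel matrix
    m : number of sampled columns (m << n)
    """
    n = K.shape[0]
    idx = rng.choice(n, size=m, replace=False)   # uniform column sampling
    C = K[:, idx]                                # n x m block of sampled columns
    W = K[np.ix_(idx, idx)]                      # m x m intersection block
    return C @ np.linalg.pinv(W) @ C.T           # rank-m approximation of K

# toy usage: approximate a linear-kernel matrix with m = 20 columns
X = np.random.randn(200, 10)
K = X @ X.T
K_tilde = nystrom_approx(K, m=20)

Working with the m sampled columns instead of the full matrix is what brings the per-kernel cost down from O(n³) to O(nm²), consistent with the overall O(Mnm²) claim above.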

The rest of this paper is organized as follows. Section 2 reviews the related work of MKL and the origin of KMHKS. In Section 3, we give the architecture description of the proposed NMKMHKS. In Section 4, multiple kinds of the experiments on multiple real-world data sets demonstrate both the effectiveness and efficiency of NMKMHKS. Finally, the conclusions are given in Section 5.

Section snippets

Multiple Kernel Learning (MKL)

It has been demonstrated that MKL has a significant advantage over SKL in handling heterogeneous, irregular, and structurally complicated data [6]. Since MKL adopts an ensemble of kernel functions rather than a single kernel to construct learning machines, the integration in MKL generally goes through three stages: the data-integrated process in the first period, the kernel matrices-integrated one in the middle period, and the classifiers-integrated one in the final period. In most

Proposed multi-kernel classification machine (NMKMHKS)

In this section we give the architecture of the proposed NMKMHKS. In fact, the process of NMKMHKS consists of the following steps. Firstly, we generate M candidate kernel matrices K1, K2, …, KM with the given kernel functions for the given data set, and we also centralize and normalize these candidate kernel matrices. Secondly, we apply the Nyström approximation to each Kp to get its approximation matrix K̃p, where p = 1, 2, …, M. Thirdly, we calculate the coefficient αp of each K̃p.
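The first three steps above can be sketched as follows. The centering/normalization scheme and the uniform coefficients αp used here are placeholders for illustration; the paper's heuristic rule for computing αp is not reproduced in this sketch, and the toy kernels are not those used in the experiments.

import numpy as np

def center_and_normalize(K):
    """Center the kernel matrix in feature space and scale it to unit diagonal."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n              # centering matrix
    Kc = H @ K @ H
    d = np.sqrt(np.clip(np.diag(Kc), 1e-12, None))
    return Kc / np.outer(d, d)                       # cosine-style normalization

def nystrom_approx(K, m, rng=np.random.default_rng(0)):
    idx = rng.choice(K.shape[0], size=m, replace=False)
    C, W = K[:, idx], K[np.ix_(idx, idx)]
    return C @ np.linalg.pinv(W) @ C.T

def build_ensemble_matrix(kernel_list, m, alphas=None):
    """Steps 1-3 of the sketch: preprocess, approximate, and fuse the kernels."""
    approx = [nystrom_approx(center_and_normalize(K), m) for K in kernel_list]
    if alphas is None:                               # placeholder for the heuristic rule
        alphas = np.full(len(approx), 1.0 / len(approx))
    return sum(a * Kt for a, Kt in zip(alphas, approx))

X = np.random.randn(150, 8)
kernels = [X @ X.T, (X @ X.T + 1.0) ** 2]            # toy linear and polynomial kernels
G = build_ensemble_matrix(kernels, m=15)              # ensemble matrix fed to KMHKS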

Experiments

In order to validate the effectiveness of the proposed NMKMHKS, we design our experiments from multiple views on data sets from the UCI machine learning repository [28], i.e., real-world data sets. Firstly, we give the experimental setting for all implemented algorithms. The compared algorithms are NMKMHKS, KMHKS, MultiV-KMHKS, and some other state-of-the-art MKL algorithms including MKDA with Semi-Definite Program (SDP) [27], ℓp-MKDA with Semi-Infinite Program (SIP) [13], SVM-2K [25], MVNA-KMHKS,

Conclusions

In this paper, we propose a multi-kernel classification machine based on the Nyström approximation named NMKMHKS. The procedure of NMKMHKS can be given as follows: (1) using the Nyström approximation method to deal with each candidate kernel matrix so as to gain its corresponding Nyström approximation matrix K̃p; (2) fusing the multiple K̃p's into the ensemble matrix G; and (3) introducing G into the KMHKS learning framework. We demonstrate both the effectiveness and efficiency of the

Acknowledgments

This work was partially supported by Natural Science Foundations of China under Grant Nos. 61272198 and 21176077, Innovation Program of Shanghai Municipal Education Commission under Grant No. 14ZZ054, the Fundamental Research Funds for the Central Universities, Shanghai Key Laboratory of Intelligent Information Processing of China under Grant No. IIPL-2012-003, Nature Science Foundation of Shanghai Province of China under Grant No. 11ZR1409600, and Shanghai Pujiang Program under Grant No.

References (40)

  • S. Sonnenburg et al., Large scale multiple kernel learning, J. Mach. Learn. Res. (2006)
  • S. Sonnenburg et al., A general and efficient multiple kernel learning algorithm, Neural Inform. Process. Syst. (2006)
  • B. Schölkopf, S. Mika, G. Rätsch, Kernel PCA pattern reconstruction via approximation preimages, in: Proceeding of the...
  • G. Baudat et al., Generalized discriminant analysis using a kernel approach, Neural Comput. (2000)
  • J. Leski, Kernel Ho–Kashyap classifier with generalization control, Int. J. Appl. Math. Comput. Sci. (2004)
  • C.K.I. Williams et al., Using the Nyström method to speed up kernel machines, Neural Inform. Process. Syst. (2001)
  • M. Li, J.T. Kwok, B.L. Lu, Making large-scale Nyström approximation possible, in: Proceeding of the 27th International...
  • F. Yan, K. Mikolajczyk, M. Barnard, H.P. Cai, J. Kittler, ℓp norm multiple kernel Fisher discriminant analysis for...
  • Z. Wang et al., MultiK-MHKS: a novel multiple kernel learning algorithm, IEEE Trans. Pattern Anal. Mach. Intell. (2008)
  • M. Gönen, E. Alpaydin, Localized multiple kernel learning, in: Proceeding of the 25th International Conference on...