Knowledge-Based Systems

Volume 83, July 2015, Pages 159-169

Multiple kernel dimensionality reduction via spectral regression and trace ratio maximization

https://doi.org/10.1016/j.knosys.2015.03.019

Abstract

The performance of kernel-based dimensionality reduction relies heavily on the selection of kernel functions. Multiple kernel learning for dimensionality reduction (MKL-DR) was recently proposed to learn a convex combination of a set of base kernels. However, to determine the kernel weights, this method relaxes a nonconvex quadratically constrained quadratic programming (QCQP) problem into a semi-definite programming (SDP) problem, which may degrade its performance. Although a trace ratio maximization approach to multiple-kernel-based dimensionality reduction (MKL-TR) has been presented to avoid this convex relaxation, it must solve a generalized eigenvalue problem in each iteration of its algorithm, which is expensive in both time and memory. To further improve on these methods, this paper proposes a novel multiple kernel dimensionality reduction method based on spectral regression and trace ratio maximization, termed MKL-SRTR. The proposed approach efficiently and effectively learns both an appropriate kernel from the multiple base kernels and a transformation into a lower-dimensional space. Experimental results on benchmark datasets demonstrate the effectiveness of the proposed method in supervised, unsupervised and semi-supervised scenarios.

Introduction

In real applications, many machine learning algorithms, such as spectral clustering, often perform poorly when faced with high-dimensional data. Hence, transforming such data into a unified space of lower dimension can facilitate the underlying tasks, such as pattern recognition or regression. According to their assumptions about the data distribution, dimensionality reduction methods can be classified into two categories, i.e., linear and nonlinear methods [1], [2], [3], [4], [5], [6]. If the training data lie in a linear subspace, linear dimensionality reduction methods are commonly used to discover the dimensionality of that subspace. Correspondingly, if the data are sampled from a nonlinear low-dimensional manifold embedded in a high-dimensional ambient space, nonlinear dimensionality reduction methods are usually utilized to preserve the manifold structure.

In recent years, nonlinear dimensionality reduction methods based on the manifold assumption have attracted many researchers. These methods mainly take advantage of various manifold learning techniques, such as isometric feature mapping (ISOMAP) [6], locally linear embedding (LLE) [7] and Laplacian Eigenmaps (LE) [8], which reduce the dimensionality of a fixed training set in a way that maximally preserves certain inter-point relationships. However, one major limitation of these methods is that they generally do not address the out-of-sample problem. Many approaches provide natural out-of-sample extensions of Laplacian Eigenmaps, LLE and Isomap, such as locality preserving projections (LPP) [9] and regularized regression neural networks; they address this issue by explicitly finding an embedding function while minimizing the objective function [9], [10], [11]. The computation of these methods generally involves the eigen-decomposition of dense matrices, which is expensive in both time and memory. Other approaches interpret these spectral embedding algorithms as learning the principal eigenfunctions of an operator defined from a kernel and the unknown data-generating density, and thus address the issue through a kernel view of LLE, Isomap and Laplacian Eigenmaps [12], [13]. To obtain the embedding of an unseen example, however, the kernel function values between this example and all training samples must be calculated, which may not be feasible in some situations. In addition, several fast extension methods based on landmarks have been developed to dramatically reduce the computational burden of these methods, but they gain speed at the cost of some accuracy. Landmark-Isomap, for example, is a variant of Isomap that is faster than Isomap, but the accuracy of the recovered manifold is compromised by a marginal factor [14]. Fortunately, spectral regression (SR) subtly avoids the eigen-decomposition of dense matrices by means of regression and spectral graph analysis. Meanwhile, it can enhance the performance of dimensionality reduction by introducing a regularization term [15], [16], and it can be performed in supervised, unsupervised or semi-supervised settings. These dimensionality reduction techniques can be unified under a common framework called graph embedding [17].

Recently, multiple kernel learning was incorporated into graph embedding, yielding multiple kernel learning for dimensionality reduction (MKL-DR) [18], which aims to learn a linear transformation in the nonlinear space induced by a set of base kernels. MKL-DR automatically selects optimal kernels by identifying a linear combination of base kernels, and the advantage of using multiple kernels instead of a single kernel in dimensionality reduction has been demonstrated [17]. However, this method needs to iteratively solve a generalized eigenvalue problem, which is time-consuming. In addition, it replaces a QCQP optimization problem with an SDP problem to obtain the kernel weights, which can have a negative effect on its performance. To speed up MKL-DR, a multiple kernel dimensionality reduction algorithm based on spectral regression, called MKL-SR, was proposed in Ref. [19]. It transforms the eigen-decomposition of dense matrices in the first step of MKL-DR into a linear regression problem by means of spectral regression, but it still uses the convex relaxation technique to optimize the kernel weights. Instead of convex relaxation, a multiple kernel learning framework called MKL-TR was recently proposed to avoid relaxing the primal problem [20]. MKL-TR learns a transformation into a space of lower dimension by converting a trace ratio maximization problem into a single trace maximization. However, the computational bottleneck of both MKL-DR and MKL-TR lies in iteratively computing the generalized eigen-decomposition of regularized dense matrices, which has a high computational complexity of O(n^3) (where n denotes the number of training samples).
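
To make the trace ratio step concrete, the following is a minimal numpy sketch of the generic trace-ratio iteration that such methods build on, which reduces the ratio problem to a sequence of trace-difference eigenproblems. The function name, the scatter matrices Sp and Sl, and the stopping parameters are illustrative; this is not the exact MKL-TR procedure.

```python
import numpy as np

def trace_ratio(Sp, Sl, m, n_iter=50, tol=1e-8):
    """Generic trace-ratio iteration (illustrative sketch):
    find an orthonormal V maximizing tr(V' Sp V) / tr(V' Sl V)
    by solving a sequence of trace-difference eigenproblems on
    Sp - lam * Sl. Assumes Sl is positive definite (e.g. regularized),
    so the denominator is never zero."""
    d = Sp.shape[0]
    V = np.linalg.qr(np.random.randn(d, m))[0]   # random orthonormal start
    lam = 0.0
    for _ in range(n_iter):
        lam_new = np.trace(V.T @ Sp @ V) / np.trace(V.T @ Sl @ V)
        if abs(lam_new - lam) < tol:
            break
        lam = lam_new
        # top-m eigenvectors of Sp - lam * Sl (np.linalg.eigh is ascending)
        _, U = np.linalg.eigh(Sp - lam * Sl)
        V = U[:, -m:]
    return V, lam
```

Each iteration costs one dense eigen-decomposition, which is exactly the O(n^3) bottleneck referred to above when the scatter matrices are built at kernel size n x n.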

Since SR, MKL-DR, MKL-SR and MKL-TR are all based on graph embedding, it is natural to exploit SR and the trace ratio optimization technique to overcome the restrictions of MKL-DR, MKL-SR and MKL-TR. Consequently, we present a framework, termed MKL-SRTR, which not only incorporates spectral regression into multiple kernel learning for dimensionality reduction, but also uses the trace ratio optimization algorithm to obtain the kernel weights. On the one hand, spectral regression gains speed without sacrificing accuracy; on the other hand, the trace ratio optimization algorithm avoids the convex relaxation used in MKL-DR and MKL-SR. We formulate MKL-SRTR within graph embedding, which not only extends any DR technique expressible by graph embedding to the multiple kernel setting, but also selects optimal kernels more efficiently than other multiple kernel-based dimensionality reduction methods. Specifically, the proposed approach decreases the computational complexity to O(n^2). The experimental results demonstrate that the proposed method achieves better or similar performance compared to other algorithms in supervised, unsupervised and semi-supervised scenarios. In addition, like other multiple kernel-based dimensionality reduction methods, it can solve the out-of-sample extension problem.
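
For intuition on the out-of-sample extension, a new sample can be embedded by evaluating each base kernel against the training set, combining the values with the learned weights, and projecting with the learned coefficient matrix. The sketch below illustrates this under assumed names (kernel_fns, beta, A); it is not the paper's exact formulation.

```python
import numpy as np

def embed_new_sample(x, X_train, kernel_fns, beta, A):
    """Out-of-sample embedding sketch (illustrative names): evaluate each
    base kernel between the new sample x and all n training samples,
    combine with the learned kernel weights beta, and project with the
    learned (n, m) sample coefficient matrix A."""
    kx = sum(b * np.array([k(x, xi) for xi in X_train])
             for b, k in zip(beta, kernel_fns))
    return A.T @ kx  # m-dimensional embedding of x
```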

The paper is structured as follows: spectral regression and dimensionality reduction with multiple kernels are introduced in Sections 2 and 3, respectively. In Section 4, we present the MKL-SRTR framework and its optimization process. The experimental results are reported in Section 5. Finally, we give conclusions and a discussion of future work in Section 6. To avoid confusion, the main notations used in this paper are listed in Table 1.

Section snippets

Spectral regression algorithm

The SR algorithm casts the problem of learning an embedding function into a regression framework, which avoids the eigen-decomposition of dense matrices. Meanwhile, it incorporates a regularization term into the regression model, which helps achieve satisfactory dimensionality reduction performance at a fast learning speed. Given a training set with $l$ labeled samples $x_1, x_2, \ldots, x_l$ and $n-l$ unlabeled samples $x_{l+1}, x_{l+2}, \ldots, x_n$, where each sample $x_i \in \mathbb{R}^d$ belongs to one of $c$ classes, and let $l_k$ be the
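
For intuition, here is a minimal numpy/scipy sketch of the two-step SR idea: first obtain embedding responses from the graph eigenproblem, then recover a projection by ridge regression. It assumes a precomputed symmetric affinity matrix W; the function name, the dense eigensolver and the regularization value are illustrative choices, not the paper's exact algorithm.

```python
import numpy as np
from scipy.linalg import eigh, solve

def spectral_regression(X, W, m, alpha=0.01):
    """Minimal SR sketch (illustrative). X: (n, d) data matrix,
    W: (n, n) symmetric graph affinity, m: target dimensionality,
    alpha: ridge regularization."""
    n, d = X.shape
    deg = W.sum(axis=1)
    Ds = 1.0 / np.sqrt(deg)
    # Step 1: responses from the graph eigenproblem W y = lam * D y,
    # solved via the symmetric normalized affinity D^{-1/2} W D^{-1/2}.
    # (In practice W is sparse, so scipy.sparse.linalg.eigsh applies.)
    S = (W * Ds[None, :]) * Ds[:, None]
    _, vecs = eigh(S)                               # ascending eigenvalues
    Y = (vecs * Ds[:, None])[:, ::-1][:, 1:m + 1]   # drop trivial eigenvector
    # Step 2: ridge regression X a ~= y, one column per embedding dimension;
    # this replaces the eigen-decomposition of dense scatter matrices.
    A = solve(X.T @ X + alpha * np.eye(d), X.T @ Y)
    return A  # (d, m) projection; embed a new sample x as A.T @ x
```

Because the affinity graph is typically sparse, the step-1 eigenproblem is cheap, and step 2 is a regularized linear system rather than a dense eigen-decomposition, which is the source of SR's speed advantage.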

Dimensionality reduction with multiple kernels

Since the relevant literature is quite extensive, our survey emphasizes the key works crucial to the establishment of the proposed framework.

Multiple kernel dimensionality reduction based on SR and trace ratio maximization

Since MKL-DR, MKL-SR and MKL-TR can all be viewed as multiple kernel extensions of graph embedding, the proposed MKL-SRTR model is also derived from graph embedding. In MKL-SRTR, we utilize spectral regression to speed up the optimization of the sample coefficient matrix A without sacrificing accuracy, and use the trace ratio maximization algorithm, instead of convex relaxation, to find the kernel weight vector β. These two aspects guarantee the effectiveness and efficiency of the proposed algorithm.
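
Schematically, the two aspects interleave as an alternating optimization: fix the kernel weights to run a spectral-regression step for A, then fix A to update the weights. The numpy sketch below conveys only this structure; the weight update shown is a simple alignment-based stand-in for the trace ratio maximization step, and all names and parameters are illustrative rather than the paper's exact update rules.

```python
import numpy as np

def mkl_srtr_sketch(Ks, Y, n_outer=10, alpha=0.01):
    """Schematic alternating optimization (NOT the exact MKL-SRTR updates).
    Ks: list of M precomputed (n, n) base kernel matrices.
    Y:  (n, m) response matrix from the spectral-regression graph step."""
    n, M = Ks[0].shape[0], len(Ks)
    beta = np.ones(M) / M                  # start from uniform kernel weights
    for _ in range(n_outer):
        # (1) Fix beta: form the combined kernel and solve a kernel ridge
        #     regression K A ~= Y for the coefficient matrix A (SR step).
        K = sum(b * Km for b, Km in zip(beta, Ks))
        A = np.linalg.solve(K + alpha * np.eye(n), Y)
        # (2) Fix A: re-weight each base kernel by how well it aligns the
        #     projected data with the responses -- a crude stand-in for
        #     the trace ratio maximization over beta.
        scores = np.array([np.trace(Y.T @ Km @ A) for Km in Ks])
        beta = np.maximum(scores, 1e-12)
        beta /= beta.sum()
    return beta, A
```

The point of the sketch is the division of labor: the SR step keeps the A update at regression cost, while the β update works in the M-dimensional space of kernel weights rather than over n x n matrices.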

Experiments

We tested all algorithms on UCI datasets (Sonar, Ionosphere and Isolet), face recognition datasets (Yale, PIE and ORL), digit recognition datasets (USPS and MNIST) and object recognition datasets (COIL-20

Conclusion

Multiple kernel learning is currently one of the hot topics in the machine learning community, and effectively selecting an appropriate kernel via MKL is crucial for improving the performance of kernel dimensionality reduction methods. In this paper, we propose a novel multiple kernel dimensionality reduction method based on spectral regression and trace ratio maximization, which learns a linear combination of given base kernels and a linear transformation matrix simultaneously. By means of trace

Acknowledgments

This work was supported by the National Natural Science Foundation of China (61403394) and the Fundamental Research Funds for the Central Universities (2014QNA45).

References (29)

  • S.T. Roweis et al., Nonlinear dimensionality reduction by locally linear embedding, Science (2000)
  • M. Belkin et al., Laplacian eigenmaps and spectral techniques for embedding and clustering
  • X. He et al., Locality preserving projections, Proc. Conf. Adv. Neural Inf. Process. Syst. (2003)
  • M. Brand, Continuous nonlinear dimensionality reduction by kernel eigenmaps, in: International Joint Conference on...