Knowledge-Based Systems

Volume 83, July 2015, Pages 159-169

Multiple kernel dimensionality reduction via spectral regression and trace ratio maximization

https://doi.org/10.1016/j.knosys.2015.03.019

Abstract

The performance of kernel-based dimensionality reduction relies heavily on the selection of kernel functions. Multiple kernel learning for dimensionality reduction (MKL-DR) was recently proposed to learn a convex combination of a set of base kernels. However, to determine the kernel weights, this method relaxes a nonconvex quadratically constrained quadratic programming (QCQP) problem into a semi-definite programming (SDP) problem, which may degrade its performance. Although a trace ratio maximization approach to multiple-kernel-based dimensionality reduction (MKL-TR) has been presented to avoid this convex relaxation, it must solve a generalized eigenvalue problem in each iteration of its algorithm, which is expensive in both time and memory. To further improve on these methods, this paper proposes a novel multiple kernel dimensionality reduction method based on spectral regression and trace ratio maximization, termed MKL-SRTR. The proposed approach efficiently and effectively learns both an appropriate kernel from the multiple base kernels and a transformation into a lower-dimensional space. Experimental results on benchmark datasets demonstrate the effectiveness of the proposed method in supervised, unsupervised and semi-supervised scenarios.

Introduction

In real applications, many machine learning algorithms, such as spectral clustering, often perform poorly when faced with high-dimensional data. Hence, transforming such data into a unified space of lower dimension can facilitate the underlying tasks, such as pattern recognition or regression. According to their assumptions about the data distribution, dimensionality reduction methods can be classified into two categories, i.e., linear and nonlinear methods [1], [2], [3], [4], [5], [6]. If the training data lie in a linear subspace, linear dimensionality reduction methods are commonly used to discover the dimensionality of that subspace. Correspondingly, if the data are sampled from a nonlinear low-dimensional manifold embedded in a high-dimensional ambient space, nonlinear dimensionality reduction methods are usually utilized to preserve the manifold structure.

In recent years, nonlinear dimensionality reduction methods based on the manifold assumption have attracted many researchers. These methods mainly take advantage of various manifold learning techniques, such as isometric feature mapping (ISOMAP) [6], locally linear embedding (LLE) [7] and Laplacian Eigenmaps (LE) [8], which reduce the dimensionality of a fixed training set in a way that maximally preserves certain inter-point relationships. However, one major limitation of these methods is that they generally do not address the out-of-sample problem. Many approaches provide natural out-of-sample extensions of Laplacian Eigenmaps, LLE and Isomap, such as locality preserving projections (LPP) [9] and regularized regression neural networks; they address this issue by explicitly finding an embedding function while minimizing the objective function [9], [10], [11]. The computation of these methods generally involves the eigen-decomposition of dense matrices, which is expensive in both time and memory. Other approaches interpret these spectral embedding algorithms as learning the principal eigenfunctions of an operator defined from a kernel and the unknown data-generating density, and thus address the issue through a kernel view of LLE, Isomap and Laplacian Eigenmaps [12], [13]. To obtain the embedding of an unseen example, however, the kernel function values between this example and all training samples must be calculated, which may not be feasible in some situations. In addition, several fast extension methods based on landmarks have been developed to dramatically reduce the computational burden of these methods, but they gain speed at the cost of some accuracy. Landmark-Isomap, for example, is a variant of Isomap that is faster than Isomap, but the accuracy of the recovered manifold is compromised by a marginal factor [14]. Fortunately, spectral regression (SR) subtly avoids the eigen-decomposition of dense matrices by means of regression and spectral graph analysis. Meanwhile, it can enhance the performance of dimensionality reduction by introducing a regularization term [15], [16], and it can be performed in supervised, unsupervised or semi-supervised settings. These dimensionality reduction techniques can be unified under a common framework called graph embedding [17].

Recently, multiple kernel learning was incorporated into graph embedding, yielding multiple kernel learning for dimensionality reduction (MKL-DR) [18], which aims to learn a linear transformation in the nonlinear space induced by a set of base kernels. MKL-DR automatically selects optimal kernels by identifying a linear combination of base kernels, and the advantage of using multiple kernels instead of a single kernel in dimensionality reduction has been demonstrated [17]. However, this method needs to iteratively solve a generalized eigenvalue problem, which is time-consuming. In addition, it replaces a QCQP optimization problem with an SDP problem to obtain the kernel weights, which can have a negative effect on its performance. To speed up MKL-DR, a multiple kernel dimensionality reduction algorithm based on spectral regression, called MKL-SR, was proposed in Ref. [19]. It transforms the eigen-decomposition of dense matrices in the first step of MKL-DR into a linear regression problem by means of spectral regression, but it still uses the convex relaxation technique to optimize the kernel weights. Instead of convex relaxation, a multiple kernel learning framework called MKL-TR was recently proposed to avoid relaxing the primal problem [20]. MKL-TR learns a transformation into a space of lower dimension by converting a trace ratio maximization problem into a single trace maximization. However, the computational bottleneck of both MKL-DR and MKL-TR lies in iteratively computing the generalized eigen-decomposition of regularized dense matrices, which has a high computational complexity of O(n^3) (where n denotes the number of training samples).
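
To make the trace ratio step concrete, the following is a minimal numpy sketch of the generic trace-ratio iteration that such methods build on, which reduces the ratio problem to a sequence of trace-difference eigenproblems. The function name, the scatter matrices Sp and Sl, and the stopping parameters are illustrative; this is not the exact MKL-TR procedure.

```python
import numpy as np

def trace_ratio(Sp, Sl, m, n_iter=50, tol=1e-8):
    """Generic trace-ratio iteration (illustrative sketch):
    find an orthonormal V maximizing tr(V' Sp V) / tr(V' Sl V)
    by solving a sequence of trace-difference eigenproblems on
    Sp - lam * Sl. Assumes Sl is positive definite (e.g. regularized),
    so the denominator is never zero."""
    d = Sp.shape[0]
    V = np.linalg.qr(np.random.randn(d, m))[0]   # random orthonormal start
    lam = 0.0
    for _ in range(n_iter):
        lam_new = np.trace(V.T @ Sp @ V) / np.trace(V.T @ Sl @ V)
        if abs(lam_new - lam) < tol:
            break
        lam = lam_new
        # top-m eigenvectors of Sp - lam * Sl (np.linalg.eigh is ascending)
        _, U = np.linalg.eigh(Sp - lam * Sl)
        V = U[:, -m:]
    return V, lam
```

Each iteration costs one dense eigen-decomposition, which is exactly the O(n^3) bottleneck referred to above when the scatter matrices are built at kernel size n x n.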

Since SR, MKL-DR, MKL-SR and MKL-TR are all based on graph embedding, it is natural to exploit SR and the trace ratio optimization technique to overcome the restrictions of MKL-DR, MKL-SR and MKL-TR. Consequently, we present a framework, termed MKL-SRTR, which not only incorporates spectral regression into multiple kernel learning for dimensionality reduction, but also uses the trace ratio optimization algorithm to obtain the kernel weights. On the one hand, spectral regression gains speed without sacrificing accuracy; on the other hand, the trace ratio optimization algorithm avoids the convex relaxation used in MKL-DR and MKL-SR. We formulate MKL-SRTR within graph embedding, which not only extends any DR technique expressible by graph embedding to the multiple kernel setting, but also selects optimal kernels more efficiently than other multiple kernel-based dimensionality reduction methods. Specifically, the proposed approach decreases the computational complexity to O(n^2). The experimental results demonstrate that the proposed method achieves better or similar performance compared to other algorithms in supervised, unsupervised and semi-supervised scenarios. In addition, like other multiple kernel-based dimensionality reduction methods, it can solve the out-of-sample extension problem.
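
For intuition on the out-of-sample extension, a new sample can be embedded by evaluating each base kernel against the training set, combining the values with the learned weights, and projecting with the learned coefficient matrix. The sketch below illustrates this under assumed names (kernel_fns, beta, A); it is not the paper's exact formulation.

```python
import numpy as np

def embed_new_sample(x, X_train, kernel_fns, beta, A):
    """Out-of-sample embedding sketch (illustrative names): evaluate each
    base kernel between the new sample x and all n training samples,
    combine with the learned kernel weights beta, and project with the
    learned (n, m) sample coefficient matrix A."""
    kx = sum(b * np.array([k(x, xi) for xi in X_train])
             for b, k in zip(beta, kernel_fns))
    return A.T @ kx  # m-dimensional embedding of x
```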

The paper is structured as follows: spectral regression and dimensionality reduction with multiple kernels are introduced in Sections 2 and 3, respectively. In Section 4, we present the MKL-SRTR framework and its optimization process. The experimental results are reported in Section 5. Finally, we give conclusions and a discussion of future work in Section 6. To avoid confusion, the main notations used in this paper are listed in Table 1.

Section snippets

Spectral regression algorithm

The SR algorithm casts the problem of learning an embedding function into a regression framework, which avoids the eigen-decomposition of dense matrices. Meanwhile, it incorporates a regularization term into the regression model, which helps achieve satisfactory dimensionality reduction performance at a fast learning speed. Given a training set with $l$ labeled samples $x_1, x_2, \ldots, x_l$ and $n-l$ unlabeled samples $x_{l+1}, x_{l+2}, \ldots, x_n$, where each sample $x_i \in \mathbb{R}^d$ belongs to one of $c$ classes, and let $l_k$ be the
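
For intuition, here is a minimal numpy/scipy sketch of the two-step SR idea: first obtain embedding responses from the graph eigenproblem, then recover a projection by ridge regression. It assumes a precomputed symmetric affinity matrix W; the function name, the dense eigensolver and the regularization value are illustrative choices, not the paper's exact algorithm.

```python
import numpy as np
from scipy.linalg import eigh, solve

def spectral_regression(X, W, m, alpha=0.01):
    """Minimal SR sketch (illustrative). X: (n, d) data matrix,
    W: (n, n) symmetric graph affinity, m: target dimensionality,
    alpha: ridge regularization."""
    n, d = X.shape
    deg = W.sum(axis=1)
    Ds = 1.0 / np.sqrt(deg)
    # Step 1: responses from the graph eigenproblem W y = lam * D y,
    # solved via the symmetric normalized affinity D^{-1/2} W D^{-1/2}.
    # (In practice W is sparse, so scipy.sparse.linalg.eigsh applies.)
    S = (W * Ds[None, :]) * Ds[:, None]
    _, vecs = eigh(S)                               # ascending eigenvalues
    Y = (vecs * Ds[:, None])[:, ::-1][:, 1:m + 1]   # drop trivial eigenvector
    # Step 2: ridge regression X a ~= y, one column per embedding dimension;
    # this replaces the eigen-decomposition of dense scatter matrices.
    A = solve(X.T @ X + alpha * np.eye(d), X.T @ Y)
    return A  # (d, m) projection; embed a new sample x as A.T @ x
```

Because the affinity graph is typically sparse, the step-1 eigenproblem is cheap, and step 2 is a regularized linear system rather than a dense eigen-decomposition, which is the source of SR's speed advantage.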

Dimensionality reduction with multiple kernels

Since the relevant literature is quite extensive, our survey emphasizes the key works crucial to the establishment of the proposed framework.

Multiple kernel dimensionality reduction based on SR and trace ratio maximization

Since MKL-DR, MKL-SR and MKL-TR can all be viewed as multiple kernel extensions of graph embedding, the proposed MKL-SRTR model is also derived from graph embedding. In MKL-SRTR, we utilize spectral regression to speed up the optimization of the sample coefficient matrix A without sacrificing accuracy, and use the trace ratio maximization algorithm, instead of convex relaxation, to find the kernel weight vector β. These two aspects guarantee the effectiveness and efficiency of the proposed algorithm.
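
Schematically, the two aspects interleave as an alternating optimization: fix the kernel weights to run a spectral-regression step for A, then fix A to update the weights. The numpy sketch below conveys only this structure; the weight update shown is a simple alignment-based stand-in for the trace ratio maximization step, and all names and parameters are illustrative rather than the paper's exact update rules.

```python
import numpy as np

def mkl_srtr_sketch(Ks, Y, n_outer=10, alpha=0.01):
    """Schematic alternating optimization (NOT the exact MKL-SRTR updates).
    Ks: list of M precomputed (n, n) base kernel matrices.
    Y:  (n, m) response matrix from the spectral-regression graph step."""
    n, M = Ks[0].shape[0], len(Ks)
    beta = np.ones(M) / M                  # start from uniform kernel weights
    for _ in range(n_outer):
        # (1) Fix beta: form the combined kernel and solve a kernel ridge
        #     regression K A ~= Y for the coefficient matrix A (SR step).
        K = sum(b * Km for b, Km in zip(beta, Ks))
        A = np.linalg.solve(K + alpha * np.eye(n), Y)
        # (2) Fix A: re-weight each base kernel by how well it aligns the
        #     projected data with the responses -- a crude stand-in for
        #     the trace ratio maximization over beta.
        scores = np.array([np.trace(Y.T @ Km @ A) for Km in Ks])
        beta = np.maximum(scores, 1e-12)
        beta /= beta.sum()
    return beta, A
```

The point of the sketch is the division of labor: the SR step keeps the A update at regression cost, while the β update works in the M-dimensional space of kernel weights rather than over n x n matrices.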

Experiments

We tested all algorithms on UCI datasets (Sonar, Ionosphere and Isolet), face recognition datasets (Yale, PIE and ORL), digit recognition datasets (USPS and MNIST) and object recognition datasets (COIL-20

Conclusion

Multiple kernel learning is currently one of the hot topics in the machine learning community, and effectively selecting an appropriate kernel via MKL is crucial for improving the performance of kernel dimensionality reduction methods. In this paper, we propose a novel multiple kernel dimensionality reduction method based on spectral regression and trace ratio maximization, which learns a linear combination of given base kernels and a linear transformation matrix simultaneously. By means of trace

Acknowledgments

This work was supported by the National Natural Science Foundation of China (61403394) and the Fundamental Research Funds for the Central Universities (2014QNA45).

References (29)

  • S.T. Roweis et al., Nonlinear dimensionality reduction by locally linear embedding, Science (2000)
  • M. Belkin et al., Laplacian eigenmaps and spectral techniques for embedding and clustering
  • X. He et al., Locality preserving projections, Proc. Conf. Adv. Neural Inf. Process. Syst. (2003)
  • M. Brand, Continuous nonlinear dimensionality reduction by kernel eigenmaps, in: International Joint Conference on...