Supervised discriminant Isomap with maximum margin graph regularization for dimensionality reduction

https://doi.org/10.1016/j.eswa.2021.115055

Highlights

  • Two novel DR methods are proposed to extract critical discriminative information.

  • Neighborhood size parameters of DR models are decided adaptively via data similarity.

  • Outside data can be projected directly through SD-IsoP, which prevents over-fitting.

  • Margins between classes in the reduced space can be maximized by the proposed methods.

Abstract

As one of the most popular nonlinear dimensionality reduction methods, Isomap has been widely used in pattern recognition and machine learning. However, Isomap has the following problems: (1) Isomap is an unsupervised dimensionality reduction method, so it cannot use class label information to obtain a discriminative low-dimensional embedding for classification; (2) the embedding performance of Isomap is sensitive to the neighborhood size parameter; (3) Isomap cannot deal with outside new data by direct embedding. In this paper, a novel dimensionality reduction method called supervised discriminant Isomap is proposed to solve the first two problems. Specifically, raw data points are first partitioned into different manifolds by using their class label information. Then, supervised discriminant Isomap seeks an optimal nonlinear subspace that preserves the geometrical structure of each manifold according to the Isomap criterion, and enhances the discriminating capability by maximizing the distances between data points of different classes together with a maximum margin graph regularization term. Finally, the corresponding optimization problems are solved by an eigen-decomposition algorithm. Further, we extend supervised discriminant Isomap to a linear dimensionality reduction method called supervised discriminant Isomap projection for handling all three problems. Moreover, our approaches have three important characteristics: (1) the proposed methods adaptively estimate the local neighborhood surrounding each sample based on data density and similarity; (2) their objective functions maximize the margins between classes in the dimension-reduced feature space; (3) their objective functions have closed-form solutions. Furthermore, our methods can capture more discriminative information from raw data than other Isomap-based methods. Extensive experiments on nine data sets demonstrate that the proposed methods are superior to related state-of-the-art methods.

Introduction

Dimensionality reduction (DR) methods play an increasingly important role in science, engineering and physics applications. They aim at computing relevant low-dimensional representations of high-dimensional data sets to foil the curse of dimensionality (Lee & Verleysen, 2011). DR methods can be considered a preprocessing step in pattern recognition tasks and have a wide range of applications, including recommender systems (Nilashi, Ibrahim, & Bagherifard, 2018), stock trading (Han & Ge, 2020), data visualization and classification (Raducanu and Dornaika, 2012, Rajabzadeh et al., 2021), and many others.

In recent years, many methods have been proposed to deal with the DR problem. Principal component analysis (PCA) (Jolliffe, 1986), multidimensional scaling (MDS) (Borg & Groenen, 2005) and linear discriminant analysis (LDA) (Fisher, 1936, Fukunaga, 1991) are three commonly used linear DR methods. PCA projects high-dimensional vectors onto directions that maximize variance (equivalently, dot-product preservation), and MDS is closely related to PCA. Different from PCA and MDS, LDA can use prior information to project the feature space onto a smaller subspace while maintaining the information that distinguishes categories. However, the distribution of most real-world data sets is nonlinear (Geng, Zhan, & Zhou, 2005). Therefore, various nonlinear dimensionality reduction methods have been proposed. As three early nonlinear DR methods, Isometric Mapping (Isomap) (Balasubramanian, Schwartz, Tenenbaum, Silva, & Langford, 2002), locally linear embedding (LLE) (Roweis & Saul, 2000) and Laplacian Eigenmaps (LE) (Belkin & Niyogi, 2003) were designed to discover the geometric properties of data. Specifically, Isomap obtains a nonlinear embedding by preserving the geodesic distances of all similarity pairs, while both LLE and LE try to preserve the local neighborhood information of each object as much as possible when discovering the low-dimensional subspace. Kernel principal component analysis (KPCA) combines conventional PCA with the kernel trick and has been widely used in practical applications (Schölkopf, Smola, & Müller, 1998). Beyond that, other nonlinear dimensionality reduction methods, including maximum variance unfolding (Weinberger & Saul, 2006), diffusion maps (Lafon & Lee, 2006), t-distributed stochastic neighbor embedding (t-SNE) (Laurens & Hinton, 2008), and local distances preserving manifold learning (SLDP) (Hajizadeh, Aghagolzadeh, & Ezoji, 2020), are also widely used in various practical applications (Ayesha, Hanif, & Talib, 2020).
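To make the Isomap criterion referred to throughout this paper concrete, the following minimal Python sketch (not from the paper; it assumes a connected k-NN graph and relies on standard scikit-learn/SciPy utilities) illustrates the three classical steps: neighborhood graph construction, geodesic distance approximation by shortest paths, and classical MDS on the resulting distance matrix.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import shortest_path

def isomap_embed(X, n_neighbors=10, n_components=2):
    """Classical Isomap sketch: k-NN graph -> geodesic distances -> MDS."""
    # 1) Build a k-NN graph weighted by Euclidean distances.
    G = kneighbors_graph(X, n_neighbors, mode="distance")
    # 2) Approximate geodesic distances by shortest paths on the graph
    #    (assumes the graph is connected, otherwise infinities appear).
    D = shortest_path(G, method="D", directed=False)
    # 3) Classical MDS: double-center the squared distance matrix.
    n = D.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * H @ (D ** 2) @ H
    # 4) Embed with the top eigenvectors scaled by sqrt(eigenvalues).
    w, V = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:n_components]
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0.0))
```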

Isomap can reveal the intrinsic geometric structure of a manifold by preserving geodesic distances. It has shown remarkable performance for nonlinear DR in various research domains (Ayesha et al., 2020, Zheng et al., 2017). However, the original Isomap is an unsupervised method, so it is not well suited to extracting discriminative features for classification from prior information. To deal with this problem, supervised Isomap (S-Isomap) (Geng et al., 2005), marginal Isomap (M-Isomap) (Zhang, Chow, & Zhao, 2013) and multi-manifold discriminant Isomap (MMD-Isomap) (Yang, Xiang, & Zhang, 2016) were proposed. S-Isomap uses two parameters to define a new distance metric between pairwise points with the same and different class labels. M-Isomap tends to offer certain advantages over S-Isomap because it incorporates local pairwise constraints into Isomap to guide discriminant manifold learning. MMD-Isomap is an extension of M-Isomap that uses global pairwise constraints to preserve the geometrical structure of each manifold. To address the fact that Isomap fails to deal with outside new data, several techniques have been proposed. Isometric Projection (Isoprojection) (Cai, He, & Han, 2007) constructs a weighted data graph whose weights are discrete approximations of the geodesic distances on the data manifold, thereby explicitly taking the manifold structure into account. Both S-Isomap and MMD-Isomap handle this problem by using back-propagation (BP) neural networks (Schmidhuber, 2015) to calculate a deterministic mapping via an independent step after the manifold feature learning process.
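As a concrete illustration of the two-parameter dissimilarity used by S-Isomap, the sketch below follows our reading of Geng et al. (2005); the parameter names α and β come from that paper, and the exact functional form should be checked against the original.

```python
import numpy as np

def s_isomap_dissimilarity(d, same_class, alpha=0.5, beta=1.0):
    """Supervised dissimilarity in the style of S-Isomap (Geng et al., 2005).

    d          : Euclidean distance between two points
    same_class : True if the two points share a class label
    beta       : controls how fast intra-class dissimilarity saturates
    alpha      : shifts inter-class dissimilarity downward
    """
    if same_class:
        # Shrinks intra-class distances into [0, 1).
        return np.sqrt(1.0 - np.exp(-d**2 / beta))
    # Inflates inter-class distances (grows without bound).
    return np.sqrt(np.exp(d**2 / beta)) - alpha
```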

Although some conventional methods show their superiority in real-world applications, most of them still have three common problems. First, the most popular graph construction manner is based on the k-nearest neighbor (k-NN) or ε-ball neighborhood criteria, where k and ε are user-defined parameters that must be fixed in advance. It is well known that the choice of these parameters can affect the performance of the embedding, so most Isomap-based methods may suffer from restricted applicability. Second, most Isomap-based methods handle outside new data in two steps. For example, MMD-Isomap learns a mapping in an independent process after the manifold feature learning stage. This strategy cannot explicitly ensure that the mapping is optimal for data representation. Third, most supervised Isomap-based methods are not powerful enough at preserving discriminative information for classification, which indicates that their learned projections cannot preserve enough discriminant information.
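For reference, the two standard neighborhood rules that these parameters govern can be written as follows. This is a generic sketch, independent of the adaptive scheme proposed in this paper (whose details appear in Section 3).

```python
import numpy as np
from scipy.spatial.distance import cdist

def knn_graph(X, k):
    """Connect each point to its k nearest neighbors (k fixed in advance)."""
    D = cdist(X, X)
    np.fill_diagonal(D, np.inf)          # exclude self-loops
    idx = np.argsort(D, axis=1)[:, :k]   # k closest points per row
    A = np.zeros_like(D, dtype=bool)
    rows = np.repeat(np.arange(len(X)), k)
    A[rows, idx.ravel()] = True
    return A | A.T                       # symmetrize the adjacency

def eps_ball_graph(X, eps):
    """Connect all pairs of points closer than a fixed radius eps."""
    D = cdist(X, X)
    A = D < eps
    np.fill_diagonal(A, False)
    return A
```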

In this paper, we propose a novel supervised nonlinear DR method called supervised discriminant Isomap (SD-Isomap) to solve the above problems. To adaptively select the size of the neighborhood, two subsets are obtained by using class label information with the same strategy as in Raducanu and Dornaika (2012). Then, SD-Isomap seeks an optimal nonlinear subspace that preserves the geometrical structure of each manifold according to the Isomap criterion while enhancing the discriminating capability by maximizing the distances between data points of different manifolds. To make SD-Isomap capture more discriminative information from data, a maximum margin graph regularization term is constructed. To give SD-Isomap an explicit mapping, supervised discriminant Isomap projection (SD-IsoP) is proposed; it projects the original data into a new space of lower dimensionality within a unified learning framework. Therefore, our methods can capture more discriminative information from data than other Isomap-based methods. Experimental results on nine real-world data sets illustrate the effectiveness of the proposed methods. Based on the above, the main contributions of this paper are summarized as follows:

  • 1.

    SD-Isomap and SD-IsoP can maximize the margins between classes in the dimension-reduced feature space and enjoy closed-form solutions.

  • 2.

    SD-IsoP learns an explicit mapping within a unified learning framework, so that the learned projection can handle outside data efficiently by direct embedding.

  • 3.

    The proposed methods can adaptively estimate the local neighborhood surrounding each sample based on data density and similarity.

  • 4.

    The proposed methods can capture more discriminative information from high-dimensional data than other Isomap-based methods.

The rest of this paper is organized as follows. In Section 2, we review work related to our proposed methods. SD-Isomap and SD-IsoP are proposed in Section 3. Furthermore, we provide a complexity analysis of the proposed methods in Section 4. Finally, several experiments are conducted to demonstrate the effectiveness of the proposed methods in Section 5.

Section snippets

Related work

In this section, we provide a review of the related work, including Isomap and some supervised versions of Isomap. Given a data matrix X = [x_1, x_2, …, x_N] ∈ ℝ^{M×N}, where M is the number of features and N is the number of data points, l(x_i) ∈ {1, 2, …, c} is the class label of x_i, i = 1, 2, …, N, and c is the number of classes. The goal of DR is to find a set of embedded data points Y = [y_1, y_2, …, y_N] ∈ ℝ^{m×N}, where m < M. I is an N×N identity matrix, I_m is an m×m identity matrix, and I_M is an M×M identity matrix. For a …

Proposed method

Isomap aims to discover the intrinsic geometric structure of a manifold by preserving the geodesic distances of all similarity pairs, thereby handling highly nonlinear manifolds. However, it has some obvious drawbacks. First, Isomap is an unsupervised DR method, so it cannot use class label information to obtain a discriminative low-dimensional embedding for classification. Second, Isomap applies the k-NN or ε-ball neighborhood criteria to construct the graph. It is well known that the choice of these parameters can …

Computational time complexity

In this paper, both SD-Isomap and SD-IsoP need to compute geodesic distance metrics. The computational complexity is O(N³) when Floyd's algorithm is used, and it can be improved to O(kN² log N) when Dijkstra's algorithm is used, where N and k are the sample size and neighborhood size, respectively. Then, eigen-decomposition is applied to obtain the lower-dimensional embedding Y and the projection matrix P; its time complexity is O(N³). Therefore, both SD-Isomap and SD-IsoP have a time complexity of O(N³).
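For reference, both all-pairs shortest-path algorithms mentioned above are available in SciPy; on a sparse k-NN graph, Dijkstra's O(kN² log N) variant is usually preferable. This is an illustrative sketch, not the authors' implementation, and it assumes the k-NN graph is connected.

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path
from sklearn.neighbors import kneighbors_graph

X = np.random.RandomState(0).randn(500, 20)
G = kneighbors_graph(X, n_neighbors=10, mode="distance")

# Floyd-Warshall: O(N^3), ignores sparsity -- fine for small N.
D_fw = shortest_path(G, method="FW", directed=False)

# Dijkstra from every source: O(k N^2 log N) on a k-NN graph.
D_dj = shortest_path(G, method="D", directed=False)

# Both yield the same geodesic distance estimates.
assert np.allclose(D_fw, D_dj)
```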

Experiments

In this section, nine real-world data sets are employed to validate the performance of the proposed methods. They comprise five image data sets and four text data sets, listed in Table 1. We experimentally compare our proposed methods with ten existing dimensionality reduction methods, including LE (Belkin & Niyogi, 2003), t-SNE (Laurens & Hinton, 2008), KPCA (Schölkopf et al., 1998), Isomap (Tenenbaum, de Silva, & Langford, 2000), L-Isomap (Silva & Tenenbaum, 2002), MMD-Isomap (Yang et al., 2016), …
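A typical evaluation pipeline for such comparisons is sketched below; since the snippet is truncated, the exact protocol is our assumption. Here scikit-learn's Isomap stands in for one of the baselines, and the embedding is scored with a 1-NN classifier, a common convention in DR experiments.

```python
from sklearn.datasets import load_digits
from sklearn.manifold import Isomap
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Fit the baseline DR method on training data, project both splits.
iso = Isomap(n_neighbors=10, n_components=30).fit(X_tr)
Z_tr, Z_te = iso.transform(X_tr), iso.transform(X_te)

# Score the embedding with a 1-NN classifier.
clf = KNeighborsClassifier(n_neighbors=1).fit(Z_tr, y_tr)
print("1-NN accuracy in the embedded space:", clf.score(Z_te, y_te))
```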

Conclusion

In this paper, we proposed a novel nonlinear dimensionality reduction method, namely SD-Isomap. To give SD-Isomap an explicit mapping for outside new data, SD-IsoP was proposed. Both of them learn from data by optimizing newly designed and efficient objective functions. Moreover, the proposed objective functions enjoy several desirable properties: (1) adaptively estimating the local neighborhood surrounding each sample; (2) maximizing the margins between classes of data; (3) enjoying closed-form solutions.

CRediT authorship contribution statement

Hongchun Qu: Conceptualization, Methodology. Lin Li: Methodology, Software, Writing - original draft. Zhaoni Li: Data curation. Jian Zheng: Software.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

The authors express their thanks to the anonymous reviewers and the editors for their constructive comments and suggestions, which greatly improved this paper. This work was supported by the National Natural Science Foundation of China (61871061) to HQ and the Project of the Science and Technology Department of Qinghai Province (2019-NN-161) to ZL.

References (39)

  • J. Schmidhuber

    Deep learning in neural networks: An overview

    Neural Networks

    (2015)
  • B. Yang et al.

    Multi-manifold discriminant isomap for visualization and classification

    Pattern Recognition

    (2016)
  • Y. Zhang et al.

    Semi-supervised local multi-manifold isomap by linear embedding for feature extraction

    Pattern Recognition

    (2018)
  • M. Balasubramanian et al.

    The isomap algorithm and topological stability

    Science

    (2002)
  • P.N. Belhumeur et al.

    Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection

    IEEE Transactions on Pattern Analysis and Machine Intelligence

    (1997)
  • M. Belkin et al.

    Laplacian eigenmaps for dimensionality reduction and data representation

    Neural Computation

    (2003)
  • Blake, C.L., & Merz, C.J. (1998). UCI Repository of machine learning databases. http://www.ics.uci.edu/...
  • I. Borg et al.

    Modern Multidimensional Scaling: Theory and Applications

    (2005)
  • D. Cai et al.

    Isometric projection

    Proceedings of the National Conference on Artificial Intelligence

    (2007)