Supervised discriminant Isomap with maximum margin graph regularization for dimensionality reduction
Introduction
Dimensionality reduction (DR) methods play an increasingly important role in science, engineering, and physics applications. They aim at computing relevant low-dimensional representations of high-dimensional data sets to mitigate the curse of dimensionality (Lee & Verleysen, 2011). DR methods can serve as a preprocessing step in pattern recognition tasks and have a wide range of applications, including recommender systems (Nilashi, Ibrahim, & Bagherifard, 2018), stock trading (Han & Ge, 2020), data visualization and classification (Raducanu & Dornaika, 2012; Rajabzadeh et al., 2021), and many others.
In recent years, many methods have been proposed to deal with the DR problem. Principal component analysis (PCA) (Jolliffe, 1986), multidimensional scaling (MDS) (Borg & Groenen, 2005) and linear discriminant analysis (LDA) (Fisher, 1936; Fukunaga, 1991) are three commonly used linear DR methods. PCA projects high-dimensional vectors by maximizing variance or dot-product preservation. MDS is closely related to PCA. Different from PCA and MDS, LDA uses prior information to project the feature space into a smaller subspace while preserving the information that distinguishes categories. However, the distribution of most real-world data sets is nonlinear (Geng, Zhan, & Zhou, 2005). Therefore, various nonlinear dimensionality reduction methods have been proposed. As three early nonlinear DR methods, Isometric Mapping (Isomap) (Balasubramanian, Schwartz, Tenenbaum, Silva, & Langford, 2002), locally linear embedding (LLE) (Roweis & Saul, 2000) and Laplacian Eigenmaps (LE) (Belkin & Niyogi, 2003) were designed to discover the geometric properties of data. Specifically, Isomap obtains a nonlinear embedding by preserving the geodesic distances of all similarity pairs, while both LLE and LE try to preserve the local neighborhood information of each object as much as possible to discover the low-dimensional subspace. Kernel principal component analysis (KPCA) combines conventional PCA with the kernel trick and has been widely used in practical applications (Schölkopf, Smola, & Müller, 1998).
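As a minimal illustration of the linear projection idea behind PCA (illustrative NumPy code, not from the paper), the following sketch centers the data, eigendecomposes the covariance matrix, and keeps the top-d directions of maximal variance:

```python
import numpy as np

def pca(X, d):
    """Project rows of X (n_samples x n_features) onto the top-d
    principal components, i.e. the directions of maximal variance."""
    Xc = X - X.mean(axis=0)            # center the data
    C = np.cov(Xc, rowvar=False)       # feature covariance matrix
    vals, vecs = np.linalg.eigh(C)     # eigenvalues in ascending order
    W = vecs[:, ::-1][:, :d]           # top-d eigenvectors
    return Xc @ W                      # low-dimensional embedding

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
Y = pca(X, 2)
print(Y.shape)  # (100, 2)
```

The columns of the embedding come out ordered by explained variance, which is the property the text describes PCA as optimizing.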
Beyond that, other nonlinear dimensionality reduction methods, including maximum variance unfolding (Weinberger & Saul, 2006), diffusion maps (Lafon & Lee, 2006), t-distributed stochastic neighbor embedding (t-SNE) (Laurens & Hinton, 2008), and local distances preserving based manifold learning (SLDP) (Hajizadeh, Aghagolzadeh, & Ezoji, 2020), are also widely used in various practical applications (Ayesha, Hanif, & Talib, 2020).
Isomap can reveal the intrinsic geometric structure of a manifold by preserving geodesic distances. It has shown remarkable performance for nonlinear DR in various research domains (Ayesha et al., 2020; Zheng et al., 2017). However, the original Isomap is an unsupervised method, so it cannot exploit prior label information to extract discriminative features for classification. To deal with this problem, supervised Isomap (S-Isomap) (Geng et al., 2005), marginal Isomap (M-Isomap) (Zhang, Chow, & Zhao, 2013) and multi-manifold discriminant Isomap (MMD-Isomap) (Yang, Xiang, & Zhang, 2016) were proposed. S-Isomap uses two parameters to define a new distance metric between pairwise points with the same and different class labels. M-Isomap tends to offer certain advantages over S-Isomap because it incorporates local pairwise constraints into Isomap to guide the discriminant manifold learning. MMD-Isomap is an extension of M-Isomap that uses global pairwise constraints to preserve the geometrical structure of each manifold. To address the problem that Isomap fails to deal with outside new data, some techniques have been proposed. Isometric Projection (IsoProjection) (Cai, He, & Han, 2007) constructs a weighted data graph whose weights are discrete approximations of the geodesic distances on the data manifold, which explicitly takes the manifold structure into account. Both S-Isomap and MMD-Isomap handle this problem by using back-propagation (BP) neural networks (Schmidhuber, 2015) to calculate a deterministic mapping via an independent step after the manifold feature learning process.
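For concreteness, one commonly cited form of the S-Isomap dissimilarity rescales Euclidean distances with two user-set parameters (called beta and alpha here, following the formulation usually attributed to Geng et al., 2005); the sketch below is illustrative, not the authors' published code:

```python
import numpy as np

def s_isomap_dissimilarity(xi, xj, same_class, beta=1.0, alpha=0.0):
    """Label-aware dissimilarity in the style of S-Isomap: within-class
    distances are compressed toward a bounded range, while between-class
    distances are inflated, so labels shape the neighborhood graph."""
    d2 = np.sum((xi - xj) ** 2)            # squared Euclidean distance
    if same_class:
        return np.sqrt(1.0 - np.exp(-d2 / beta))
    return np.sqrt(np.exp(d2 / beta)) - alpha

a, b = np.array([0.0, 0.0]), np.array([1.0, 0.0])
within = s_isomap_dissimilarity(a, b, True)
between = s_isomap_dissimilarity(a, b, False)
print(within < between)  # True: the same pair is farther apart when labels differ
```

This is exactly the mechanism the text alludes to: two parameters turn one geometric distance into a class-dependent metric before the geodesic graph is built.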
Although some conventional methods show their superiority in real-world applications, most of them still share three common problems. First, the most popular graph construction manner is based on the k-nearest neighbor (k-NN) or ε-ball neighborhood criteria, where k and ε are user-defined parameters that must be fixed in advance. It is well known that the choice of these parameters can affect the performance of the embedding; thus, most Isomap-based methods suffer from restricted applicability. Second, most Isomap-based methods handle outside new data in two steps. For example, MMD-Isomap learns a mapping in an independent process after the manifold feature learning stage. This strategy cannot explicitly ensure that the mapping is optimal for data representation. Third, most supervised Isomap-based methods are not powerful enough at preserving discriminative information, so their learned projections carry limited discriminant information for classification.
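The two graph-construction criteria can be sketched as follows (illustrative NumPy code, not the paper's implementation); it shows how k and ε are exactly the kind of user-defined parameters that decide which points end up connected:

```python
import numpy as np

def knn_graph(X, k):
    """Symmetric k-NN adjacency: link each point to its k nearest
    neighbors under Euclidean distance, then symmetrize."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    n = len(X)
    A = np.zeros((n, n), dtype=bool)
    for i in range(n):
        nn = np.argsort(D[i])[1:k + 1]   # skip the point itself
        A[i, nn] = True
    return A | A.T

def eps_graph(X, eps):
    """ε-ball adjacency: link every pair closer than eps."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    return (D < eps) & (D > 0)

# Three clustered points plus one outlier.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
print(knn_graph(X, 1).sum())    # k-NN always connects the outlier somewhere
print(eps_graph(X, 1.5)[3])     # the ε-ball graph leaves the outlier isolated
```

Note the qualitative difference on the outlier: k-NN forces it into the graph, while too small an ε disconnects it entirely, which is why a single global parameter choice can hurt the embedding.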
In this paper, we propose a novel supervised nonlinear DR method called supervised discriminant Isomap (SD-Isomap) to solve the above problems. To adaptively select the neighborhood size, two subsets are obtained by using class label information, following the same strategy as in Raducanu and Dornaika (2012). SD-Isomap then seeks an optimal nonlinear subspace that preserves the geometrical structure of each manifold according to the Isomap criterion while enhancing the discriminating capability by maximizing the distances between data points of different manifolds. To make SD-Isomap capture more discriminative information from the data, a maximum margin graph regularization term is constructed. To give SD-Isomap an explicit mapping, supervised discriminant Isomap projection (SD-IsoP) is proposed, which projects the original data into a lower-dimensional space through a unified learning framework. Therefore, our methods can capture more discriminative information from data than other Isomap-based methods. Experimental results on nine real-world data sets illustrate the effectiveness of the proposed methods. Based on the above, the main contributions of this paper are summarized as follows:
- 1.
SD-Isomap and SD-IsoP can maximize the margins between classes in the dimension-reduced feature space and enjoy closed-form solutions.
- 2.
SD-IsoP learns an explicit mapping by using a unified learning framework, so that the learned projection can handle outside data efficiently by direct embedding.
- 3.
The proposed methods can adaptively estimate the local neighborhood surrounding each sample based on data density and similarity.
- 4.
The proposed methods can capture more discriminative information from high-dimensional data than other Isomap-based methods.
Related work
In this section, we provide a review of the related work, including Isomap and some supervised versions of Isomap. Given a data matrix X = [x1, x2, …, xN] ∈ R^(M×N), where M is the number of features and N is the number of data points, li ∈ {1, 2, …, c} is the class label of xi, and c is the number of classes. The goal of DR is to find a set of embedded data points Y = [y1, y2, …, yN] ∈ R^(d×N), where d ≪ M, and I denotes an identity matrix of the appropriate size.
Proposed method
Isomap aims to discover the intrinsic geometric structure of a manifold by preserving the geodesic distances of all similarity pairs, thereby delivering highly nonlinear manifolds. However, it has some obvious drawbacks. First, Isomap is an unsupervised DR method, so it cannot use class label information to obtain a discriminative low-dimensional embedding for classification. Second, Isomap applies the k-NN or ε-ball neighborhood criterion to construct the graph. It is well known that the choice of these parameters can affect the performance of the embedding.
Computational time complexity
In this paper, both SD-Isomap and SD-IsoP need to compute geodesic distance matrices. The computational complexity is O(N³) when Floyd's algorithm is used, and it can be improved to O(kN² log N) when Dijkstra's algorithm is used, where N and k are the sample size and neighborhood size, respectively. Then, eigen-decomposition is applied to obtain the lower-dimensional embedding Y and the projection matrix P; its time complexity is O(N³). Therefore, both SD-Isomap and SD-IsoP have an overall time complexity of O(N³).
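The Floyd-based geodesic step can be sketched in a few lines; this is a generic O(N³) Floyd–Warshall pass over a neighborhood graph (illustrative, not the paper's code), with missing edges encoded as np.inf:

```python
import numpy as np

def floyd_warshall(W):
    """All-pairs shortest paths in O(N^3). W holds edge lengths,
    with np.inf for absent edges and 0 on the diagonal."""
    D = W.copy()
    for k in range(len(D)):
        # Relax every pair (i, j) through intermediate vertex k.
        D = np.minimum(D, D[:, k:k + 1] + D[k:k + 1, :])
    return D

inf = np.inf
W = np.array([[0.0, 1.0, inf],
              [1.0, 0.0, 1.0],
              [inf, 1.0, 0.0]])   # path graph 0 - 1 - 2
G = floyd_warshall(W)
print(G[0, 2])  # 2.0: the geodesic from 0 to 2 runs through vertex 1
```

The triple relaxation over all N intermediate vertices is what produces the cubic cost quoted above; Dijkstra from each source over a sparse k-NN graph is what yields the cheaper bound.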
Experiments
In this section, nine real-world data sets are employed to validate the performance of the proposed methods. They comprise five image data sets and four text data sets, which are listed in Table 1. We experimentally compare our proposed methods with ten existing dimensionality reduction methods, including LE (Belkin & Niyogi, 2003), t-SNE (Laurens & Hinton, 2008), KPCA (Schölkopf et al., 1998), Isomap (Tenenbaum, de Silva, & Langford, 2000), L-Isomap (Silva & Tenenbaum, 2002), and MMD-Isomap (Yang et al., 2016).
Conclusion
In this paper, we proposed a novel nonlinear dimensionality reduction method, namely SD-Isomap. To equip SD-Isomap with an explicit mapping for outside new data, SD-IsoP was proposed. Both of them learn from data by optimizing newly designed and efficient objective functions. Moreover, the proposed objective functions enjoy several desirable properties: (1) adaptively estimating the local neighborhood surrounding each sample; (2) maximizing the margins between classes of data; (3) learning an explicit mapping within a unified framework, so that outside new data can be embedded directly.
CRediT authorship contribution statement
Hongchun Qu: Conceptualization, Methodology. Lin Li: Methodology, Software, Writing - original draft. Zhaoni Li: Data curation. Jian Zheng: Software.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
The authors express their thanks to the anonymous reviewers and the editors for their constructive comments and suggestions, which greatly improved this paper. This work was supported by the National Natural Science Foundation of China (61871061) to HQ and by the Project of the Science and Technology Department of Qinghai Province (2019-NN-161) to ZL.
References (39)
- Optimal manifold neighborhood and kernel width for robust non-linear dimensionality reduction. Knowledge-Based Systems (2019).
- Ayesha, Hanif, & Talib. Overview and comparative study of dimensionality reduction techniques for high dimensional data. Information Fusion (2020).
- Class visualization of high-dimensional data with applications. Computational Statistics and Data Analysis (2002).
- Hajizadeh, Aghagolzadeh, & Ezoji. Local distances preserving based manifold learning. Expert Systems with Applications (2020).
- Han & Ge. Effect of dimensionality reduction on stock selection with cluster analysis in different market situations. Expert Systems with Applications (2020).
- The Floyd–Warshall algorithm on graphs with negative cycles. Information Processing Letters (2010).
- Debohid: A differential evolution based oversampling approach for highly imbalanced datasets. Expert Systems with Applications (2021).
- Nilashi, Ibrahim, & Bagherifard. A recommender system based on collaborative filtering using ontology and dimensionality reduction techniques. Expert Systems with Applications (2018).
- Raducanu & Dornaika. A supervised non-linear dimensionality reduction approach for manifold learning. Pattern Recognition (2012).
- Rajabzadeh et al. Supervised discriminative dimensionality reduction by learning multiple transformation operators. Expert Systems with Applications (2021).