Linear dimensionality reduction based on Hybrid structure preserving projections
Introduction
Dimensionality reduction (DR), which aims to discover succinct representations of high-dimensional samples, can effectively alleviate the “curse of dimensionality” and the overfitting problem of machine learning algorithms [1], [2]. Relatedly, the well-known guiding principle in model selection, minimum description length, stipulates that the model yielding the most compact representation should be preferred [3]. Therefore, to achieve more efficient and precise algorithms for automated pattern recognition and exploratory data analysis, especially in high-dimensional applications such as biochemistry and drug design, DR techniques have been intensively studied over the past decades, and a series of significant results have been achieved in theory and practice.
Strictly speaking, DR can be grouped into feature selection and feature extraction [2], [4]. Feature selection aims at selecting the best subset from the original set of features, eliminating less informative and irrelevant features; examples include the least absolute shrinkage and selection operator (Lasso) [5] and the fast correlation-based filter (FCBF) [6]. Feature extraction, in contrast, transforms the input features into a new, reduced set of features by creating new features from transformations or combinations of the original ones [2]. In this paper, we concentrate on feature extraction on account of its capacity for revealing the intrinsic dimensionality of the data.
A variety of linear and nonlinear approaches to feature extraction have been developed under unsupervised, supervised and semi-supervised scenarios. The linear route conducts DR via a projection matrix computed from the training data, as in principal component analysis (PCA) [7], linear discriminant analysis (LDA) [1] and semi-supervised discriminant analysis (SDA) [8]. Nonlinear methods, which have greater expressive power [9], map the data into the reduced space through a nonlinear function usually learned by various fitting procedures, as in multidimensional scaling (MDS) [10], locally linear embedding (LLE) [11], Isomap [12] and the autoencoder [13]. In addition, a linear method can be extended to a nonlinear version by introducing the kernel trick, as in kernel principal component analysis (KPCA) [14] and kernel Fisher discriminant (KFD) [15]. However, recent research reveals that nonlinear methods perform well on selected artificial tasks but not on real-world tasks [16].
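To make the linear route concrete, the following numpy-only sketch (our illustration, not code from any of the cited works) computes a PCA projection matrix from training data and applies it; the same matrix can later be applied to test data:

```python
import numpy as np

def pca_project(X, d):
    """Project n samples (rows of X) onto the top-d principal
    components; returns the reduced data Y and the projection
    matrix A (columns = principal directions)."""
    Xc = X - X.mean(axis=0)                 # center the data
    cov = Xc.T @ Xc / (len(X) - 1)          # sample covariance
    vals, vecs = np.linalg.eigh(cov)        # eigenvalues in ascending order
    A = vecs[:, ::-1][:, :d]                # keep the top-d eigenvectors
    return Xc @ A, A

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
Y, A = pca_project(X, 2)
print(Y.shape)          # (100, 2)
```

Because the learned matrix A is all that needs to be stored, applying the reduction to unseen samples is a single matrix product, which is exactly the merit of linear DR discussed below.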
Compared with nonlinear methods, linear techniques have many merits: the reduced representation can be obtained by simple algebraic manipulations; the projection matrix can be applied everywhere, to both training and test data; and the objective problem can be solved simply and efficiently. Accordingly, many nonlinear methods have been recast into linearized versions. For instance, locality preserving projections (LPP) [17] is a linearized Laplacian Eigenmaps [18]; neighborhood preserving embedding (NPE) [19] and locally linear embedded Eigenspace analysis (LEA) [20] are two linearized versions of LLE; and isometric projection (IsoProjection) [21] can be seen as a linearized Isomap.
As mentioned above, a family of approaches, grouped under manifold learning, has been proposed for learning the underlying low-dimensional manifold. The advances of manifold learning show that preserving the intrinsic structure of high-dimensional data is important when samples are mapped into a reduced space for data analysis [22], [23]. In general, manifold learning approaches aim at preserving in the embedding subspace either the global pairwise sample similarity, as in Isomap, or the local geometric structure of the data, as in LLE. Local geometric structure preservation has been used in numerous studies and is very effective for learning discriminative features (LDF) [17], [19], [24], based on the simple geometric intuition that each sample and its neighbors lie close to a locally linear patch of the underlying manifold [11]. The local structure is typically constructed with the traditional k-nearest neighbor (k-NN) strategy, as in LLE, LPP and NPE.
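The k-NN local fit just described can be sketched as follows; this is a minimal numpy illustration of the LLE/NPE-style reconstruction weights (the function name and the regularization constant are our choices, not the paper's):

```python
import numpy as np

def knn_reconstruction_weights(X, k=3, reg=1e-3):
    """For each sample (row of X), pick its k nearest neighbors and
    solve the LLE/NPE-style local fit: the weights that best
    reconstruct the sample from its neighbors, summing to one."""
    n = X.shape[0]
    W = np.zeros((n, n))
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)  # pairwise sq. distances
    for i in range(n):
        idx = np.argsort(d2[i])[1:k + 1]        # nearest neighbors, self excluded
        Z = X[idx] - X[i]                       # neighbors shifted to the origin
        G = Z @ Z.T                             # local Gram matrix
        G = G + reg * np.trace(G) * np.eye(k)   # regularize for numerical stability
        w = np.linalg.solve(G, np.ones(k))
        W[i, idx] = w / w.sum()                 # constrain weights to sum to one
    return W
```

The resulting weight matrix W encodes the locally linear patches; NPE then seeks the linear projection that keeps these reconstruction relations in the reduced space.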
Besides manifold learning, sparse representation (SR) learned by sparse coding (Sc) has recently attracted considerable attention in theory [25] and practice. Usually, a high-dimensional signal contains relatively little information compared to its ambient dimension, so it can be well approximated as a linear combination of just a few elements from an overcomplete dictionary [26], [27], [28], [29]. Note that the coefficient vector of this linear combination is exactly the SR of the corresponding sample. SR is not only a promising tool in image processing [30], [31], but has also been investigated for classification [32], [33] and DR [34], [35]. In particular, the sparse representation based classifier (SRC) [32] directly exploits all training samples as the dictionary to learn the SR of a test sample, and then assigns the test sample to the category giving the minimum reconstruction error. Subsequently, sparsity preserving projections (SPP) was proposed to preserve the discriminative structure characterized by SR for LDF [34]. Meanwhile, the ℓ1-graph constructed by SR was presented to offer a datum-adaptive neighborhood instead of selecting neighbors by a distance metric [35]. In fact, the approach called sparse neighborhood preserving embedding (SNPE) [35] is identical to SPP. By introducing class information, supervised versions of SNPE can be found in [36], [37]. In addition, other sparsity preserving methods have been put forth, such as sparsity preserving discriminant analysis (SPDA) [38] and fast Fisher sparsity preserving projections (FSPP) [39]. These results, exhibiting high classification performance in the reduced space, suggest that keeping the sparsity structure is also very helpful for LDF.
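To illustrate, the ℓ1 problem underlying SRC/SPP-style methods can be solved in a few lines with iterative soft-thresholding; this is a generic sketch (ISTA is our choice of solver here, not necessarily the algorithm used in the cited works, and `lam` is an illustrative parameter):

```python
import numpy as np

def sparse_code(D, x, lam=0.05, n_iter=1000):
    """ISTA sketch for  min_s 0.5*||x - D s||_2^2 + lam*||s||_1,
    where the dictionary D stacks training samples as columns and
    s is the sparse representation (SR) of the sample x."""
    step = 1.0 / (np.linalg.norm(D, 2) ** 2)   # 1 / Lipschitz constant of the gradient
    s = np.zeros(D.shape[1])
    for _ in range(n_iter):
        g = s - step * (D.T @ (D @ s - x))                        # gradient step
        s = np.sign(g) * np.maximum(np.abs(g) - step * lam, 0.0)  # soft-threshold
    return s
```

The nonzero entries of the returned vector identify which training samples reconstruct x, which is exactly the datum-adaptive neighborhood the ℓ1-graph exploits.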
However, the DR methods above only consider a single characterization of the data, either local or global structure, which is insufficient to represent the underlying structure of real-world data [40]. Thus, recent studies try to integrate both the global and local structures of the data to pursue higher performance, such as Laplacian linear discriminant analysis (LapLDA) [40], locality preserving linear discriminant analysis (LocLDA) [41] and global and local structure preservation for feature selection (GLSPFS) [42]. These methods concentrate on collecting discriminative information from global and local perspectives but fail to characterize a more discriminative structure. To the best of our knowledge, no existing work integrates the neighborhood structure and the sparsity structure or provides empirical insight into the relationship between them. Thus, in this paper we analyze the discrimination of the neighborhood structure and the sparsity structure and then present two combined methods for performance improvement.
More specifically, the neighborhood structure is more discriminative in low-dimensional spaces while sparsity is more effective in high-dimensional spaces. We therefore first present a rough combined method, called the combination method of SPP and NPE (CSNP), to seek a trade-off between the objectives of SPP and NPE. However, such a traditional combination is prone to degradation in practice. To achieve our goal of integration, we solidify both structures into a hybrid structure and then propose the sparsity and neighborhood preserving projections (SNPP) algorithm for LDF. Finally, we conduct extensive experiments to verify the effectiveness of the proposed approaches. The main contributions of this paper can be summarized as follows.
- (1)
We analyze the relationship between the sparsity structure and the neighborhood structure of the data, and exploit their respective advantages to develop a stronger approach; our experimental evidence supports this analysis. In addition, the resulting approach can be interpreted through the general graph embedding framework [43].
- (2)
We put forward the hybrid structure of the data, combining the neighborhood structure and the sparsity structure, to characterize a more discriminative structure, which in turn yields powerful features for classification. Note that the defined hybrid representation (HR), related to the hybrid structure, is a multiple collaborative representation [33], [44] and can analogously be effective for classification.
- (3)
By using deep-level integration instead of a shallow trade-off, SNPP successfully avoids the degradation that appears in CSNP and suggests a new route for combined methods.
- (4)
According to our investigations, we draw the following experimental conclusion: the sparsity structure is more important for high-dimensional data, while the neighborhood structure is more important for low-dimensional data.
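The hybrid structure of contribution (2) is not fully specified in this excerpt; purely as an illustrative sketch (the blend parameter `alpha` and both helper names are our assumptions, not the paper's formulation), one plausible reading blends the sparse and k-NN weight matrices and then derives the matrix whose trace objective preserves the blended reconstruction relations:

```python
import numpy as np

def hybrid_weights(W_sparse, W_knn, alpha=0.5):
    """Illustrative blend of the two structures: alpha -> 1 recovers
    the pure sparsity structure, alpha -> 0 the pure k-NN one."""
    return alpha * W_sparse + (1.0 - alpha) * W_knn

def preserve_matrix(W):
    """M = (I - W)'(I - W): minimizing tr(Y M Y') keeps the
    reconstruction relations encoded in W in the reduced space."""
    IW = np.eye(W.shape[0]) - W
    return IW.T @ IW
```

Under this reading, SPP and NPE would correspond to the two endpoints of `alpha`, which matches the paper's remark that both become special cases of SNPP.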
In addition, the proposed methods can be extended to supervised and semi-supervised versions through the DR frameworks proposed in [43], [45]. The remainder of this paper is organized as follows. Section 2 reviews the related work. Our proposed algorithms are presented in Section 3. In Section 4, experimental results on an artificial data set, data sets from UCI and a face data set are shown. Finally, we conclude the paper in Section 5.
Dimensionality reduction techniques with SR and NR
In this section we recall two representations, SR and NR, and two corresponding DR methods, SPP and NPE. Note that here SR and NR are used to characterize the existing relationships within the input data. The problem we consider is stated as follows: given the data matrix X = [x1, x2, …, xn] ∈ R^(D×n), linear DR seeks a transformation matrix A ∈ R^(D×d) (d < D) and then maps X to the reduced data matrix Y ∈ R^(d×n) by Y = A^T X. Without loss of generality, we denote a single projection vector by a,
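The linearized problems of this family (LPP, NPE and SPP differ essentially in the structure matrix they plug in) reduce to a generalized eigenproblem; the following schematic numpy solver is a sketch under the assumption that the structure matrix M is symmetric (the small ridge term is our addition for numerical stability):

```python
import numpy as np

def graph_embedding_projection(X, M, d):
    """Schematic solver for the linearized graph-embedding problem
    min_a a' X M X' a  s.t.  a' X X' a = 1, with M symmetric
    (e.g. a graph Laplacian or (I-W)'(I-W)); returns the D x d
    projection matrix built from the bottom-d generalized eigenvectors.
    X is D x n with samples as columns."""
    A = X @ M @ X.T
    A = (A + A.T) / 2                          # enforce exact symmetry
    B = X @ X.T + 1e-6 * np.eye(X.shape[0])    # small ridge for invertibility
    L = np.linalg.cholesky(B)                  # whiten the constraint: B = L L'
    Li = np.linalg.inv(L)
    vals, U = np.linalg.eigh(Li @ A @ Li.T)    # ordinary symmetric eigenproblem
    V = Li.T @ U                               # map back to generalized eigenvectors
    return V[:, :d]                            # smallest-eigenvalue directions
```

The returned columns satisfy the generalized orthonormality a' (X X') a = 1, so mapping new data is again just a matrix product with the learned projection.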
Sparsity and neighborhood preserving projections
Note that here we concentrate on LDF by preserving more of the discriminative information collected from the original data in the reduced subspace. Since both SR and NR are useful for LDF [19], [32], [34], [35], it is reasonable to take both of them into account in pursuit of higher performance. Although some recent studies deem that sparsity is also a local structure [35], SR and NR indeed reflect the underlying structure of the data from different perspectives. Thus, we try to integrate them into a
Experiments
In this section, we verify our proposed algorithms on public real-world data sets and compare them with PCA, LPP, NPE and SPP. We select these approaches for comparison because we concentrate on the two ways of neighborhood selection and because our approaches are linear and unsupervised. Our experiments are organized as follows. First, we map a toy data set and three UCI data sets [58] into the reduced space for insight into the interclass separation. Then we
Conclusions and future work
Recent research results show that both the sparsity structure and the neighborhood structure of the data are helpful for LDF. In this paper, to take both of them into account, we propose two integrated DR methods, CSNP and SNPP. To remove the degradation of CSNP, we solidify the two structures into our defined hybrid structure and then preserve this hybrid structure in the reduced space, resulting in the method SNPP. The introduced degrees of freedom not only make both SPP and NPE be the special
Acknowledgements
The authors would like to thank the anonymous reviewers for their helpful suggestions.
References (60)
- et al., Robust kernel isomap, Pattern Recognit. (2007)
- et al., Sparse coding from a Bayesian perspective, IEEE Trans. Neural Netw. Learn. Syst. (2013)
- et al., Sparsity preserving projections with applications to face recognition, Pattern Recognit. (2010)
- et al., Discriminant sparse neighborhood preserving embedding for face recognition, Pattern Recognit. (2012)
- et al., Face recognition using discriminant sparsity neighborhood preserving embedding, Knowl. Based Syst. (2012)
- et al., Sparsity preserving discriminant analysis for single training image face recognition, Pattern Recognit. Lett. (2010)
- et al., Efficient linear discriminant analysis with locality preserving for face recognition, Pattern Recognit. (2012)
- et al., Learning dictionary on manifolds for image classification, Pattern Recognit. (2013)
- et al., The Elements of Statistical Learning (2009)
- et al., Statistical pattern recognition: a review, IEEE Trans. Pattern Anal. Mach. Intell. (2000)
- Model selection and the principle of minimum description length, J. Am. Stat. Assoc.
- Dimensionality reduction using genetic algorithms, IEEE Trans. Evol. Comput.
- Regression shrinkage and selection via the lasso, J. R. Stat. Soc. Ser. B: Methodol.
- Efficient feature selection via analysis of relevance and redundancy, J. Mach. Learn. Res.
- Principal Component Analysis
- Representation learning: a review and new perspectives, IEEE Trans. Pattern Anal. Mach. Intell.
- Multidimensional Scaling
- Nonlinear dimensionality reduction by locally linear embedding, Science
- A global geometric framework for nonlinear dimensionality reduction, Science
- Reducing the dimensionality of data with neural networks, Science
- Dimensionality reduction: a comparative review, J. Mach. Learn. Res.
- Laplacian eigenmaps for dimensionality reduction and data representation, Neural Comput.
- Locally Linear Embedded Eigenspace Analysis, IFP-TR
- Dimensionality reduction of multimodal labeled data by local Fisher discriminant analysis, J. Mach. Learn. Res.
Yupei Zhang received the B.Eng. degree in computer science and technology from East China Institute of Technology, Fuzhou, China, in 2009, and received the M.Eng. degree in computer software and theory from Zhengzhou University, Zhengzhou, China, in 2013. He is currently a Ph.D. candidate in the department of computer science and technology, Xi’an Jiaotong University, Xi’an, China. His current research interests mainly include sparse representation, pattern recognition and machine learning.
Ming Xiang received the B.Eng. and Ph.D. degrees from Northwestern Polytechnical University, Xi’an, China, in 1987 and 1999, respectively, and currently works as an associate professor in the department of computer science and technology in Xi’an Jiaotong University, Xi’an, China. His current research interests mainly include information fusion, pattern recognition and machine learning.
Bo Yang received the B.Eng. degree in computer science and technology from Xi’an University of Posts & Telecommunication, Xi’an, China, in 2005, and received the M.Eng. degree in computer system architecture from Xidian University, Xi’an, China, in 2009. He is currently a Ph.D. candidate in the department of computer science and technology, Xi’an Jiaotong University, Xi’an, China. His current research interests mainly include manifold learning, pattern recognition and machine learning.