Neurocomputing

Volume 173, Part 3, 15 January 2016, Pages 518-529

Linear dimensionality reduction based on Hybrid structure preserving projections

https://doi.org/10.1016/j.neucom.2015.07.011

Highlights

  • Construct a hybrid representation to characterize data structure.

  • Perform dimensionality reduction by preserving a hybrid structure.

  • Explicitly discuss how the merits of sparsity and neighborhood structure depend on the dataset.

  • Take multiple structures into account for better algorithm performance.

Abstract

Recent advances have shown that methods based on local structure preserving projections can effectively learn discriminative features. Two attractive approaches for characterizing such data structure are the classical nearest neighbor strategy, which captures neighborhood structure, and the sparse coding algorithm, which captures sparsity structure. Motivated by an intuitive analysis of the relationship between the two structures, in this paper we take both of them into account and propose two integrated approaches for dimensionality reduction. Concretely, we first directly combine two existing objectives, one based on neighborhood structure and one based on sparsity structure, to construct a combined method, called CSNP. However, such a rough strategy often degrades in practice. Instead of this superficial combination, we exploit a hybrid structure obtained by integrating the two structures and propose Sparsity and Neighborhood Preserving Projections, dubbed SNPP, which preserves the hybrid structure in the reduced subspace. The resulting optimization problems can also be interpreted as instances of the general graph embedding framework and reduce to generalized eigenvalue decomposition problems. Finally, we conduct extensive experiments on publicly available data sets to verify the efficacy of our algorithms. From the experimental results, we roughly draw the conclusion that neighborhood structure is more important for low-dimensional data while sparsity structure is more useful for high-dimensional data.

Introduction

Dimensionality reduction (DR), which aims to discover succinct representations of high-dimensional samples, can effectively alleviate the “curse of dimensionality” and the overfitting problem of machine learning algorithms [1], [2]. Related to DR, the well-known guiding principle in model selection, minimum description length, stipulates that the model yielding the most compact representation should be preferred [3]. Therefore, to achieve more efficient and precise algorithms for automated pattern recognition and exploratory data analysis, especially for high-dimensional applications such as biochemistry and drug design, DR techniques have been intensively studied over the past decades, and a series of significant research results have been achieved in theory and practice.

Strictly speaking, DR can be grouped into feature selection and feature extraction [2], [4]. Feature selection aims at selecting the best subset from the original set of features, eliminating less informative and irrelevant features; examples include the least absolute shrinkage and selection operator (Lasso) [5] and the fast correlation-based filter (FCBF) [6]. Feature extraction, in contrast, transforms the input features into a new reduced set of features by creating new features from transformations or combinations of the original ones [2]. In this paper, we concentrate on feature extraction on account of its capacity for revealing the intrinsic dimensionality of the data.
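To make the distinction concrete, the following is a minimal sketch (assuming scikit-learn; the data, `alpha` value and number of components are illustrative, not taken from the paper): Lasso keeps a subset of the original features, whereas PCA creates new features as combinations of all of them.

```python
# Feature selection vs. feature extraction on synthetic data (illustrative only).
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))                 # 100 samples, 20 features
y = X[:, 0] - 2.0 * X[:, 3] + 0.1 * rng.standard_normal(100)

# Feature selection: Lasso drives uninformative coefficients to zero,
# so the surviving features form the selected subset of original features.
lasso = Lasso(alpha=0.1).fit(X, y)
selected = np.flatnonzero(lasso.coef_)             # indices of retained features

# Feature extraction: PCA builds a few new features from all original ones.
Z = PCA(n_components=5).fit_transform(X)           # 100 x 5 reduced representation

print(selected, Z.shape)
```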

A variety of linear and nonlinear approaches for feature extraction have been developed under unsupervised, supervised and semi-supervised scenarios. The linear route conducts DR with a projection matrix computed from the training data, as in principal component analysis (PCA) [7], linear discriminant analysis (LDA) [1] and semi-supervised discriminant analysis (SDA) [8]. Nonlinear methods, which have more expressive power [9], map the data into the reduced space by a nonlinear function usually learned by various fitting methods, such as multidimensional scaling (MDS) [10], locally linear embedding (LLE) [11], Isomap [12] and the autoencoder [13]. In addition, a linear method can be extended to a nonlinear version by introducing the kernel trick, as in kernel principal component analysis (KPCA) [14] and kernel Fisher discriminant (KFD) [15]. However, recent research reveals that nonlinear methods perform well on selected artificial tasks but not on real-world tasks [16].

Compared with nonlinear methods, linear techniques have many merits: the reduced representation can be obtained by simple algebraic manipulations; the projection matrix can be applied everywhere, to either training or test data; and the objective problem can be solved simply and efficiently. Accordingly, many nonlinear methods have been recast into linearized versions. For instance, locality preserving projections (LPP) [17] is a linearized Laplacian Eigenmaps [18]; neighborhood preserving embedding (NPE) [19] and locally linear embedded Eigenspace analysis (LEA) [20] are two linearized versions of LLE; and isometric projection (IsoProjection) [21] can be seen as a linearized Isomap.

As mentioned above, many approaches, grouped under manifold learning, have been proposed for learning the underlying low-dimensional manifold. Advances in manifold learning show that holding the intrinsic structure of the high-dimensional data is important when samples are mapped into a reduced space for data analysis [22], [23]. In general, manifold learning approaches aim at preserving in the embedding subspace either the global pairwise sample similarity, as in Isomap, or the locally geometric structure of the data, as in LLE. Preserving the locally geometric structure has been used in numerous studies and is very effective for learning discriminative features (LDF) [17], [19], [24], based on the simple geometric intuition that each sample and its neighbors lie close to a locally linear patch of the underlying manifold [11]. The local structure is typically constructed by the traditional k-nearest neighbor (k-NN) strategy, as in LLE, LPP and NPE.
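As a minimal sketch of this k-NN strategy (assuming scikit-learn; the data and k are illustrative), each sample is linked to its k nearest neighbors, yielding an affinity matrix over which LPP- or NPE-style objectives are defined.

```python
# Building the k-NN neighborhood graph used to characterize local structure.
import numpy as np
from sklearn.neighbors import kneighbors_graph

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 30))          # 200 samples in a 30-D ambient space

k = 5
# Binary k-NN adjacency; LPP would replace the 1s with heat-kernel weights,
# while NPE/LLE would replace them with local reconstruction coefficients.
W = kneighbors_graph(X, n_neighbors=k, mode='connectivity').toarray()
W = np.maximum(W, W.T)                      # symmetrize the neighborhood graph
print(W.shape, W.sum(axis=1)[:5])
```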

Besides manifold learning, sparse representation (SR) learned by sparse coding (Sc) has recently attracted considerable attention in theory [25] and practice. A high-dimensional signal usually contains relatively little information compared to its ambient dimension, so it can be well approximated as a linear combination of just a few elements from an overcomplete dictionary [26], [27], [28], [29]. The coefficient vector of this linear combination is precisely the SR of the corresponding sample. SR is not only a promising tool in image processing [30], [31], but has also been investigated for classification [32], [33] and DR [34], [35]. In particular, the sparse representation based classifier (SRC) [32] directly exploits all training samples as the dictionary to learn the SR of a test sample, and then assigns the test sample to the category that gives the minimum reconstruction error. Subsequently, sparsity preserving projections (SPP) was proposed to preserve the discriminative structure characterized by SR for LDF [34]. Meanwhile, the ℓ1-graph constructed by SR was presented to offer a datum-adaptive neighborhood instead of selecting neighbors with a distance metric [35]. Actually, the approach called sparse neighborhood preserving embedding (SNPE) [35] is identical to SPP. By introducing class information, supervised versions of SNPE can be found in [36], [37]. In addition, other sparsity preserving methods have been put forth, such as sparsity preserving discriminant analysis (SPDA) [38] and Fast Fisher Sparsity Preserving Projections (FSPP) [39]. These results, exhibiting high classification performance in the reduced space, suggest that keeping the sparsity structure is also very helpful for LDF.
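A minimal sketch of the SRC idea just described follows, assuming scikit-learn; the function name `src_predict` and the `alpha` value are illustrative, and Lasso is used here only as one possible ℓ1 solver rather than the solver used in [32].

```python
# SRC sketch: sparsely code a test sample over the dictionary of training
# samples, then pick the class whose coefficients give the smallest residual.
import numpy as np
from sklearn.linear_model import Lasso

def src_predict(X_train, y_train, x_test, alpha=0.01):
    # Columns of the dictionary are the training samples themselves (m x n).
    D = X_train.T
    coef = Lasso(alpha=alpha, fit_intercept=False).fit(D, x_test).coef_
    residuals = {}
    for c in np.unique(y_train):
        coef_c = np.where(y_train == c, coef, 0.0)   # keep only class-c coefficients
        residuals[c] = np.linalg.norm(x_test - D @ coef_c)
    return min(residuals, key=residuals.get)         # class with minimum residual
```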

However, the DR methods above consider only a single characterization of the data, either local or global structure, which is insufficient to represent the underlying structure of real-world data [40]. Thus, recent studies try to integrate both the global and local structures of the data to pursue higher performance, such as Laplacian linear discriminant analysis (LapLDA) [40], locality preserving linear discriminant analysis (LocLDA) [41] and global and local structure preservation for feature selection (GLSPFS) [42]. These methods concentrate on collecting discriminative information from global and local structures but fail to characterize a more discriminative structure. To the best of our knowledge, no previous work integrates neighborhood structure and sparsity structure or provides empirical insight into the relationship between them. Thus, in this paper we analyze the discriminative power of the neighborhood and sparsity structures and then present two combined methods for performance improvement.

More specifically, neighborhood structure is more discriminative in low-dimensional space while sparsity structure is more effective in high-dimensional space. Thus we first present a rough combined method, the combination of SPP and NPE (CSNP), which seeks a trade-off between the objectives of SPP and NPE. However, such a traditional combination is prone to degrade in practice. To achieve our goal of integration, we consolidate both structures into a hybrid structure and then propose the Sparsity and Neighborhood Preserving Projections algorithm (SNPP) for LDF. In the end, we conduct extensive experiments to verify the effectiveness of the proposed approaches. The main contributions of this paper can be summarized as follows.

  • (1)

We analyze the relationship between the sparsity structure and the neighborhood structure of the data, and exploit their respective advantages to develop a stronger approach whose experimental results support our analysis. In addition, the resulting approach can be interpreted within the general graph embedding framework [43].

  • (2)

We put forward the hybrid structure of the data, combining the neighborhood structure and the sparsity structure, to characterize a more discriminative structure and thus obtain more powerful features for classification. Note that the defined hybrid representation (HR), related to the hybrid structure, combines multiple collaborative representations [33], [44] and can analogously be effective for classification.

  • (3)

Utilizing a deep-level integration instead of a shallow trade-off, SNPP successfully avoids the degradation observed in CSNP and suggests a new route for combined methods.

  • (4)

    According to our investigations, we draw the experimental conclusion: sparsity structure is more important for high-dimensional data while neighborhood structure is more important for low-dimensional data.

In addition, the proposed methods can be extended to supervised and semi-supervised versions through the DR frameworks proposed in [43], [45]. The remainder of this paper is organized as follows. Section 2 reviews the most related works. Our proposed algorithms are presented in Section 3. In Section 4, experimental results on an artificial data set, data sets from UCI and a face data set are shown. Finally, we conclude the paper in Section 5.

Section snippets

Dimensionality reduction techniques with SR and NR

We in this section recall two representations, SR and NR, and two corresponding DR methods, SPP and NPE. Note that here SR and NR are used to characterize the existing relationships within the input data. The problem we consider is stated as follows: given the data matrix $X=[x_1,x_2,\ldots,x_n]\in\mathbb{R}^{m\times n}$, linear DR is to seek a transformation matrix $A=[a_1,a_2,\ldots,a_l]\in\mathbb{R}^{m\times l}$ and then map $X$ to the reduced data matrix $Y=[y_1,y_2,\ldots,y_n]\in\mathbb{R}^{l\times n}$ ($l<m$) by $Y=A^{T}X$. Without loss of generality, we denote the projection vector by $a$,
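As a quick numerical illustration of this setup, the following is a minimal numpy sketch (not the paper's code); choosing $A$ from the leading PCA directions is only one concrete example, whereas SPP, NPE and SNPP each determine $A$ by their own objectives.

```python
# Linear DR as defined above: X is m x n (columns are samples), A is m x l,
# and the reduced data matrix is Y = A^T X of size l x n.
import numpy as np

rng = np.random.default_rng(0)
m, n, l = 50, 300, 5
X = rng.standard_normal((m, n))                    # data matrix, columns = samples

Xc = X - X.mean(axis=1, keepdims=True)             # center the samples
# Eigenvectors of the sample covariance give one concrete choice of A.
eigvals, eigvecs = np.linalg.eigh(Xc @ Xc.T / n)
A = eigvecs[:, ::-1][:, :l]                        # m x l, top-l directions

Y = A.T @ X                                        # l x n reduced data matrix
print(Y.shape)                                     # (5, 300)
```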

Sparsity and neighborhood preserving projections

Note that we here concentrate on LDF by holding more discriminative information, collected from the original data, in the reduced subspace. Since both SR and NR are useful for LDF [19], [32], [34], [35], it is reasonable to take both of them into account in pursuit of higher performance. Although some recent studies regard sparsity as also being a local structure [35], SR and NR indeed reflect the underlying structure of the data from different perspectives. Thus, we try to integrate them into a
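The abstract notes that the resulting optimization problems are instances of the general graph embedding framework and reduce to generalized eigenvalue decompositions. Below is a minimal sketch of that generic recipe, assuming scipy and an NPE-style formulation $XMX^{T}a=\lambda XX^{T}a$ with $M=(I-W)^{T}(I-W)$; the weight matrix $W$ and the exact objective of SNPP are defined in the paper, so this is illustrative only.

```python
# Generic structure-preserving projections via a generalized eigenproblem.
import numpy as np
from scipy.linalg import eigh

def graph_preserving_projections(X, W, l):
    """X: m x n data (columns are samples); W: n x n weight matrix; l: target dim."""
    n = X.shape[1]
    M = (np.eye(n) - W).T @ (np.eye(n) - W)
    # Smallest generalized eigenvectors of X M X^T a = lambda X X^T a.
    lhs = X @ M @ X.T
    rhs = X @ X.T + 1e-6 * np.eye(X.shape[0])      # regularize for invertibility
    eigvals, eigvecs = eigh(lhs, rhs)
    return eigvecs[:, :l]                          # m x l projection matrix A
```

Here W could be the k-NN reconstruction weights, the sparse codes, or a hybrid of both; the choice of W is what distinguishes NPE, SPP and the proposed methods.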

Experiments

In this section, we verify our proposed algorithms on public real-world data sets and compare them with PCA, LPP, NPE and SPP. We select these approaches for comparison because we concentrate on the two ways of neighborhood selection and because our approaches are linear and unsupervised. Our experiments are organized as follows. First, we map a toy data set and three UCI data sets [58] into the reduced space for insight into the interclass separation. Then we

Conclusions and future work

Recent research results show that both the sparsity and the neighborhood structure of the data are helpful for LDF. In this paper, to take both of them into account, we propose two integrated DR methods, CSNP and SNPP. To remove the degradation of CSNP, we consolidate the two structures into our defined hybrid structure and then preserve this hybrid structure in the reduced space, resulting in the method SNPP. The introduced degrees of freedom not only make both SPP and NPE be the special

Acknowledgements

The authors would like to thank the anonymous reviewers for their helpful suggestions.


References (60)

  • M.H. Hansen et al., Model selection and the principle of minimum description length, J. Am. Stat. Assoc. (2001)
  • M.L. Raymer et al., Dimensionality reduction using genetic algorithms, IEEE Trans. Evol. Comput. (2000)
  • R. Tibshirani, Regression shrinkage and selection via the lasso, J. R. Stat. Soc. Ser. B: Methodol. (1996)
  • L. Yu et al., Efficient feature selection via analysis of relevance and redundancy, J. Mach. Learn. Res. (2004)
  • I.T. Jolliffe, Principal Component Analysis (2002)
  • D. Cai, X. He, J. Han, Semi-supervised discriminant analysis, in: Proceedings of the International Conference on...
  • Y. Bengio et al., Representation learning: a review and new perspectives, IEEE Trans. Pattern Anal. Mach. Intell. (2013)
  • T.F. Cox et al., Multidimensional Scaling (1994)
  • S.T. Roweis et al., Nonlinear dimensionality reduction by locally linear embedding, Science (2000)
  • J.B. Tenenbaum et al., A global geometric framework for nonlinear dimensionality reduction, Science (2000)
  • G.E. Hinton et al., Reducing the dimensionality of data with neural networks, Science (2006)
  • B. Schölkopf, A. Smola, K.-R. Müller, Kernel principal component analysis, in: Advance in Kernel Methods: Support...
  • B. Schölkopf, K.-R. Müller, Fisher discriminant analysis with kernels, in: Proceedings of the IEEE Workshop Neural...
  • L.J. van der Maaten et al., Dimensionality reduction: a comparative review, J. Mach. Learn. Res. (2009)
  • X. He, P. Niyogi, Locality preserving projections, in: Proceedings of the Advances in Neural Information Processing...
  • M. Belkin et al., Laplacian eigenmaps for dimensionality reduction and data representation, Neural Comput. (2003)
  • X. He, D. Cai, S. Yan, H.-J. Zhang, Neighborhood preserving embedding, in: Proceedings of the International Conference...
  • Y. Fu et al., Locally Linear Embedded Eigenspace Analysis, IFP-TR (2005)
  • D. Cai, X. He, J. Han, Isometric projection, in: Proceedings of the National Conference on Artificial Intelligence,...
  • M. Sugiyama, Dimensionality reduction of multimodal labeled data by local Fisher discriminant analysis, J. Mach. Learn. Res. (2007)
Yupei Zhang received the B.Eng. degree in computer science and technology from East China Institute of Technology, Fuzhou, China, in 2009, and received the M.Eng. degree in computer software and theory from Zhengzhou University, Zhengzhou, China, in 2013. He is currently a Ph.D. candidate in the department of computer science and technology, Xi’an Jiaotong University, Xi’an, China. His current research interests mainly include sparse representation, pattern recognition and machine learning.

Ming Xiang received the B.Eng. and Ph.D. degrees from Northwestern Polytechnical University, Xi’an, China, in 1987 and 1999, respectively, and currently works as an associate professor in the department of computer science and technology in Xi’an Jiaotong University, Xi’an, China. His current research interests mainly include information fusion, pattern recognition and machine learning.

Bo Yang received the B.Eng. degree in computer science and technology from Xi’an University of Posts & Telecommunication, Xi’an, China, in 2005, and received the M.Eng. degree in computer system architecture from Xidian University, Xi’an, China, in 2009. He is currently a Ph.D. candidate in the department of computer science and technology, Xi’an Jiaotong University, Xi’an, China. His current research interests mainly include manifold learning, pattern recognition and machine learning.