Locality adaptive preserving projections for linear dimensionality reduction

https://doi.org/10.1016/j.eswa.2020.113352

Highlights

  • Seeking the local structure in the original feature space is shown to be error-prone.

  • We propose a locality adaptive projection approach for neighborhood preserving.

  • Experimental results demonstrate the feasibility of the proposed method.

Abstract

Dimensionality reduction techniques aim to transform high-dimensional data into a meaningful reduced representation and have consistently played a fundamental role in the study of intrinsic dimensionality estimation and the design of intelligent expert systems for real-world applications. From the perspective of manifold learning, locality preserving projections is a classical and commonly used dimensionality reduction method that learns a low-dimensional embedding under the constraint of preserving the local geometry of the data. However, since it determines the neighborhood relationships in the original feature space, which probably contains noisy and irrelevant features, the derived similarities between neighbors are unreliable and the corresponding local data manifold tends to be error-prone, which inevitably degrades subsequent data analyses. Hence, accurately identifying the true neighbor relationships of each sample remains crucial to improving robustness. In this work, we propose a novel approach, termed locality adaptive preserving projections (LAPP), that adaptively determines the neighbors and their relationships in the optimal subspace rather than in the original space. Specifically, in the absence of prior knowledge of the local properties of the underlying manifold, LAPP adopts a coarse-to-fine strategy that iteratively updates the projected low-dimensional subspace and refines the identification of the local structure of the data. Moreover, an iterative algorithm with fast convergence is used to solve for the transformation matrix, which provides an explicit out-of-sample extension. LAPP is also easy to implement, and its key idea can potentially be extended to other methods for neighbor-finding and similarity measurement. To evaluate the performance of LAPP, we conduct comparative experiments on numerous synthetic and real-world datasets. Experimental results show that seeking the local structure in the original feature space misleads both the selection of neighbors and the calculation of similarity, and that the proposed method helps alleviate the negative effect of noisy and irrelevant features, demonstrating its effectiveness. This study may also encourage related work to consider the problem of optimizing neighborhood relationships.

Introduction

For a variety of research fields and real-world applications that range from face recognition (He, Yan, Hu, Niyogi & Zhang, 2005) and smoke detection (Yuan, Xia, Shi, Li & Li, 2017) to activity recognition (Wang, Chen, Yang, Zhao & Chang, 2016) and finance management (Tayalı & Tolun, 2018; Zhong & Enke, 2017), we are often confronted with high-dimensional data and required to develop powerful analysis methods for the discovery of knowledge and the design of decision support systems, especially in the era of big data, where we face massive amounts of data characterized by complexity, variety, and high dimensionality. Consequently, prediction and evaluation models directly trained on such data not only suffer from the curse of dimensionality but also incur heavy computational loads. Even worse, if the original feature space fails to reflect the intrinsic structure of the data, performance degrades and the confidence of a decision system is lowered to a large extent (Qiao, Chen & Tan, 2010). For example, in a face recognition system, a w×h face image is often flattened into a (w·h)-dimensional vector for appearance-based techniques, which is too large for robust face recognition (Bhowmik, Saha, Singha, Bhattacharjee & Dutta, 2019). In the task of building an intelligent expert system for daily stock market analysis, researchers usually collect a wide range of financial and economic features to maximize the stock market return. However, some of these features are irrelevant to the task and even redundant to each other (Zhong & Enke, 2017). Undoubtedly, this poses a serious challenge to the exploration of the intrinsic dimensionality of the data, the efficiency of many machine learning models, and the generalization ability of a system in real-world scenarios. Accordingly, one common way to mitigate this problem is to utilize an effective and efficient dimensionality reduction method (Bhowmik et al., 2019; van der Maaten, Postma & van den Herik, 2009).

As an important preprocessing step in data analysis, dimensionality reduction techniques work by transforming high-dimensional data into a meaningful low-dimensional representation in a linear or non-linear way, and they have consistently played a fundamental role in revealing the intrinsic structure of the data and facilitating subsequent tasks (Zhao, Wang & Nie, 2018; Zhong & Enke, 2017). In particular, dimensionality reduction contributes to classification, regression, clustering, visualization, and data compression in a variety of applications such as face recognition, information retrieval, and disease diagnosis (Becht et al., 2019; van der Maaten & Hinton, 2008). For example, principal component analysis seeks a set of uncorrelated variables by discarding redundant information, which helps reduce noise and improve the performance of a classifier. The key assumption behind dimensionality reduction is that the original feature space contains irrelevant features and that some features are redundant to each other, so a smaller set of new features can represent the original ones (Tenenbaum, De Silva & Langford, 2000). Therefore, the task of dimensionality reduction is to find a reduced representation with the intrinsic dimensionality of the data by deriving an appropriate linear or non-linear transformation function under carefully devised constraints (van der Maaten et al., 2009; Zhao et al., 2018).

According to the requirement for the availability of data labels, existing dimensionality reduction techniques can be broadly categorized into three groups: supervised, unsupervised, and semi-supervised methods. Principal component analysis (PCA) is the most widely used unsupervised dimensionality reduction method; it seeks a subspace that maximizes the variance of the projected data (Martínez & Kak, 2001). In contrast to PCA, linear discriminant analysis (LDA) utilizes label information and seeks a transformation matrix that simultaneously maximizes the between-class scatter and minimizes the within-class scatter, pulling samples with the same label close together and pushing samples with different labels far apart (Martínez & Kak, 2001). Though simple and intuitive, PCA and LDA are widely used in data preprocessing and perform well in a wealth of applications such as face recognition, seismic series analysis, visualization, and clustering (Belhumeur, Hespanha & Kriegman, 1997). However, both PCA and LDA utilize only the global structure of the data and ignore its local properties, which limits their performance in complex cases where local structure matters (Belhumeur et al., 1997; Martínez & Kak, 2001).
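As a concrete illustration of the unsupervised/supervised distinction, the following sketch contrasts PCA with LDA on scikit-learn's bundled digits data; the dataset and parameter choices here are ours for illustration, not the paper's.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# 64-dimensional handwritten digit images with labels 0-9
X, y = load_digits(return_X_y=True)

# PCA: unsupervised; keeps the directions of maximal variance
X_pca = PCA(n_components=2).fit_transform(X)

# LDA: supervised; separates the labeled classes
# (at most n_classes - 1 = 9 components are available)
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)

print(X_pca.shape, X_lda.shape)  # (1797, 2) (1797, 2)
```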

In contrast, another line of research explores the local properties of the data. From the perspective of manifold learning, dimensionality reduction essentially aims to find the low-dimensional manifold embedded in a high-dimensional space, with an embedding that keeps the geometric characteristics of the data as much as possible (Garcia-Vega & Castellanos-Dominguez, 2019; Tenenbaum et al., 2000). Accordingly, researchers have investigated manifold learning and its application to dimensionality reduction. Isometric mapping (ISOMAP) (Tenenbaum et al., 2000), locally linear embedding (LLE) (Roweis & Saul, 2000), and Laplacian eigenmaps (LE) (He & Niyogi, 2004) are three representative local methods that find a lower-dimensional embedding of data lying on or around a high-dimensional non-linear manifold. They have achieved satisfactory performance in multiple application domains (Krstanović et al., 2016; van der Maaten et al., 2009); however, they do not provide an explicit mapping between the original data and the reduced representation. That is, researchers generally have to recompute the projection vectors to cope with out-of-sample extension, which greatly limits their flexibility and leads to high time costs when processing streaming data. To allow for the efficient embedding of new data points, researchers have investigated linearized versions of several non-linear dimensionality reduction methods. For example, He, Cai, Yan and Zhang (2005) proposed neighborhood preserving embedding (NPE) to linearly approximate LLE, and locality preserving projections (LPP) is a linear approximation to LE (He & Niyogi, 2004). Specifically, LPP is a commonly used and well-performing approach that obtains a linear transformation matrix while preserving the local neighborhood relationships of the data; unlike most existing manifold learning methods, it returns an explicit transformation matrix that serves the out-of-sample extension. Its two key components are the construction of the neighbor graph and the measurement of similarity between neighbors, both of which largely determine its performance. In addition, several variants of LPP have been proposed and experimentally validated, such as discriminant locality preserving projections (DLPP), which makes use of label information (Yu, Teng & Liu, 2006), and null space discriminant locality preserving projections (NDLPP), which targets the small sample size problem of DLPP (Yang, Gong, Gu, Li & Liang, 2008; Yu et al., 2006).
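For reference, LPP's standard formulation (He & Niyogi, 2004) makes these two components explicit: a neighbor graph with heat-kernel weights, and a generalized eigenproblem whose solution preserves those weights under a linear map:

```latex
% LPP objective: neighbors that are close in the input space
% should stay close after the projection a.
\min_{\mathbf{a}} \sum_{i,j} \left(\mathbf{a}^{\top}\mathbf{x}_i - \mathbf{a}^{\top}\mathbf{x}_j\right)^{2} W_{ij},
\qquad
W_{ij} =
\begin{cases}
\exp\!\left(-\lVert \mathbf{x}_i - \mathbf{x}_j \rVert^{2} / t\right) & \text{if } \mathbf{x}_i,\mathbf{x}_j \text{ are neighbors},\\
0 & \text{otherwise.}
\end{cases}

% With D_{ii} = \sum_j W_{ij} and L = D - W, the minimizer solves
\mathbf{X} L \mathbf{X}^{\top}\mathbf{a} = \lambda\, \mathbf{X} D \mathbf{X}^{\top}\mathbf{a},
\quad \text{subject to } \mathbf{a}^{\top}\mathbf{X} D \mathbf{X}^{\top}\mathbf{a} = 1.
```

Note that W is computed from distances in the original feature space; this is precisely the step the present paper argues is error-prone under noisy and irrelevant features.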

Although LPP and its variants have been successfully applied to real-world applications, LPP risks choosing false nearest neighbors and miscalculating the similarity between neighbors, so the derived local manifold tends to be error-prone, which inevitably degrades subsequent data analyses. This is mainly because LPP measures the similarity between neighbors in the original feature space, where noisy and irrelevant features exist (Wang et al., 2016; Zhao et al., 2018). Clearly, neighbor relationships obtained in the optimal subspace are more reliable than those obtained in the original feature space and better reflect the truth. Therefore, maximally mitigating the effect of noisy factors and accurately identifying the true neighbor relationships of each sample remains crucial. However, we have no prior knowledge of the optimal subspace, which poses a challenge to determining the true similarity between neighbors and to improving the robustness of manifold learning-based methods. Accordingly, in this study, we propose a novel approach, termed locality adaptive preserving projections (LAPP), that adaptively determines the neighborhood relationships in the optimal subspace rather than in the original feature space. Specifically, in the absence of prior knowledge of the local properties of the underlying manifold, LAPP adopts a coarse-to-fine strategy to handle this chicken-and-egg situation. Moreover, an iterative algorithm with fast convergence is utilized to solve the constrained optimization problem, yielding an explicit out-of-sample extension. This enables us to better reveal the underlying manifold and obtain correspondingly robust embeddings. The main contributions of this study are as follows. First, we analyze manifold learning-based dimensionality reduction techniques, especially the commonly used LPP, and point out that seeking the local structure in the original feature space is error-prone in terms of neighbor-finding and similarity measurement; this may motivate researchers to pay special attention to the same problem in other dimensionality reduction methods. Second, we propose a locality adaptive preserving projections approach to optimizing the measurement of neighbor relationships. The proposed method iteratively updates the projected low-dimensional subspace and refines the identification of the local structure of the data, and its key idea can potentially be extended to other similar methods. Third, we implement and evaluate the proposed approach on numerous synthetic and real-world datasets. Extensive experimental results show that constructing the neighbor graph in the original feature space leads to lower performance, which demonstrates the effectiveness of the proposed method.
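Since the full derivation is not reproduced in this preview, the following is a minimal sketch of the coarse-to-fine idea as we read it: start from an initial projection, re-estimate the neighbor graph in the current subspace rather than the original space, re-solve the LPP-style eigenproblem, and repeat. The initialization, regularization, and fixed iteration count are our assumptions, not the authors' exact algorithm.

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.neighbors import kneighbors_graph

def lapp_sketch(X, dim, k=5, t=1.0, n_iter=10):
    """Coarse-to-fine locality preserving projection: re-estimate the
    neighbor graph in the current subspace instead of the original
    feature space. A sketch of the idea, not the authors' algorithm."""
    n, d = X.shape
    A = np.eye(d)[:, :dim]              # coarse start (PCA also plausible)
    for _ in range(n_iter):
        Z = X @ A                       # current low-dimensional embedding
        # k-NN graph and heat-kernel weights computed in the subspace
        G = kneighbors_graph(Z, k, mode='distance').toarray()
        W = np.where(G > 0, np.exp(-G ** 2 / t), 0.0)
        W = np.maximum(W, W.T)          # symmetrize the graph
        D = np.diag(W.sum(axis=1))
        L = D - W
        # LPP-style generalized eigenproblem (rows of X are samples):
        # X^T L X a = lambda * X^T D X a; keep the smallest eigenvalues
        M1 = X.T @ L @ X
        M2 = X.T @ D @ X + 1e-6 * np.eye(d)  # regularize for stability
        _, vecs = eigh(M1, M2)
        A = vecs[:, :dim]
    return A                            # explicit map for new samples: x @ A
```

Because the projection A is explicit, out-of-sample points are embedded with a single matrix product, which is the flexibility advantage over LLE/LE noted above.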

The remainder of this study is organized as follows. Section 2 briefly reviews related work on dimensionality reduction techniques by introducing four commonly used methods. We detail the proposed locality adaptive preserving projections method and its motivation in Section 3. Section 4 gives the experimental setup and results on both synthetic and real-world datasets and presents corresponding analyses. The last section concludes this study with a brief summary and discusses insightful future research directions.

Section snippets

Related work

Over the past few decades, a large number of dimensionality reduction methods have been proposed and used in diverse areas (e.g., decision support systems, face recognition, and data visualization), and we can categorize them from different perspectives. According to whether the mapping function between the high-dimensional space and the reduced feature space is linear, we can group dimensionality reduction techniques into linear methods (e.g., PCA and LDA) and non-linear methods (e.g., LLE,

Locality adaptive preserving projections

As discussed above, the selection of nearest neighbors and the measurement of neighborhood relationships largely determine the performance of locality preserving projections. In particular, how to reduce the effect of unimportant factors and accurately seek the neighbors in the optimal subspace remains critical. Ideally, if we had prior knowledge of the noisy and irrelevant features of the data, we could obtain the true pairwise distances between neighbors and derive the optimal feature

Experimental results and analysis

To evaluate the effectiveness of the proposed method, we conduct extensive experiments on two synthetic Swiss roll datasets, three face recognition benchmark datasets, namely the Yale face database (YALE), the Olivetti Research Laboratory database (ORL), and the extended Yale Face Database B (E_YALE), as well as the hand-written digit recognition dataset MNIST. We compare LAPP with four other well-performing dimensionality reduction methods, including two global methods (PCA and LDA) and two local
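For readers who want to reproduce a comparable synthetic setting, a noisy Swiss roll can be generated as below; the sample size and noise level are illustrative guesses, since the preview does not specify the paper's exact construction.

```python
from sklearn.datasets import make_swiss_roll

# Illustrative parameters; the paper's exact settings are not shown here.
X, t = make_swiss_roll(n_samples=1500, noise=0.5, random_state=0)
print(X.shape)  # (1500, 3): a 2-D manifold embedded in 3-D space
```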

Conclusions

Dimensionality reduction techniques have consistently played an important role in data analysis and in the design of intelligent expert systems such as face recognition and disease diagnosis, and they significantly facilitate, among other tasks, the classification, clustering, visualization, and compression of high-dimensional data. Due to the complex non-linear relations inherent in data, however, dimensionality reduction methods that preserve the local

CRediT authorship contribution statement

Aiguo Wang: Conceptualization, Methodology, Formal analysis, Investigation, Writing - original draft, Writing - review & editing. Shenghui Zhao: Validation, Formal analysis, Writing - review & editing. Jinjun Liu: Writing - review & editing. Jing Yang: Formal analysis, Writing - review & editing. Li Liu: Writing - original draft, Writing - review & editing. Guilin Chen: Conceptualization, Methodology, Supervision, Project administration.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was funded by the National Natural Science Foundation of China (No. 61902068), the Key Research and Development Project of Anhui Province (No. KJ2019ZD44), the Major Special Projects of Anhui Province (No. 201903A06020026), and the Anhui Provincial Natural Science Foundation (No. 1908085MF211).

References (35)

  • H. Zhao et al., Adaptive neighborhood MinMax projections, Neurocomputing (2018)

  • X. Zhong et al., Forecasting daily stock market return using dimensionality reduction, Expert Systems with Applications (2017)

  • E. Becht et al., Dimensionality reduction for visualizing single-cell data using UMAP, Nature Biotechnology (2019)

  • P. Belhumeur et al., Eigenfaces vs. fisherfaces: Recognition using class specific linear projection, IEEE Transactions on Pattern Analysis and Machine Intelligence (1997)

  • D. Cai et al., Orthogonal Laplacianfaces for face recognition, IEEE Transactions on Image Processing (2006)

  • X. He et al., Neighborhood preserving embedding

  • X. He et al., Locality preserving projections