
Neural Networks

Volume 112, April 2019, Pages 1-14

Robust dimensionality reduction via feature space to feature space distance metric learning

https://doi.org/10.1016/j.neunet.2019.01.001

Abstract

Images are often represented as high-dimensional vectors when involved in classification. As a result, dimensionality reduction methods have to be developed to avoid the curse of dimensionality. Among them, Laplacian eigenmaps (LE) has attracted widespread attention. In the original LE, a point to point (P2P) distance metric is adopted for manifold learning. Unfortunately, this metric contributes little to robustness against noise. In this paper, a novel supervised dimensionality reduction method, named feature space to feature space distance metric learning (FSDML), is presented. For any point, a feature space spanned by its k intra-class nearest neighbors is constructed, which yields a local projection of the point onto its nearest feature space. The feature space to feature space (S2S) distance metric is then defined as the Euclidean distance between two corresponding projections. On one hand, the proposed S2S distance metric is more robust owing to the local projection. On the other hand, the projection onto the nearest feature space fully mines the local geometry information hidden in the original data. Moreover, both class label similarity and dissimilarity are measured, based on which an intra-class graph and an inter-class graph are modeled separately. Finally, a subspace for classification is found by simultaneously maximizing the S2S based manifold to manifold distance and preserving the S2S based locality of manifolds. Experiments on both synthetic and benchmark data sets validate the performance of the proposed method in comparison with several state-of-the-art dimensionality reduction methods.

Introduction

Dimensionality reduction is essential both for high-dimensional data visualization and for high-dimensional feature extraction, which has been validated in a variety of applications such as information retrieval (Bai et al., 2018, Rivera-Caicedo et al., 2017), pattern recognition (Bloom et al., 2017, Srinivasa Perumal and Chandra Mouli, 2016), computer vision (Chen and Daly, 2018, Liu, Zhang et al., 2016), and data mining (Houari et al., 2016, Zhang et al., 2016, Zhang et al., 2015). To date, a diverse set of dimensionality reduction algorithms has been put forward. Linear approaches were developed first to reduce the dimensions of the original data (Gyamfi et al., 2018, Huang and Zheng, 2006, Juuti et al., 2018, Tao et al., 2009, Turk and Pentland, 1991, Wang et al., 2016, Zhang and Chow, 2012). Later, in order to extract features from nonlinearly distributed data more efficiently, nonlinear models including t-distributed stochastic neighbor embedding (t-SNE) and its extensions have grown rapidly (Gisbrecht et al., 2015, van der Maaten and Hinton, 2008, Vepakomma et al., 2016). Recently, deep neural networks have also attracted increasing attention, where canonical correlation analysis networks (CCANets) and generative adversarial networks (GANs) have shown their performance on dimensionality reduction (Goodfellow et al., 2016, Yang et al., 2017).

As a kind of nonlinear dimensionality reduction, manifold learning has attracted much attention because of its nonlinear nature, geometric intuition and computational practicability. Manifold learning can be categorized into two main types. One type attempts to preserve global properties of the original data in the low dimensional representations, for example isometric mapping (ISOMAP) (Tenenbaum, de Silva, & Langford, 2000). The other type tries to keep local structure information of the original data in the low dimensional projections, such as locally linear embedding (LLE) (Roweis & Saul, 2000) and Laplacian eigenmaps (LE) (Belkin & Niyogi, 2003).

Among the traditional manifold learning approaches, LE has been increasingly favored because it is computationally simpler and offers useful results on a broader range of manifolds (He, Yang, Hu, Niyogi, & Zhang, 2005). Based on the original LE, p-Laplacian regularized sparse coding (p-LSC) exploits p-Laplacian regularization, a nonlinear generalization of the standard graph Laplacian, to preserve the local geometry (Liu, Zha, Wang, Lu, and Tao, 2016). However, poor generalization ability limits LE's applications to many real-world data. Tricks such as charting, linearization, kernelization, and tensorization have been introduced to extend LE (He et al., 2005, Yan et al., 2007). Locality preserving projection (LPP) is a classical linear approximation to LE. Based on LPP, an orthogonality constraint is imposed to mine more local geometry, which yields orthogonal LPP (OLPP) (Cai, He, Han, & Zhang, 2006). Both LPP and OLPP focus only on the local structure of the original data, and no non-local information is taken into account. Thus Yang et al. present an unsupervised discriminant projection (UDP) to explore a linear subspace with the maximum non-local scatter and the minimum local scatter (Yang, Zhang, Yang, & Niu, 2007). This property makes UDP more intuitive and more powerful than most manifold learning based dimensionality reduction methods. In addition, UDP can also be viewed as a simplified version of LPP under the assumption that the local density is uniform (Deng, Hu, Guo, Zhang, & Zhang, 2007). In all the LE based methods mentioned above, class information is ignored when locating the subspace.
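
For reference, the linear criteria behind LPP and UDP take the following standard forms (a textbook formulation in our notation, not copied from this paper), where the samples are stacked as the columns of X, W is the neighborhood affinity matrix, D is its diagonal row-sum matrix, and S_L, S_N, S_T denote the local, non-local and total scatter matrices:

```latex
\begin{aligned}
\text{LPP:}\quad & \min_{w}\ \sum_{i,j}\bigl(w^{\top}x_i-w^{\top}x_j\bigr)^{2}W_{ij}
                 \;=\; 2\,w^{\top}XLX^{\top}w,\qquad L=D-W,\\
                 & \text{solved by } XLX^{\top}w=\lambda\,XDX^{\top}w,\\[2pt]
\text{UDP:}\quad & \max_{w}\ J(w)=\frac{w^{\top}S_{N}\,w}{w^{\top}S_{L}\,w},
                 \qquad S_{N}=S_{T}-S_{L}.
\end{aligned}
```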

In other LE based dimensionality reduction algorithms, prior class labels are adopted to guide the construction of the local graph. On one hand, labels are introduced to adjust the weights between points in the local neighborhood graph, as in locally preserving discriminant projection (LPDP) (Gui, Jia, Zhu, Wang, & Huang, 2010) and orthogonal discriminant projection (ODP) (Li, Wang, & Huang, 2009). On the other hand, both local structure information and class labels are considered to select the points that constitute an inter-class graph and an intra-class graph, respectively. In marginal Fisher analysis (MFA) (He et al., 2005), an intrinsic graph and a penalty graph, characterizing the intra-class point adjacency and the inter-class marginal point adjacency, are modeled to achieve the discriminant projection subspace; a standard form of its criterion is given below. Like MFA, other methods also form Fisher objective functions that simultaneously maximize the inter-class separability and minimize the intra-class compactness (Wan, 2012, Wei et al., 2012). Furthermore, to overcome the small sample size problem, Lu et al. define a multi-manifold margin (Lu, Tan, & Wang, 2013). Constrained by localized pairwise cannot-link (CL) and must-link (ML) relations, which are introduced to keep the intrinsic proximity of the inter-class and intra-class similarity pairs, the constrained large margin local projection (CMLP) method can significantly enlarge the margin between the inter-class and the intra-class clusters (Zhang, Zhao, & Chow, 2012). Similar to these extensions of LE, other classical manifold learning methods such as ISOMAP and LLE have also been modified into new versions. A semi-supervised local multi-manifold ISOMAP is put forward by Zhang et al., which aims to minimize ML distances on the same manifold and to maximize CL distances across different manifolds (Zhang, Zhang, Qin, Zhang, Li, and Li, 2018). Using the pairwise CL and ML constraints to specify the types of neighborhoods, M-ISOMAP computes the shortest path distances over constrained neighborhood graphs to find large margins (Zhang, Chow, & Zhao, 2013). With the locally least linear reconstruction technique in LLE, reconstructive discriminant analysis (RDA) explores a low dimensional subspace where the intra-class reconstruction scatter of samples is minimized and the inter-class reconstruction scatter is maximized (Chen & Jin, 2012).
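
The MFA trace-ratio criterion referred to above can be written, in a commonly used notation that is not this paper's, with W the intrinsic intra-class k1-nearest-neighbor adjacency matrix, W^p the inter-class marginal k2-nearest-neighbor penalty adjacency matrix, and D, D^p their diagonal row-sum matrices:

```latex
w^{*}=\arg\min_{w}\;
\frac{w^{\top}X\,(D-W)\,X^{\top}w}{w^{\top}X\,(D^{p}-W^{p})\,X^{\top}w}.
```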

All these manifold learning based approaches have one point in common: the nearest neighbor graph must be constructed in advance, yet how to determine the parameter k, i.e. the number of nearest neighbors, is still an open problem. Following the idea of the l1-graph, Yang et al. construct an l2-graph in which any sample can be represented by all the samples instead of just its nearest neighbors, based on which a collaborative representation based projection (CRP) is presented (Yang, Wang, & Sun, 2015). The notable characteristic of parameterless reconstructive discriminant analysis (PRDA) is that it is parameter-free rather than parameter-dependent like RDA (Huang & Gao, 2016). Others use sparse representation based classification (SRC) to determine the neighbors adaptively: samples with non-zero representation coefficients are taken as neighbors, as sketched below. Yang et al. propose SRC steered discriminative projection (SRC-DP) to maximize the ratio of the between-class reconstruction residual to the within-class reconstruction residual in the projected space (Yang, Chu, Zhang, Xu, & Yang, 2013). In addition, by integrating SRC and linear regression classification (LRC), the maximum nearest subspace margin criterion is designed for feature extraction (Chen, Li, & Jin, 2013). SRC helps to select neighbors automatically, but it cannot ensure that they are all locally located, which may cause the exploration of manifold locality to fail, whereas manifold learning contributes to nonlinear dimensionality reduction precisely by mining the local geometry information in the original data.
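
As an illustration of this adaptive neighbor selection, the following minimal sketch picks neighbors from the non-zero coefficients of an l1-regularized least-squares representation; the function name, the solver and the regularization value are illustrative assumptions rather than any specific method from the papers cited above.

```python
import numpy as np
from sklearn.linear_model import Lasso

def sparse_neighbors(X, i, alpha=0.05, tol=1e-6):
    """Adaptively select neighbors of X[i] via an l1-regularized representation.

    X     : (n_samples, n_features) data matrix
    i     : index of the query sample
    alpha : l1 regularization strength (illustrative value)

    Samples whose representation coefficients are non-zero are returned as
    neighbors, so no neighborhood size k has to be fixed in advance.
    """
    x = X[i]
    others = np.delete(np.arange(len(X)), i)
    D = X[others].T                               # dictionary: one column per remaining sample
    lasso = Lasso(alpha=alpha, fit_intercept=False, max_iter=10000)
    coef = lasso.fit(D, x).coef_                  # representation coefficients
    return others[np.abs(coef) > tol]
```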

In addition, when modeling the nearest neighbor graph, LE based methods often adopt the point to point (P2P) distance metric to select the nearest neighbors on a local manifold patch. If noisy points are contained, the noise sensitivity problem easily arises: for a given point, some noisy points may have shorter P2P distances to it than points on the manifold, and under the KNN criterion these noisy points are wrongly taken as neighbors, which cannot characterize the true local geometry of the manifold. Thus LE based manifold learning approaches cannot be successfully applied to such noisy data for dimensionality reduction. How to improve the robustness of dimensionality reduction has attracted many researchers' interest. Chang et al. bring forward a robust LLE, where each point is assigned a weight, determined by robust PCA, to restrain the influence of noisy points and thereby improve robustness (Chang & Yeung, 2006). Hou et al. model a unified framework to unfold the manifold, under which local learning approaches can be formulated as semi-definite programs with robustness (Hou, Zhang, Wu and Jiao, 2009). In local linear transformation embedding (LLTE) (Hou, Wang, Wu, and Yi, 2009), the robustness problem of LLE is analyzed and tricks such as local linear transformation and three-stage LLTE are applied to samples containing noise and outliers. Because of its robust characteristic, the l1-norm is also introduced in some algorithms. 2DPCA is combined with the l1-norm for image analysis and can be optimized with a non-greedy strategy (Wang et al., 2017, Wang et al., 2015); 2DPCA can also be reformulated into an angle version (Gao, Ma, Liu, Gao, & Nie, 2018). A sparse learning framework is also proposed by adopting the l1-norm in LLE based methods, so that they can be generalized to sparse cases (Lai, Wong, Xu, Yang, & Zhang, 2016). Since the l21-norm is robust to image variations, an l21-norm based unified rotational invariant dimensionality reduction framework is constructed, under which LE can be extended to a more generalized form (Lai, Xu, Yang, Shen, & Zhang, 2017). In order to avoid the sensitivity to data variations in ridge regression, robust discriminant regression (RDR) uses the l21-norm as the basic metric in the objective function (Lai et al.). Recently, lp- and ls-norm distances have also been recommended to replace the l2-norm in traditional linear discriminant analysis (LDA) with satisfactory performance (Ye, Fu, Zhang, Zhao, & Naiem, 2018). These norms have shown their contributions to robustness. However, due to the difficulty of solving sparse problems, a greedy learning strategy has to be adopted, which easily gets stuck in locally optimal solutions.

In addition, nearest linear combination (NLC) approaches have been proposed to learn the information hidden in more than one feature point of the same class and to improve recognition performance significantly (Li et al., 2000, Zhang and Nie et al., 2018). Multiple feature points can be linearly combined to span a subspace representing the class, which leads to the concept of a feature space. In the nearest feature space classifier (NFSC), a query point is assigned the class label of the feature space nearest to it in Euclidean distance, where the point to feature space (P2S) distance metric plays an important role. Chen et al. rewrite Fisher discriminant analysis (FDA) by introducing the projection of a point onto its nearest feature space, which outperforms approaches using the point to point (P2P) distance metric (Chen, Han, Wang, & Fan, 2011). But the predefined P2S distance metric still does not fully explore the local geometry of the original data. For either the P2P or the P2S distance metric, one or two original points are directly involved. If noisy points are contained in the original data, they are inevitably drawn into the P2P or P2S distance computation. As a result, the noise sensitivity problem may occur in P2P or P2S distance metric based dimensionality reduction. It is therefore desirable to bring forward another distance metric that can further improve feature extraction performance and robustness.
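
For completeness, the P2S distance used by NFSC-type methods is commonly written as the least-squares projection distance below (a standard form in our notation; it assumes the feature points, stacked as the columns of F, are linearly independent):

```latex
p = F\bigl(F^{\top}F\bigr)^{-1}F^{\top}x,
\qquad
d_{\mathrm{P2S}}\bigl(x,\operatorname{span}(F)\bigr)=\lVert x-p\rVert_{2}.
```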

In this paper, a supervised dimensionality reduction method, termed feature space to feature space distance metric learning (FSDML), is proposed, where a feature space to feature space (S2S) distance metric is newly defined. For any point, there exists a nearest feature space spanned by its k intra-class nearest neighbors, which yields a local projection of the point onto this space. It is this local projection that suppresses the impact of noisy points. In manifold learning, it is often assumed that any point and its k nearest neighbors lie on a local manifold patch, and the same holds for a point and the points spanning its nearest feature space. However, if the point is contaminated by noise, it is viewed as a point off the manifold. By mapping the noisy point into its nearest feature space, its projection and the feature space points can be regarded as lying on a manifold patch. From the viewpoint of manifold learning, the noise sensitivity problem is thus naturally overcome by substituting the noisy point with its projection onto the nearest feature space. For this reason, the S2S distance metric is defined as the distance between two such projections. Meanwhile, the projection is closely related to the points in the nearest feature space and can be represented as their linear combination, which also makes full use of the local geometry information. Based on the proposed S2S distance metric, combined with class label based similarity and dissimilarity measurements over all the points, an inter-class graph and an intra-class graph are constructed, from which a multi-manifold distance metric and the locality of manifolds are deduced, respectively. Finally, a low dimensional linear subspace is explored by maximizing a generalized Fisher criterion given by the ratio of the S2S based multi-manifold distance to the S2S based locality of manifolds.
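
A minimal numerical sketch of this S2S distance is given below, assuming Euclidean nearest-neighbor selection and an ordinary least-squares projection; the helper names, the value of k and the neighbor rule are illustrative choices, not the paper's exact formulation.

```python
import numpy as np

def project_onto_feature_space(x, F):
    """Least-squares projection of x onto the column space of F."""
    coef, *_ = np.linalg.lstsq(F, x, rcond=None)     # coefficients of the linear combination
    return F @ coef

def s2s_distance(x_i, intra_i, x_j, intra_j, k=3):
    """Sketch of the S2S distance between two points.

    intra_i / intra_j : intra-class samples (rows) of x_i / x_j, excluding the points themselves.
    Each point is projected onto the feature space spanned by its k intra-class
    nearest neighbors; the S2S distance is the Euclidean distance between the
    two projections.
    """
    def nearest_feature_space(x, intra):
        dist = np.linalg.norm(intra - x, axis=1)     # P2P distances within the class
        idx = np.argsort(dist)[:k]                   # k intra-class nearest neighbors
        return intra[idx].T                          # feature space basis: (n_features, k)

    p_i = project_onto_feature_space(x_i, nearest_feature_space(x_i, intra_i))
    p_j = project_onto_feature_space(x_j, nearest_feature_space(x_j, intra_j))
    return np.linalg.norm(p_i - p_j)
```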

The main contributions of the proposed FSDML method are listed below:

  • (1)

    We propose a new S2S distance metric. Unlike the set to set distance in Zhou et al. (2018), which is composed of a class-identity term, a relative distance term, and a regularization term, the new S2S distance metric is simply the Euclidean distance between any two projections onto the corresponding nearest feature spaces, so the local geometry contained in the nearest neighbors is taken full advantage of. Thus more local structure information can be exploited for multi-manifold learning.

  • (2)

    On the basis of the proposed S2S distance metric, we also show that it is highly robust to noisy points, as demonstrated by both theoretical analysis and experimental results on three synthetic noisy data sets and benchmark AR face data subsets.

  • (3)

    A novel S2S based manifold to manifold (M2M) distance metric is also proposed to represent the separation between manifolds with different labels. In addition, the proposed M2M distance metric can be reduced to the inter-class graph Laplacian using both the S2S distance metric and class label based dissimilarity relations.

The remainder of the paper is organized as follows. Section 2 reviews LE along with its classical extensions. Besides, another manifold learning based dimensionality reduction method, named nearest feature space embedding (NFSE), is also described in Section 2. Then the new S2S distance metric is presented, accompanied by some theoretical analysis, in Section 3. The proposed FSDML algorithm is analyzed and justified in Section 4. Section 5 offers simulation results on benchmark data sets, followed by analysis. Finally, conclusions are drawn and future work is outlined in Section 6.

Section snippets

Related works

There are many dimensionality reduction methods related to the proposed FSDML. In this section, we review the LE algorithm and its classical unsupervised and supervised versions, i.e. UDP and MFA. Moreover, another manifold learning based method, NFSE, which involves the point to nearest feature space distance metric, will also be discussed.

Feature space to feature space distance metric

In this section, a new metric which can quantify the distance between any two feature spaces is proposed. The proposed S2S distance metric shows close connections to the local geometry structure, class label information and robustness to noise.

It is generally believed that some of the intra-class data can span a feature space. Thus for any point labeled with the ith class in the original high dimensional space, the number of its intra-class data will be n_i − 1. If we only select k (k < n_i − 1) intra-class
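
Although the snippet above breaks off, the S2S distance that this section introduces can, following the abstract, be written as the distance between the two local projections (our notation, a reconstruction rather than the paper's exact formula):

```latex
d_{\mathrm{S2S}}(x_i,x_j)
=\bigl\lVert P_{F_i}\,x_i - P_{F_j}\,x_j \bigr\rVert_{2},
\qquad
P_{F}=F\bigl(F^{\top}F\bigr)^{-1}F^{\top},
```

where F_i stacks the k intra-class nearest neighbors of x_i as columns.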

S2S based distance metric based multi-manifold learning

Assume that data of the same class are located on one manifold and points with different labels reside on different manifolds; then data classification can be transformed into multi-manifold learning. In unsupervised graph spectrum mapping, i.e. LPP and UDP, local information is adopted to calculate the similarity between two points. In this paper, however, both a similarity matrix and a dissimilarity matrix are determined globally, using only the data label information, to construct
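
A minimal sketch of such label-driven similarity and dissimilarity matrices is given below; binary weights and the helper name are assumptions for illustration and may differ from the paper's actual weighting scheme.

```python
import numpy as np

def label_similarity_matrices(labels):
    """Build a global intra-class similarity matrix S and an inter-class
    dissimilarity matrix Dis from class labels only (binary weights assumed)."""
    labels = np.asarray(labels)
    same = (labels[:, None] == labels[None, :]).astype(float)
    S = same - np.eye(len(labels))    # 1 for same-class pairs, self-similarity excluded
    Dis = 1.0 - same                  # 1 for different-class pairs
    return S, Dis

# Graph Laplacians L = D - W (with D the diagonal row-sum matrix) built from S and
# Dis would then drive the intra-class locality preserving term and the inter-class
# manifold-to-manifold separation term, respectively.
```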

Experiments

In this section, the performance of FSDML is evaluated by comparison with several related dimensionality reduction methods. First, the traditional supervised linear method, LDA, is employed. Then, manifold learning based dimensionality reduction or feature extraction approaches, including an unsupervised version and a supervised extension of the original Laplacian spectrum embedding, namely UDP and MFA, are also included in the experiments. Then, a local LDA with

Conclusions and future works

In this paper, a supervised dimensionality reduction method named feature space to feature space distance metric learning (FSDML) is presented for discriminant graph spectrum embedding. In the proposed method, a feature space to feature space (S2S) distance metric is newly defined, which contributes to improving the robustness to noisy points and to mining more local discriminant information hidden in the original data. Moreover, data class label information is also taken into account to

Acknowledgments

This work was supported in part by the grants of Natural Science Foundation of China (61572381, 61273303, 61472280 and 61602349), China Post-doctoral Science Foundation (2016M601646) and Natural Science Foundation of Hubei Province (2018CFB575).

References (57)

  • Houari, R., et al. Dimensionality reduction in data mining: A Copula approach. Expert Systems with Applications (2016).
  • Huang, P., et al. Parameterless reconstructive discriminant analysis for feature extraction. Neurocomputing (2016).
  • Juuti, M., et al. Stochastic discriminant analysis for linear supervised dimension reduction. Neurocomputing (2018).
  • Li, B., et al. Supervised feature extraction based on orthogonal discriminant projection. Neurocomputing (2009).
  • Liu, M., et al. A classification model for semantic entailment recognition with feature combination. Neurocomputing (2016).
  • Rivera-Caicedo, J. P., et al. Hyperspectral dimensionality reduction for biophysical variable statistical retrieval. Journal of Photogrammetry & Remote Sensing (2017).
  • Wang, S., et al. Semi-supervised linear discriminant analysis for dimension reduction and classification. Pattern Recognition (2016).
  • Wei, D., et al. Graph embedding based feature selection. Neurocomputing (2012).
  • Yang, X., et al. Canonical correlation analysis networks for two-view image recognition. Information Science (2017).
  • Yang, W., et al. A collaborative representation based projections method for feature extraction. Pattern Recognition (2015).
  • Ye, Q., et al. Lp- and Ls-norm distance based robust linear discriminant analysis. Neural Networks (2018).
  • Zhang, Z., et al. Robust linearly optimized discriminant analysis. Neurocomputing (2012).
  • Zhang, Z., et al. M-Isomap: Orthogonal constrained marginal isomap for nonlinear dimensionality reduction. IEEE Transactions on Cybernetics (2013).
  • Zhang, Y., et al. Semi-supervised local multi-manifold isomap by linear embedding for feature extraction. Pattern Recognition (2018).
  • Zhang, Z., et al. Constrained large margin local projection algorithms and extensions for multimodal dimensionality reduction. Pattern Recognition (2012).
  • Belkin, M., et al. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation (2003).
  • Cai, D., et al. Orthogonal Laplacian faces for face recognition. IEEE Transactions on Image Processing (2006).
  • Chen, Y. N., et al. Face recognition using nearest feature space embedding. IEEE Transactions on Pattern Analysis and Machine Intelligence (2011).