Neural Networks

Volume 155, November 2022, Pages 287-307

Unsupervised robust discriminative subspace representation based on discriminative approximate isometric embedding

https://doi.org/10.1016/j.neunet.2022.06.003

Abstract

Subspace learning has shown tremendous potential in machine learning and computer vision due to its effectiveness. Subspace representation is a key subspace learning method that encodes subspace membership information. To effectively encode the subspace memberships of data, structured prior constraints, such as low-rank and sparse constraints, are imposed on the subspace representation. To handle various noises, existing methods tend to separate a specific type of noise in a specific way to obtain a robust subspace representation. When encountering diversified noises, their subspace-preserving property may not be guaranteed. To address this issue, we propose a novel unsupervised robust discriminative subspace representation that mitigates the impacts of diversified noises via discriminative approximate isometric embedding, rather than directly separating noises from the high-dimensional space as existing methods do. To ensure the performance of our approach, we provide a crucial theorem, termed the noisy Johnson–Lindenstrauss theorem. Meanwhile, a Laplacian rank constraint is imposed on the discriminative subspace representation to uncover the ground-truth subspace memberships of noisy data and improve the graph connectivity of subspaces. Extensive experiments on several benchmark datasets and two large-scale datasets validate the effectiveness and robustness of our approach with respect to diversified noises.

Introduction

Subspace learning is a geometric method that models high-dimensional data using a low-dimensional model or structure. It fosters a wide range of applications, such as manifold learning (Goh & Vidal, 2008), face recognition (Lee, Ho, & Kriegman, 2005), motion segmentation (Vidal, Ma, & Sastry, 2005), feature extraction (Liu and Yan, 2011, Zhang et al., 2017, Zhang, Zhang et al., 2019), outlier detection (You, Robinson, & Vidal, 2017), dimensionality reduction (Li, 2019, Liu et al., 2019), image classification, data representation (Ren et al., 2020, Zhang et al., 2020), and so on. In general, according to different criteria, it can be divided into the following groups: (1) supervised subspace learning and unsupervised subspace learning, (2) linear subspace learning and nonlinear subspace learning, and (3) noise-free subspace learning and robust subspace learning. Linear subspace learning, or subspace learning for short, is based on the self-expressiveness theory that each data point within a subspace can be represented as a linear combination of the other data points from a union of subspaces. Nonlinear subspace learning, or deep subspace learning, exploits a deep neural network to learn an embedding of the data points that has the structure of a union of linear subspaces. The distribution of the data points is supported on a mixture of multiple low-dimensional submanifolds. The aim of nonlinear subspace learning is to learn a smooth mapping that maps each of the submanifolds to a linear subspace, so that the self-expressiveness theory can be applied to the resulting linear subspaces.

We must point out that the research focus of this paper is linear subspace learning, rather than nonlinear subspace learning, for the following two reasons: (1) the aim of nonlinear subspace learning, in essence, is to transform the intrinsic structure of the data, deep features, or data representation into a linear subspace representation (i.e., a linear combination of low-dimensional subspaces). For example, the deep subspace model (Ji, Zhang, Li, Salzmann, & Reid, 2017) is in essence based on the sparse subspace representation, or the Frobenius-norm based representation, rather than a new subspace representation. Hence, we focus on the essential subspace representations behind these deep subspace models. (2) The effectiveness of nonlinear subspace learning needs to be further studied. For example, recent work (Haeffele, You, & Vidal, 2020) shows that the deep subspace model is ill-posed and results in trivial data geometries in the embedded space. Moreover, it shows that a significant portion of the performance benefit of the deep subspace model is attributable to an ad-hoc post-processing step rather than to the deep model itself.

Subspace representation is a crucial means of linear subspace learning, used to model the subspace memberships of data points under certain structured prior constraints. In general, according to the structured prior constraints imposed on the subspace representation, existing subspace representations can be divided into seven categories: (1) low-rank representation, (2) sparse representation, (3) low-rank and sparse representation, (4) Frobenius-norm based representation, (5) block diagonal representation, (6) entropy based representation, and (7) combinatorial representations based on the above six.

It is worth noting that the major concern of this paper is subspace representation, rather than subspace clustering, since many different subspace clustering methods are based on the same subspace representation. Besides this, subspace representation can be employed not only for subspace clustering, but also for similarity characterization, affinity graph learning, dimensionality reduction, and so on. Therefore, our research interests mainly focus on the essential subspace representations behind these different methods.

In many real-world applications, the subspace representation needs to be robust to noises, corruptions, and outliers. Although the existing subspace representation methods show some robustness, they still have the following limitations.

  • They attempt to model diversified noises in specific ways, e.g., via the l1-norm, l2,1-norm, or Frobenius norm, which are only suitable for sparse noise or Gaussian noise (Peng, Lu, Yi, & Tang, 2016). When encountering diversified noises, their robustness may degrade.

  • The subspace-preserving property states that a data point in a union of subspaces can be expressed as a linear combination of the other data points in its own subspace. Their subspace-preserving properties hold only when certain conditions are satisfied, e.g., conditions on the relationship between the data distribution and the subspace geometry (Elhamifar and Vidal, 2013, Soltanolkotabi et al., 2012, You and Vidal, 2015). These conditions may not be satisfied in the presence of large or diversified noises and corruptions, and the correctness of subspace clustering is then affected.

  • The graph connectivity of subspaces (Nasihatkon & Hartley, 2011) asserts that data points lying in different subspaces should not be connected to each other, while data points lying in the same subspace should form a connected component of the graph. Their graph connectivity of subspaces may be undermined in the presence of large or diversified noises and corruptions.

  • Their performance depends heavily on the optimal values of the parameters. Parameter tuning is time-consuming and difficult, even though bounds on the parameters are given in some literature (Wang, Xu, & Leng, 2019).

To overcome the limitations mentioned above, in this paper we propose a novel unsupervised robust discriminative subspace representation (URDSR) to handle diversified noises for robust subspace learning. Unlike the existing methods, our approach seeks a low-dimensional discriminative subspace to mitigate the impacts of diversified noises, rather than separating noises directly from the high-dimensional space as the existing methods do. The key idea is to carry out a discriminative approximate isometric embedding that achieves both approximate isometry and global geometric discriminability for high-dimensional noisy data. We give a key theorem, termed the noisy Johnson–Lindenstrauss theorem, to lay a solid theoretical foundation for our approach. In addition, the Laplacian rank, instead of the low rank, is imposed on our discriminative subspace representation to ensure a block diagonal structure that correctly characterizes the subspace memberships of noisy data. As a result, no further assumptions or conditions are required for our approach to guarantee the subspace-preserving property, and the problem of graph connectivity of subspaces is also addressed effectively. Besides this, only one parameter needs to be tuned for our approach, and its optimal value is self-tuning.
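The discriminative embedding and the noisy Johnson–Lindenstrauss theorem are developed later in the paper; as background only, the following minimal sketch (not the paper's method; the function name and target dimension are taken from the classical JL lemma) shows the standard ingredient: a Gaussian random projection that approximately preserves pairwise distances of high-dimensional points.

```python
import numpy as np

def jl_random_projection(X, eps=0.2, rng=None):
    """Project the rows of X (n x d) to k dimensions with a Gaussian random map.

    Classical JL lemma: k = O(log(n) / eps^2) suffices so that all pairwise
    distances are preserved within a (1 +/- eps) factor with high probability.
    """
    rng = np.random.default_rng(rng)
    n, d = X.shape
    # Standard JL bound on the target dimension (may exceed d for tiny n or eps).
    k = int(np.ceil(4 * np.log(n) / (eps ** 2 / 2 - eps ** 3 / 3)))
    P = rng.standard_normal((d, k)) / np.sqrt(k)   # entries ~ N(0, 1/k)
    return X @ P

# Usage: distances between projected points stay within (1 +/- eps) of the originals.
X = np.random.default_rng(0).standard_normal((200, 1000))
Y = jl_random_projection(X, eps=0.3, rng=1)
print(np.linalg.norm(X[0] - X[1]), np.linalg.norm(Y[0] - Y[1]))
```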

The main contributions of this paper are summarized as follows:

  • (1) we provide a subspace representation learning methodology which is robust to different types of noises.

  • (2) we exploit discriminative approximate isometric embedding to mitigate the impacts of diversified noises.

  • (3) we give the noisy Johnson–Lindenstrauss theorem and its proof to lay a solid theoretical foundation for our approach.

  • (4) we conduct extensive experiments to evaluate our approach against the related competing methods.

The rest of this paper is organized as follows: we first briefly review the related work in Section 2. Then we present the methodology in Section 3 and give its optimization method in Section 4. Section 5 describes the algorithm and gives its theoretical analysis. Experimental results and discussion are reported in Section 6. Finally, Section 7 concludes this paper.


Related work

For the sake of clarity and convenience, the important notations used in this paper are listed in Table 1. We suggest that readers refer to the notations listed in Table 1 to better understand the proposed approach.

Low-rank representation (LRR) (Liu et al., 2012) aims to reveal subspace structure by enforcing a low-rank constraint on the subspace representation.

Since low-rank optimization is NP-hard, the nuclear norm is used to replace the rank function as its convex envelope.
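As a concrete illustration of why the nuclear norm is a tractable surrogate for the rank, the sketch below (illustrative only, not code from LRR itself) implements singular value thresholding, the proximal operator of the nuclear norm that appears as the core step in typical nuclear-norm-based solvers.

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: the prox of tau * nuclear norm.

    argmin_Z 0.5 * ||Z - M||_F^2 + tau * ||Z||_* is obtained by soft-thresholding
    the singular values of M, which makes the nuclear norm computationally
    tractable, unlike the rank function it relaxes.
    """
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s_thr = np.maximum(s - tau, 0.0)        # shrink singular values toward zero
    return (U * s_thr) @ Vt

# Example: thresholding zeroes out small singular values, lowering the rank.
M = np.random.default_rng(0).standard_normal((50, 40))
Z = svt(M, tau=5.0)
print(np.linalg.matrix_rank(M), np.linalg.matrix_rank(Z))
```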

Subspace-preserving representation

In this section, we first give the definition of the self-expressiveness property, which is the foundation of subspace representation theory. Based on the self-expressiveness property, we further introduce the concept of the subspace-preserving property as preparation for the subsequent parts.

Definition 1

Self-Expressiveness Property (Elhamifar & Vidal, 2013)

Every data point $x_i$ drawn from a union of subspaces can be expressed as a linear combination of the other data points except itself. Specifically, there exists a representation $c_i \in \mathbb{R}^n$ with respect to $x_i$ such that $x_i = X c_i$ and $c_{ii} = 0$.
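To make the property concrete, the following minimal sketch (illustrative only; the ridge regularizer and variable names are our own assumptions, not part of Definition 1) computes, for synthetic data drawn from a union of subspaces, a coefficient matrix with zero diagonal such that each point is reconstructed from the other points.

```python
import numpy as np

def self_expressive_coeffs(X, ridge=1e-6):
    """Represent each column x_i of X (d x n) as a linear combination of the
    other columns (c_ii = 0), the plain least-squares form of self-expressiveness.

    A small ridge term keeps the system well-posed; structured priors (sparse,
    low-rank, ...) are what the representations surveyed in this paper add on top.
    """
    d, n = X.shape
    C = np.zeros((n, n))
    for i in range(n):
        idx = [j for j in range(n) if j != i]
        A = X[:, idx]                                   # all points except x_i
        G = A.T @ A + ridge * np.eye(n - 1)
        c = np.linalg.solve(G, A.T @ X[:, i])           # ridge least squares
        C[idx, i] = c
    return C

# Points drawn from a union of two 2-D subspaces of R^5.
rng = np.random.default_rng(0)
B1, B2 = rng.standard_normal((5, 2)), rng.standard_normal((5, 2))
X = np.hstack([B1 @ rng.standard_normal((2, 20)), B2 @ rng.standard_normal((2, 20))])
C = self_expressive_coeffs(X)
print(np.linalg.norm(X - X @ C))   # near zero: every point is self-expressed
```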

Optimization

In this section, we introduce how to solve model (4). In general, it is difficult to solve model (4) directly, since its objective function involves multiple variables and constraints. To make the problem tractable, we use the alternating direction method of multipliers (ADMM) (Boyd, Parikh, Chu, Peleato, & Eckstein, 2011) to solve it by introducing an auxiliary variable $Z$. Thus, the problem can be further rewritten as follows: $$\min_{P,C,Z,F}\ \frac{\|PX - PXZ\|_F^2}{\operatorname{tr}\!\big(P\hat{X}\hat{X}^{\top}P^{\top}\big)} + 2\alpha\,\operatorname{tr}\!\big(F^{\top}L_C F\big)$$
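Model (4) and its exact ADMM updates are derived in the paper. The sketch below only illustrates the general ADMM splitting pattern referred to here, applied to a much simpler self-expressive surrogate problem; the objective, variable names, and penalty `rho` are illustrative assumptions, not the paper's solver.

```python
import numpy as np

def admm_sparse_self_expressive(X, lam=0.1, rho=1.0, iters=200):
    """Generic ADMM splitting: introduce an auxiliary variable Z tied to C by a
    consensus constraint, then alternate closed-form / proximal updates and a
    dual ascent step.

    Solves  min_C 0.5*||X - X C||_F^2 + lam*||Z||_1   s.t. C = Z
    (a simple self-expressive surrogate, NOT the paper's model (4)).
    """
    d, n = X.shape
    C = np.zeros((n, n)); Z = np.zeros((n, n)); U = np.zeros((n, n))  # U: scaled dual
    G = X.T @ X
    A = np.linalg.inv(G + rho * np.eye(n))
    for _ in range(iters):
        C = A @ (G + rho * (Z - U))                                   # quadratic step
        Z = np.sign(C + U) * np.maximum(np.abs(C + U) - lam / rho, 0) # soft-threshold
        U = U + C - Z                                                 # dual update
    return Z

X = np.random.default_rng(0).standard_normal((30, 60))
C = admm_sparse_self_expressive(X)
print(np.mean(np.abs(C) > 1e-4))   # fraction of nonzero coefficients
```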

Algorithm

The whole learning procedure of the proposed approach is described in Algorithm 1. It is worth mentioning that our approach simultaneously performs both robust affinity graph learning and subspace representation learning. More importantly, the parameter α of the proposed approach is self-tuning in the sense that α is adaptively adjusted, without manual tuning, as the number of connected components, graphcomponent(GrC), of the learned graph GrC changes. Besides this, our approach is
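The exact self-tuning rule for α is given in Algorithm 1; the following sketch only illustrates the general pattern of monitoring the number of connected components of the learned affinity graph and adjusting α accordingly. The update factor and function names are hypothetical, not the paper's rule.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def self_tune_alpha(alpha, affinity, target_components, factor=2.0):
    """Illustrative self-tuning step (an assumption, not the paper's exact update):
    count the connected components of the current affinity graph and nudge alpha
    so the learned graph moves toward the desired number of components.
    """
    W = csr_matrix((np.abs(affinity) + np.abs(affinity).T) / 2)   # symmetrized graph
    n_comp, _ = connected_components(W, directed=False)
    if n_comp < target_components:      # graph too connected -> strengthen the rank term
        alpha *= factor
    elif n_comp > target_components:    # graph too fragmented -> relax it
        alpha /= factor
    return alpha, n_comp

# Example: a block-diagonal affinity with 3 blocks leaves alpha unchanged for target 3.
A = np.kron(np.eye(3), np.ones((4, 4)))
print(self_tune_alpha(1.0, A, target_components=3))
```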

Experiments

In this section, several experiments on the benchmark datasets and large-scale datasets are conducted to evaluate our approach URDSR. These experiments include (1) discriminability of low-dimensional embedding, (2) subspace representation comparison, (3) noisy data clustering, (4) subspace-preserving test, (5) robust affinity graph learning based on Laplacian rank, (6) convergence verification and comparison, (7) visualization of recovered face images, and (8) evaluation on the large-scale

Conclusion

In this paper, we present a novel unsupervised robust discriminative subspace representation to handle diversified noises. The main differences between our approach and the existing methods lie in the following aspects: First, to mitigate the impacts of diversified noises, we use discriminative approximate isometric embedding, which can better preserve both intrinsic global and local geometry of the original data, rather than directly separating noises from the high-dimensional space, as done

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (54)

  • Donoho, D. L. (2005). Neighborly polytopes and sparse solutions of underdetermined linear equations. Stanford Technical...

  • Elhamifar, E., & Vidal, R. (2013). Sparse subspace clustering: Algorithm, theory, and applications. IEEE Transactions on Pattern Analysis and Machine Intelligence.

  • Fan, K. (1949). On a theorem of Weyl concerning eigenvalues of linear transformations I. Proceedings of the National Academy of Sciences of the United States of America.

  • Goh, A., & Vidal, R. (2008). Clustering and dimensionality reduction on Riemannian manifolds.

  • Griffin, G., et al. (2007). Caltech-256 object category dataset.

  • Haeffele, B. D., You, C., & Vidal, R. (2020). A critique of self-expressive deep subspace clustering.

  • Ji, P., Zhang, T., Li, H., Salzmann, M., & Reid, I. (2017). Deep subspace clustering networks.

  • Kane, D. M., et al. (2014). Sparser Johnson–Lindenstrauss transforms. Journal of the ACM.

  • Lee, K.-C., Ho, J., & Kriegman, D. (2005). Acquiring linear subspaces for face recognition under variable lighting. IEEE Transactions on Pattern Analysis and Machine Intelligence.

  • Liu, G., et al. (2012). Robust recovery of subspace structures by low-rank representation. IEEE Transactions on Pattern Analysis and Machine Intelligence.

  • Liu, G., et al. Robust subspace segmentation by low-rank representation.

  • Liu, G., & Yan, S. (2011). Latent low-rank representation for subspace segmentation and feature extraction.

  • Liu, G., et al. (2019). Robust subspace clustering with compressed data. IEEE Transactions on Image Processing.

  • Lu, C., et al. (2019). Subspace clustering by block diagonal representation. IEEE Transactions on Pattern Analysis and Machine Intelligence.

  • Lu, C.-Y., et al. Robust and efficient subspace segmentation via least squares regression.

  • Manning, C. D., et al. (2008). Introduction to information retrieval.

  • Nasihatkon, B., & Hartley, R. (2011). Graph connectivity in sparse subspace clustering.