Elsevier

Pattern Recognition

Volume 100, April 2020, 107123
Prototype learning and collaborative representation using Grassmann manifolds for image set classification

https://doi.org/10.1016/j.patcog.2019.107123

Highlights

  • Principal component and variation subspaces are constructed for an over-complete dictionary.

  • A novel prototype and variation model (P+V) based collaborative representation for Grassmann manifolds is proposed to deal with image set classification naturally.

  • Previous special matrices are generalized to common sparse matrices.

  • Experimental results show the superiority of our methods.

Abstract

Image set classification using manifolds is becoming increasingly attractive because it accounts for non-Euclidean geometry. However, despite the success of dictionary learning for manifold-based image set classification, learning an over-complete dictionary remains challenging. This paper proposes a novel prototype subspace learning method, in which a set of images is represented by a linear subspace and then mapped onto a Grassmann manifold. With this subspace representation, class prototypes and intra-class differences can be represented as principal component subspaces and variation subspaces, respectively. An isometric mapping then embeds the manifolds into the space of symmetric matrices, where collaborative representation admits a closed-form solution. The proposed method is evaluated on face recognition, object recognition and action recognition. Extensive experimental results on the Honda, Extended YaleB, ETH-80 and Cambridge-Gesture datasets verify the superiority of the proposed method over state-of-the-art methods.

Introduction

Traditional classification techniques target single-image classification problems. With the development of digital imaging technology, such as multiview cameras and photo albums, multiple images of one subject are now readily available. Therefore, image set classification has recently attracted increased attention in computer vision. Compared with a single image, a set of images captures individual appearance changes under arbitrary poses, illumination conditions, expressions and other factors. Although a set describes a subject with more information, large intraclass variations make image set classification more challenging.

Image set classification generally focuses on two key issues: how to effectively model an image set and how to measure the similarity between two sets. Many image set classification methods have been proposed [1], [2]. Usually, linear subspaces are used to model image sets [3], [4], and the canonical angle between two subspaces is defined as the similarity between the two sets. Among them, the mutual subspace method (MSM) [3] is the most classic image set subspace method. However, the MSM fails to distinguish intraclass similarities from interclass similarities. As such, the constrained mutual subspace method (CMSM) [5] was proposed to map the original subspaces into a difference subspace from which the difference components can be extracted. Notably, this constrained subspace is derived by removing the base vectors of all the reference class subspaces with large variances and keeping the base vectors with relatively small variances. Unfortunately, the CMSM neglects the relationships among subspaces from the same class. Boosted Manifold Principal Angles (BoMPA) [6] extends the concept of principal angles between linear subspaces to manifolds with arbitrary nonlinearities. By contrast, sample-based methods [7] use an affine hull or convex hull to represent an image set. These methods measure the similarity by matching specific sample-based statistics. As studied in [1], an image set can also be represented as a linear subspace, which corresponds to a point on a Grassmann manifold.
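The subspace modelling and canonical-angle similarity described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation of any cited method: the function names, the subspace dimension of 5, and the synthetic data are assumptions, and taking the squared cosine of the smallest angle as the similarity is one common choice for MSM-style matching.

```python
import numpy as np

def subspace_basis(images, dim):
    """Orthonormal basis (d x dim) for the dominant directions of an image set.

    `images` is a d x n matrix whose columns are vectorized images.
    """
    # Left singular vectors are ordered by decreasing singular value.
    u, _, _ = np.linalg.svd(images, full_matrices=False)
    return u[:, :dim]

def principal_angles(basis_a, basis_b):
    """Principal (canonical) angles in radians, ascending, between two subspaces."""
    # The singular values of A^T B are the cosines of the principal angles.
    cosines = np.linalg.svd(basis_a.T @ basis_b, compute_uv=False)
    return np.arccos(np.clip(cosines, -1.0, 1.0))

# Synthetic stand-ins for two image sets (assumption: 50-dimensional images).
rng = np.random.default_rng(0)
set_a = rng.standard_normal((50, 12))
set_b = rng.standard_normal((50, 15))

basis_a = subspace_basis(set_a, dim=5)
basis_b = subspace_basis(set_b, dim=5)
angles = principal_angles(basis_a, basis_b)

# Similarity from the smallest canonical angle (one common convention).
msm_similarity = np.cos(angles[0]) ** 2
```

Each orthonormal basis returned by `subspace_basis` spans one point on the Grassmann manifold; any rotation of the basis within the same span yields the same principal angles.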

Sparse representation and dictionary learning are attracting significant attention [8] for classification. SRC [8] treats the entire set of training samples as a dictionary for face recognition. To obtain discriminative sparse codes, several methods consider non-Euclidean geometry. Harandi et al. [9] embed Grassmann manifolds into the space of symmetric matrices using an isometric mapping. Later, sparse coding using symmetric positive definite manifolds and Bregman divergences was introduced in [10]. Sparse representation emphasizes l1-norm sparsity but ignores the collaborative ability [11]. It is also difficult to acquire enough image sets to construct an over-complete dictionary. Recently, Deng et al. [12] proposed a superposed linear representation-based classification (SLRC) model that represents a test image using a superposition of the class centroids and the shared intraclass differences. SLRC improves the robustness against contaminated training samples and requires fewer samples to construct an over-complete dictionary. However, SLRC is designed for single image classification, and it may not be appropriate for image set classification. Hence, designing an over-complete dictionary in the collaborative representation framework is significant for image set classification. This paper introduces a novel prototype plus variation subspaces (P+V) model that deals with image set classification naturally and effectively. How to derive the prototype representation and variation dictionaries in the collaborative representation framework remains challenging.
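To make the superposed representation concrete, the sketch below builds a prototype plus variation dictionary for ordinary feature vectors, following the description of SLRC above: prototypes are class centroids and variations are the deviations of samples from their own class centroid. It is only an illustration for single-image features under assumed names and toy data, not the Grassmann-based P+V model proposed in this paper.

```python
import numpy as np

def prototype_variation_dictionary(samples, labels):
    """Split labeled samples (columns of a d x n matrix) into prototypes and variations.

    Returns the sorted class ids, a d x c matrix of class centroids (prototypes),
    and a d x n matrix of within-class deviations (variations).
    """
    classes = np.unique(labels)
    prototypes = np.stack(
        [samples[:, labels == c].mean(axis=1) for c in classes], axis=1
    )
    # Subtract each sample's own class centroid to keep only intraclass variation.
    class_index = np.searchsorted(classes, labels)
    variations = samples - prototypes[:, class_index]
    return classes, prototypes, variations

# Toy data (assumption): two classes, three 8-dimensional samples each.
rng = np.random.default_rng(0)
labels = np.array([0, 0, 0, 1, 1, 1])
samples = rng.standard_normal((8, 6)) + 3.0 * labels

classes, prototypes, variations = prototype_variation_dictionary(samples, labels)

# Superposed dictionary [P, V]: class-specific atoms plus shared variation atoms.
dictionary = np.hstack([prototypes, variations])
```

Because the variation atoms are pooled across classes, a class with few samples can still borrow intraclass differences observed in other classes, which is why fewer samples are needed for an over-complete dictionary.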

To address the aforementioned limitations, this paper proposes prototype learning and collaborative representation using Grassmann manifolds (GPLCR) and its generalized version (GGPLCR) for image set classification. The proposed methods learn prototypes for image sets on Grassmann manifolds and use them to construct a dictionary with improved discrimination. More significantly, the proposed methods jointly derive an over-complete dictionary by extracting the less discriminative intraclass variances. The learned prototypes and variations of the image sets therefore contain common class characteristics and intraclass variations, respectively. For classification, collaborative representation is developed on Grassmann manifolds by mapping them into the space of symmetric matrices. The resulting optimization with l2-norm regularization is simple and admits a closed-form solution. An illustration of the proposed methods is shown in Fig. 1.
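The closed-form solution mentioned above can be illustrated with a generic extrinsic formulation. In this sketch, each subspace with orthonormal basis X is embedded as the symmetric projection matrix X X^T, and an l2-regularized least-squares problem over the embedded dictionary atoms is solved in closed form, followed by a class-wise residual rule. The dictionary here is a plain list of training subspaces rather than the learned P+V dictionary, and the function names, the regularization weight `lam`, and the residual-based decision rule are assumptions made for illustration.

```python
import numpy as np

def projection_kernel(a, b):
    """Inner product <A A^T, B B^T>_F = ||A^T B||_F^2 for orthonormal bases A, B."""
    return np.linalg.norm(a.T @ b, "fro") ** 2

def collaborative_code(atoms, query, lam=0.1):
    """Closed-form ridge coefficients: alpha = (K + lam * I)^{-1} k."""
    gram = np.array([[projection_kernel(a, b) for b in atoms] for a in atoms])
    cross = np.array([projection_kernel(a, query) for a in atoms])
    alpha = np.linalg.solve(gram + lam * np.eye(len(atoms)), cross)
    return alpha, gram, cross

def classify(atoms, atom_labels, query, lam=0.1):
    """Assign the class whose atoms give the smallest reconstruction residual."""
    alpha, gram, cross = collaborative_code(atoms, query, lam)
    atom_labels = np.asarray(atom_labels)
    residuals = {}
    for c in np.unique(atom_labels):
        mask = atom_labels == c
        a_c = alpha[mask]
        # ||Q Q^T - sum_j a_j D_j D_j^T||_F^2, expanded in terms of the kernel.
        residuals[c] = (projection_kernel(query, query)
                        - 2.0 * a_c @ cross[mask]
                        + a_c @ gram[np.ix_(mask, mask)] @ a_c)
    return min(residuals, key=residuals.get), residuals

# Synthetic example (assumption): four 3-dimensional subspaces of R^20, two classes.
rng = np.random.default_rng(0)

def random_basis(d, p):
    q, _ = np.linalg.qr(rng.standard_normal((d, p)))
    return q

atoms = [random_basis(20, 3) for _ in range(4)]
atom_labels = [0, 0, 1, 1]
rotation, _ = np.linalg.qr(rng.standard_normal((3, 3)))
query = atoms[0] @ rotation  # same subspace as atoms[0], expressed in another basis

predicted, residuals = classify(atoms, atom_labels, query)
```

Working with the kernel values avoids forming the d x d projection matrices explicitly, and the embedding makes the result independent of which orthonormal basis is chosen for each subspace.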

The main contributions of this paper are summarized as follows.

  • We naturally and effectively address image set classification via the P+V model. This model preserves the more discriminative class-specific portions and also extracts the less discriminative intraclass variances. An over-complete dictionary is then constructed, which improves the robustness against contaminated training samples and requires fewer samples.

  • We integrate prototype learning into the collaborative representation framework, which changes dictionary learning problems into standard Euclidean problems with L1 and L2 penalty terms. Further, we generalize previous special matrices into common sparse matrices and achieve better performance.

  • The proposed method is applied to several recognition tasks including face recognition, object recognition and action recognition. Experimental results show the superiority of the proposed method over the state-of-the-art methods.

The rest of the paper is organized as follows: In Section 2, we briefly introduce the related work. After creating the P+V model, we employ this model in collaborative representation on Grassmann manifolds in Section 3. In Section 4, we give the experimental evaluation and analysis. The conclusion is drawn in Section 5.

Image set classification

In the past decade, there have been a number of works on classification based on image sets. These relevant approaches mainly fall into two categories: parametric and nonparametric methods. Parametric methods use distribution functions such as single Gaussian or Gaussian mixture models (GMM) [13] to represent each image set and measure the similarity between two distribution functions in terms of the Kullback–Leibler Divergence. However, these methods have difficulties estimating the parameters

Proposed method

In this section, we first propose a detailed method to generate prototypes (principal component subspaces) and variations (variation subspaces). Then, we apply this P+V model to the collaborative representation framework using Grassmann manifolds. Finally, we generalize the previous special matrices into common sparse matrices.

Experiments

We evaluate the proposed method for three visual classification tasks: face recognition, object categorization and gesture recognition. All tasks are handled as classification problems that are based on collaborative representation.

Conclusion

In this paper, we propose an image set classification method called Grassmann prototype learning and collaborative representation (GPLCR) and its generalized version (GGPLCR). The proposed method learns an over-complete dictionary by building the P+V model in the space of symmetric matrices, in which Grassmann manifolds can be embedded. Then, we conduct collaborative representation for the Grassmann manifolds and generalize the sparse coefficient matrices. This method further addressed the

Acknowledgments

This work is supported by the National Science Foundation of China under grant nos. 61673220 and 61906091, the Natural Science Foundation of Jiangsu Province, China (Youth Fund Project) under grant no. BK20190440, and the Fundamental Research Funds for the Central Universities under grant no. 30919011229.

Dong Wei received the B.S. degree from the Nanjing University of Science and Technology, Nanjing, China, in 2017. He is currently a doctoral student at the Nanjing University of Science and Technology. His research interests include pattern recognition, data mining and image set classification.

References (41)

  • H. Cevikalp et al.

    Face recognition based on image sets

    Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on

    (2010)
  • J. Wright et al.

    Robust face recognition via sparse representation

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2009)
  • M. Harandi et al.

    Dictionary learning and sparse coding on Grassmann manifolds: an extrinsic solution

    Computer Vision (ICCV), 2013 IEEE International Conference on

    (2013)
  • M.T. Harandi et al.

    Sparse coding on symmetric positive definite manifolds using Bregman divergences

    IEEE Trans. Neural Netw. Learn. Syst.

    (2014)
  • L. Zhang et al.

    Sparse representation or collaborative representation: which helps face recognition?

    Computer Vision (ICCV), 2011 IEEE International Conference on

    (2011)
  • W. Deng et al.

    Face recognition via collaborative representation: its discriminant nature and superposed representation

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2017)
  • O. Arandjelovic et al.

    Face recognition with image sets using manifold density divergence

    Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Conference on

    (2005)
  • T.-K. Kim et al.

    Discriminative learning and recognition of image set classes using canonical correlations

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2007)
  • X. Shen et al.

    Semi-paired discrete hashing: learning latent hash codes for semi-paired cross-view retrieval

    IEEE Trans. Cybern.

    (2017)
  • R. Wang et al.

    Covariance discriminative learning: a natural and efficient approach to image set classification

    Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on

    (2012)
Xiaobo Shen received his B.S. and Ph.D. degrees from the School of Computer Science and Engineering, Nanjing University of Science and Technology, in 2011 and 2017, respectively. He is currently a Professor with the School of Computer Science and Engineering, Nanjing University of Science and Technology, China. He has authored over 30 technical papers in prominent journals and conferences, such as IEEE TNNLS, IEEE TIP, IEEE TCYB, NIPS, ACM MM, AAAI, and IJCAI. His primary research interests are multi-view learning, multi-label learning, network embedding and hashing.

Quansen Sun received the Ph.D. degree in pattern recognition and intelligence system from the Nanjing University of Science and Technology (NJUST), Nanjing, China, in 2006. He is a Professor with the Department of Computer Science, NJUST. He was with the Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, in 2004 and 2005. He has published more than 100 scientific papers. His current research interests include pattern recognition, image processing, remote sensing information system, and image set classification.

Xizhan Gao received the B.S. and M.S. degrees from Liaocheng University, Liaocheng, China, in 2011 and 2015, respectively. He is currently a doctoral student at the Nanjing University of Science and Technology. His research interests include pattern recognition, data mining and image set classification.

Wenzhu Yan received the B.S. and M.S. degrees from Southwest University of Science and Technology in 2013 and 2016, respectively. He is currently a doctoral student at the Nanjing University of Science and Technology. His current research interests include pattern recognition and image processing.