Prototype learning and collaborative representation using Grassmann manifolds for image set classification

doi:10.1016/j.patcog.2019.107123

Pattern Recognition

Volume 100, April 2020, 107123

https://doi.org/10.1016/j.patcog.2019.107123

Highlights

• Principal component and variation subspaces are constructed for an over-complete dictionary.
• A novel prototype and variation model (P+V) based collaborative representation for Grassmann manifolds is proposed to deal with image set classification naturally.
• Previous special matrices are generalized to common sparse matrices.
• Experimental results show the superiority of our methods.

Abstract

Image set classification using manifolds is becoming increasingly attractive since it considers non-Euclidean geometry. However, despite the success of dictionary learning for image set classification using manifolds, learning an over-complete dictionary remains challenging. This paper proposes a novel prototype subspace learning method, in which a set of images is represented by a linear subspace and then mapped onto a Grassmann manifold. With this subspace representation, class prototypes and intra-class differences can be represented as principal component subspaces and variation subspaces, respectively. An isometric mapping then embeds the manifold into the space of symmetric matrices, where collaborative representation admits a closed-form solution. The proposed method is evaluated on face recognition, object recognition and action recognition. Extensive experimental results on the Honda, Extended YaleB, ETH-80 and Cambridge-Gesture datasets verify the superiority of the proposed method over state-of-the-art methods.

Introduction

Traditional classification techniques are used for single image classification problems. With the development of digital imaging technology, such as multiview cameras and photo albums, multiple images of one subject are now readily available. Therefore, image set classification has recently attracted increased attention in computer vision. Compared with a single image, a set of images captures appearance changes of an individual under arbitrary poses, illumination conditions, expressions and other factors. Although a set describes a subject with more information, image set classification is also more challenging because of large intraclass variations.

Image set classification generally focuses on two key issues: how to effectively model an image set and how to measure the similarity between two sets. Many image set classification methods have been proposed [1], [2]. Usually, linear subspaces are used to model image sets [3], [4], and the canonical angle between two subspaces is defined as the similarity between the two sets. Among them, the mutual subspace method (MSM) [3] is the most classic image set subspace method. However, the MSM fails to distinguish the similarities among intraclass and interclass objects. As such, the constrained mutual subspace method (CMSM) [5] was proposed to map original subspaces into a difference subspace such that the difference components can be extracted. Notably, this constrained subspace is derived by removing the base vectors of all the reference class subspaces with large variances and keeping the base vectors with relatively small variances. Unfortunately, the CMSM neglects the relationships among subspaces from the same class. Boosted Manifold Principal Angles (BoMPA) [6] extends the concept of principal angles between linear subspaces to manifolds with arbitrary nonlinearities. By contrast, sample-based methods [7] use an affine hull or convex hull to represent an image set. These methods measure the similarity by matching specific sample-based statistics. As studied in [1], an image set can also be represented as a linear subspace, which is denoted as a point on a Grassmann manifold.
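The subspace modelling and canonical-angle similarity described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the implementation of any cited method: the helper names `subspace_basis` and `principal_angles`, the subspace dimension, and the random stand-in data are assumptions made here. The sketch relies on the standard fact that the singular values of U1ᵀU2 are the cosines of the principal angles between the two subspaces.

```python
import numpy as np

def subspace_basis(images, dim):
    """Orthonormal basis (d x dim) for the leading directions of an image set.

    `images` is a d x n matrix whose columns are vectorized images
    (hypothetical layout chosen for this sketch).
    """
    U, _, _ = np.linalg.svd(images, full_matrices=False)
    return U[:, :dim]

def principal_angles(U1, U2):
    """Principal (canonical) angles in radians, in ascending order."""
    # Singular values of U1^T U2 are the cosines of the principal angles.
    cosines = np.linalg.svd(U1.T @ U2, compute_uv=False)
    return np.arccos(np.clip(cosines, 0.0, 1.0))

# Random matrices stand in for two image sets (assumption: no real data here).
rng = np.random.default_rng(0)
set_a = rng.standard_normal((50, 12))
set_b = rng.standard_normal((50, 15))

U_a = subspace_basis(set_a, 3)
U_b = subspace_basis(set_b, 3)
angles = principal_angles(U_a, U_b)

# One common subspace similarity: the cosine of the smallest principal angle.
similarity = np.cos(angles[0])
```

Each orthonormal basis returned by `subspace_basis` represents one point on the Grassmann manifold, since only the span of its columns matters.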

Sparse representation and dictionary learning are attracting significant attention for classification [8]. Sparse representation-based classification (SRC) [8] treats the entire training set as a dictionary for face recognition. To obtain discriminative sparse codes, several methods consider non-Euclidean geometry. Harandi et al. [9] embed Grassmann manifolds into the space of symmetric matrices using an isometric mapping. Later, sparse coding using symmetric positive definite manifolds and Bregman divergences was introduced in [10]. Sparse representation emphasizes l₁-norm sparsity but ignores the collaborative ability [11]. It is also difficult to acquire enough image sets to construct an over-complete dictionary. Recently, Deng et al. [12] proposed a superposed linear representation-based classification (SLRC) model that represents a test image using a superposition of the class centroids and the shared intraclass differences. SLRC improves the robustness against contaminated training samples and requires fewer samples to construct an over-complete dictionary. However, SLRC is designed for single image classification, and it may not be appropriate for image set classification. Hence, designing an over-complete dictionary in the collaborative representation framework is significant for image set classification. This paper aims to introduce a novel prototype plus variation subspaces ($P + V$) model that deals with image set classification naturally and effectively. How to derive the prototype representation and variation dictionaries in the collaborative representation framework remains challenging.
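To make the "class centroids plus shared intraclass differences" idea concrete, the following sketch builds such a dictionary for ordinary feature vectors. It illustrates only the single-image (Euclidean) setting described for SLRC, not the Grassmann-manifold version proposed in this paper. The function name `prototype_variation_dictionary` and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def prototype_variation_dictionary(X, labels):
    """Split training samples (columns of X) into prototypes and variations.

    P holds one centroid per class; V holds every sample minus its own
    class centroid, grouped class by class.
    """
    classes = np.unique(labels)
    P = np.stack([X[:, labels == c].mean(axis=1) for c in classes], axis=1)
    V = np.concatenate(
        [X[:, labels == c] - P[:, [k]] for k, c in enumerate(classes)],
        axis=1,
    )
    return classes, P, V

# Synthetic data: 3 classes, 4 samples each, 20-dimensional features.
rng = np.random.default_rng(1)
labels = np.repeat(np.arange(3), 4)
X = rng.standard_normal((20, 12)) + labels  # shift each class by its label

classes, P, V = prototype_variation_dictionary(X, labels)

# Superposed dictionary: prototypes followed by the shared variation atoms.
D = np.hstack([P, V])
```

Because the variation atoms in `V` are shared across classes, a test sample can borrow intraclass differences observed in other classes, which is why such a dictionary can be over-complete with comparatively few samples per class.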

To address the aforementioned limitations, this paper proposes prototype learning and collaborative representation using Grassmann manifolds (GPLCR) and its generalized version (GGPLCR) for image set classification. The proposed methods learn the prototypes for image sets using Grassmann manifolds to construct a dictionary with improved discrimination. More significantly, the proposed methods jointly derive an over-complete dictionary by extracting the less discriminative intraclass variances. Therefore, the learned prototypes and variations of the image sets contain common class characteristics and intraclass variations, respectively. Collaborative representation using Grassmann manifolds is then developed for classification by mapping into the space of symmetric matrices. The resulting optimization with l₂-norm regularization is simple and admits a closed-form solution. An illustration of the proposed methods is shown in Fig. 1.
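The closed-form property can be illustrated with a generic l₂-regularized (ridge) coding step on embedded subspaces. The sketch below is not the GPLCR algorithm itself. It assumes the projection embedding U ↦ UUᵀ as the mapping into symmetric matrices, uses whole training subspaces as dictionary atoms instead of learned prototypes and variations, and classifies by the smallest class-wise reconstruction residual. All function names, the regularization weight `lam`, and the synthetic data are hypothetical.

```python
import numpy as np

def orthonormal_basis(images, dim):
    """Leading left singular vectors of a d x n image-set matrix."""
    U, _, _ = np.linalg.svd(images, full_matrices=False)
    return U[:, :dim]

def embedded_inner(U, W):
    """<U U^T, W W^T>_F = ||U^T W||_F^2, without forming d x d matrices."""
    return np.linalg.norm(U.T @ W, "fro") ** 2

def collaborative_code(atoms, query, lam=0.1):
    """Minimize ||Q Q^T - sum_j a_j D_j D_j^T||_F^2 + lam * ||a||_2^2."""
    K = np.array([[embedded_inner(a, b) for b in atoms] for a in atoms])
    k = np.array([embedded_inner(a, query) for a in atoms])
    # Closed-form ridge solution: (K + lam I) alpha = k.
    alpha = np.linalg.solve(K + lam * np.eye(len(atoms)), k)
    return alpha, K, k

def classify(atoms, atom_labels, query, lam=0.1):
    """Assign the class whose atoms reconstruct the query best."""
    atom_labels = np.asarray(atom_labels)
    alpha, K, k = collaborative_code(atoms, query, lam)
    p = query.shape[1]  # <Q Q^T, Q Q^T>_F = p for an orthonormal Q
    residuals = {}
    for c in np.unique(atom_labels):
        m = atom_labels == c
        a = alpha[m]
        residuals[c] = p - 2 * a @ k[m] + a @ K[np.ix_(m, m)] @ a
    return min(residuals, key=residuals.get), residuals

# Synthetic image sets drawn near two class-specific subspaces.
rng = np.random.default_rng(0)
d, p = 30, 3
class_bases = [np.linalg.qr(rng.standard_normal((d, p)))[0] for _ in range(2)]

def sample_set(c, n=10, noise=0.01):
    coeffs = rng.standard_normal((p, n))
    return class_bases[c] @ coeffs + noise * rng.standard_normal((d, n))

atom_labels = [0, 0, 0, 1, 1, 1]
atoms = [orthonormal_basis(sample_set(c), p) for c in atom_labels]
query = orthonormal_basis(sample_set(1), p)

label, residuals = classify(atoms, atom_labels, query)
```

Working with the Gram matrix `K` keeps the cost independent of the d × d size of the embedded matrices, and the only linear system solved has one unknown per dictionary atom.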

The main contributions of this paper are summarized as follows.

• We naturally and effectively address image set classification via the $P + V$ model. This model can preserve more discriminative class-specific portions and also extract less discriminative intraclass variances. A subsequent over-complete dictionary is constructed to improve the robustness against contaminated training samples and to require fewer samples.
• We integrate prototype learning into the collaborative representation framework, which changes dictionary learning problems into standard Euclidean problems with L1 and L2 penalty terms. Further, we generalize previous special matrices into common sparse matrices and achieve better performance.
• The proposed method is applied to several recognition tasks including face recognition, object recognition and action recognition. Experimental results show the superiority of the proposed method over state-of-the-art methods.

The rest of the paper is organized as follows: In Section 2, we briefly introduce the related work. After creating the P+V model, we employ this model in collaborative representation on Grassmann manifolds in Section 3. In Section 4, we give the experimental evaluation and analysis. The conclusion is drawn in Section 5.

Section snippets

Image set classification

In the past decade, there have been a number of works on classification based on image sets. These relevant approaches mainly fall into two categories: parametric and nonparametric methods. Parametric methods use distribution functions such as single Gaussian or Gaussian mixture models (GMM) [13] to represent each image set and measure the similarity between two distribution functions in terms of the Kullback–Leibler Divergence. However, these methods have difficulties estimating the parameters

Proposed method

In this section, we first propose a detailed method to generate prototypes (principal component subspaces) and variations (variation subspaces). Then, we apply this $P + V$ model to the collaborative representation framework using Grassmann manifolds. Finally, we generalize the previous special matrices into common sparse matrices.

Experiments

We evaluate the proposed method for three visual classification tasks: face recognition, object categorization and gesture recognition. All tasks are handled as classification problems that are based on collaborative representation.

Conclusion

In this paper, we propose an image set classification method called Grassmann prototype learning and collaborative representation (GPLCR) and its generalized version (GGPLCR). The proposed method learns an over-complete dictionary by building the (P+V) model for the space of symmetric matrices, in which Grassmann manifolds can be embedded. Then, we conduct collaborative representation for the Grassmann manifolds and generalize the sparse coefficient matrices. This method further addressed the

Acknowledgments

This work is supported by the National Science Foundation of China under grant no. 61673220 and no. 61906091, the Natural Science Foundation of Jiangsu Province, China (Youth Fund Project) under grant no. BK20190440, and the Fundamental Research Funds for the Central Universities under grant no. 30919011229.

Dong Wei received the B.S. degree from the Nanjing University of Science and Technology, Nanjing, China in 2017. He is currently a doctoral student in Nanjing University of Science and Technology. His research interests include pattern recognition, data mining and image set classification.

References (41)

T.K. Kim et al.
Boosted manifold principal angles for image set-based recognition
Pattern Recognit.
(2007)
H.L. Tan et al.
Grassmann manifold for nearest points image set classification
Pattern Recognit. Lett.
(2015)
M. Yang et al.
Face recognition based on regularized nearest points between image sets
Automatic Face and Gesture Recognition (FG), 2013 10th IEEE International Conference and Workshops on
(2013)
Q.S. Zeng et al.
Multi-local model image set matching based on domain description
Pattern Recognit.
(2014)
P. Zheng et al.
Image set classification based on cooperative sparse representation
Pattern Recognit.
(2017)
R. Wang et al.
Manifold-manifold distance with application to face recognition based on image set
Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on
(2008)
R. Wang et al.
Manifold discriminant analysis
Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on
(2009)
O. Yamaguchi et al.
Face recognition using temporal image sequence
Automatic Face and Gesture Recognition, 1998. Proceedings. Third IEEE International Conference on
(1998)
H.L. Tan et al.
Regularized constraint subspace based method for image set classification
Pattern Recognit.
(2018)
K. Fukui et al.
Face recognition using multi-viewpoint patterns for robot vision
Robotics Research. The Eleventh International Symposium
(2005)

H. Cevikalp et al.
Face recognition based on image sets
Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on
(2010)
J. Wright et al.
Robust face recognition via sparse representation
IEEE Trans. Pattern Anal. Mach. Intell.
(2009)
M. Harandi et al.
Dictionary learning and sparse coding on Grassmann manifolds: an extrinsic solution
Computer Vision (ICCV), 2013 IEEE International Conference on
(2013)
M.T. Harandi et al.
Sparse coding on symmetric positive definite manifolds using Bregman divergences
IEEE Trans. Neural Netw. Learn. Syst.
(2014)
L. Zhang et al.
Sparse representation or collaborative representation: which helps face recognition?
Computer Vision (ICCV), 2011 IEEE International Conference on
(2011)
W. Deng et al.
Face recognition via collaborative representation: its discriminant nature and superposed representation
IEEE Trans. Pattern Anal. Mach. Intell.
(2017)
O. Arandjelovic et al.
Face recognition with image sets using manifold density divergence
Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Conference on
(2005)
T.-K. Kim et al.
Discriminative learning and recognition of image set classes using canonical correlations
IEEE Trans. Pattern Anal. Mach. Intell.
(2007)
X. Shen et al.
Semi-paired discrete hashing: learning latent hash codes for semi-paired cross-view retrieval
IEEE Trans. Cybern.
(2017)
R. Wang et al.
Covariance discriminative learning: a natural and efficient approach to image set classification
Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on
(2012)

Xiaobo Shen received his B.S. and Ph.D. from School of Computer Science and Engineering, Nanjing University of Science and Technology in 2011 and 2017 respectively. He is currently a Professor with the School of Computer Science and Engineering, Nanjing University of Science and Technology, China. He has authored over 30 technical papers in prominent journals and conferences, such as IEEE TNNLS, IEEE TIP, IEEE TCYB, NIPS, ACM MM, AAAI, and IJCAI. His primary research interests are Multi-view Learning, Multi-label Learning, Network Embedding and Hashing.

Quansen Sun received the Ph.D. degree in pattern recognition and intelligence system from the Nanjing University of Science and Technology (NJUST), Nanjing, China, in 2006. He is a Professor with the Department of Computer Science, NJUST. He was with the Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, in 2004 and 2005. He has published more than 100 scientific papers. His current research interests include pattern recognition, image processing, remote sensing information system, and image set classification.

Xizhan Gao received the B.S. degree and the M.S. degree from the Liaocheng University, Liaocheng, China in 2011 and 2015, respectively. He is currently a doctoral student in Nanjing University of Science and Technology. His research interests include pattern recognition, data mining and image set classification.

Wenzhu Yan received the B.S. and M.S. degrees from Southwest University of Science and Technology, in 2013 and 2016, respectively. He is currently a doctoral student in Nanjing University of Science and Technology. His current research interests include pattern recognition and image processing.
