Discriminative block-diagonal covariance descriptors for image set classification☆
Introduction
In recent years, the changing nature of visual data acquisition has attracted increasing interest in classification with image sets [1], [2], [3], [4], [5], [6] in the context of practical applications, such as visual surveillance, face recognition with multi-view images, and dynamic scene classification using long-term observations. In contrast to conventional single-shot image classification, image set classification uses sets of multiple images from the same class as both gallery and probe samples. These image sets usually cover wide variations of a specific class of objects caused by camera pose changes, non-rigid transformations, or varying illumination conditions. In view of this richer information, more robust performance can be expected by considering image sets, rather than single-shot images, as the input to decision making. Nevertheless, the huge intra-class variability and inter-class ambiguity of image sets have made the effective representation of this information a major issue [5].
Among the previous work on this topic, the prevalent image set classification methods rely on the assumption of specific parametric distributions or geometrical structures. For instance, single Gaussian [7] or Gaussian mixture models [1], [8] have been used to describe the distribution of images in an image set, with Kullback-Leibler divergence adopted as a measure of similarity between different distributions. However, Kim et al. [9] showed that robust performance cannot be guaranteed when the statistical correlation between the gallery and probe sets is weak. Instead of relying on image pixels, a more recent wave of methods exploits some type of image descriptor. The covariance matrix proposed by Tuzel et al. [10] as a region descriptor has received particular attention because of its demonstrable effectiveness in widespread applications, successful examples of which include object recognition [10], human tracking [11], texture categorisation [12], etc. This descriptor has become particularly popular for modelling image sets, because of its efficacy and robustness in capturing data variations [13], [14], [15].
As a second-order statistic, the covariance matrix represents an image set with features from different image samples. While offering advantages and desired properties, full rank Covariance Descriptors (CovDs) naturally lie on a Riemannian manifold of Symmetric Positive Definite (SPD) matrices [16]. As a consequence, conventional learning methods based on Euclidean geometry are inadequate for analysing SPD matrices owing to their neglect of manifold geometry. In an attempt to generalise algorithms from a Euclidean space to Riemannian manifolds, previous studies [13], [17], [18] utilised Riemannian metrics to account for the manifold geometry with promising results.
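For concreteness, the covariance descriptor of an image set can be sketched as follows. This is a minimal illustration, not the paper's code: the toy feature vectors stand in for whatever per-image features (pixels, filter responses, or CNN activations) a real pipeline would extract.

```python
import numpy as np

def covariance_descriptor(features):
    """Covariance descriptor (CovD) of a set of feature vectors.

    features: (n, d) array with one d-dimensional feature vector per image
    (or per pixel, in the region-descriptor setting of Tuzel et al. [10]).
    Returns the (d, d) sample covariance matrix.
    """
    mu = features.mean(axis=0)
    centred = features - mu
    return centred.T @ centred / (features.shape[0] - 1)

# A toy image set: 50 images, each summarised by a 10-dimensional feature.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
C = covariance_descriptor(X)
assert C.shape == (10, 10)
assert np.allclose(C, C.T)  # symmetric by construction
```

Because the number of images (50) here exceeds the feature dimension (10), this toy CovD is full rank and lies on the SPD manifold; the next paragraph concerns the opposite, and more common, regime.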
Despite these achievements, several issues remain. First, CovDs constructed from image sets are rarely of full rank, since the dimensionality of CovDs is often larger than the number of images in a set. This results in unreliable covariance estimation and renders Riemannian metrics for SPD matrices inapplicable. To avoid the matrix singularity, one popular solution is to regularise the rank-deficient CovDs by adding a small perturbation to the zero eigenvalues of the matrix. However, a recent study [19] pointed out that this regularisation may degrade the performance of CovDs. In addition, the computational complexity of analysing high-dimensional SPD matrices is taxing. As a countermeasure, some algorithms [20], [21] have been proposed to map high-dimensional SPD matrices to a low-dimensional space, but learning the mapping (formulated as a manifold optimisation problem) is also time-consuming.
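The rank-deficiency problem and the usual perturbation-based workaround can be verified numerically. This is a hedged sketch: the perturbation value `eps` is an arbitrary choice for illustration, not a value taken from the paper.

```python
import numpy as np

# When the feature dimension d exceeds the number of images n, the sample
# covariance is rank-deficient: rank(C) <= n - 1 < d, so logarithm- and
# inverse-based Riemannian metrics for SPD matrices are undefined on it.
rng = np.random.default_rng(1)
n, d = 20, 100                       # fewer images than feature dimensions
X = rng.normal(size=(n, d))
C = np.cov(X, rowvar=False)          # (d, d) matrix of rank at most n - 1
assert np.linalg.matrix_rank(C) <= n - 1

# The common workaround perturbs the spectrum so that C becomes strictly
# positive definite; as noted in the text ([19]), this regularisation can
# nevertheless hurt downstream classification performance.
eps = 1e-3                           # illustrative value, not from the paper
C_reg = C + eps * np.eye(d)
assert np.all(np.linalg.eigvalsh(C_reg) > 0)
```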
An exactly block-diagonal structure is highly desirable for subspace segmentation methods [22], [23], since it characterises the sample clusters and the subspace segmentation more accurately. Based on the self-expression property in Elhamifar and Vidal [23], an ideal block-diagonal structure can also be used to capture the underlying structure of the data by embedding global structure information and discriminative capability [24]. Consequently, promising classification results can be achieved when combining the block-diagonal structure with a discriminative data representation [25], [26]. However, existing block-diagonal representation studies mainly focus on data in vector form, and barely any attention has been dedicated to associating the block-diagonal structure with Riemannian manifolds. In this paper, we propose a novel approach to constructing discriminative block-diagonal CovDs of image sets for the task of classification. The key innovations of the proposed method are as follows. First, we propose representing an image set with a set of block CovDs instead of the full covariance matrix, with the aim of reducing computation time and addressing the singularity problem. Second, we provide a strategy for building block-diagonal SPD matrices from optimised subsets of these block CovDs, obtained by taking the discriminative information of each image block into account. Last, we extend our approach to a bidirectional setting that achieves further size reduction of the block-diagonal SPD matrix. Moreover, motivated by the proven success of deep networks (e.g., the Convolutional Neural Network (CNN)), we show that discriminative block-diagonal CovDs built from CNN features also outperform the simple combination of CovDs and deep architectures. This indicates that our approach is not limited to shallow features and works well with deep features as well. In general, we map the original CovDs on a high-dimensional manifold to more discriminative SPD matrices on a low-dimensional one.
The key concepts of our approach are illustrated in Fig. 1.
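The partition-then-assemble idea can be sketched in a few lines. This is a deliberately simplified illustration of the construction, under stated assumptions that depart from the paper: images are split into vertical strips rather than square blocks, and all blocks are kept, omitting the discriminability-based block selection that is central to the proposed method.

```python
import numpy as np
from scipy.linalg import block_diag

def block_covds(image_set, n_blocks):
    """Per-block CovDs for an image set (simplified sketch).

    image_set: (n, h, w) array of n grey-level images.
    Each image is split into n_blocks vertical strips; the flattened strip
    pixels, across all n images, yield one small covariance matrix each.
    """
    n, h, w = image_set.shape
    col_groups = np.array_split(np.arange(w), n_blocks)
    covs = []
    for cols in col_groups:
        strip = image_set[:, :, cols].reshape(n, -1)  # (n, h * len(cols))
        covs.append(np.cov(strip, rowvar=False))
    return covs

rng = np.random.default_rng(2)
images = rng.normal(size=(30, 8, 8))     # a toy set of 30 images, 8x8 each
covs = block_covds(images, n_blocks=4)   # four 16x16 block CovDs

# Assemble the (here: all) block CovDs into one block-diagonal SPD matrix.
D = block_diag(*covs)
assert D.shape == (64, 64)
```

Note that each block covariance is only 16-dimensional while the set contains 30 images, so every block is well conditioned; this is exactly the size-reduction and singularity-avoidance argument made in the text.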
The rest of this paper is organised as follows. Section 2 introduces the background of the proposed method. Section 3 presents the proposed method. Section 4 reports the experimental results obtained on a number of image set classification benchmarks. The conclusion is drawn in Section 5.
Preliminaries
This section provides an overview of Riemannian geometry on the manifold of SPD matrices and related Riemannian metrics.
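As an example of such a metric, the Log-Euclidean distance between two SPD matrices can be computed from their matrix logarithms. This is the standard construction, not code specific to this paper.

```python
import numpy as np

def logm_spd(S):
    """Matrix logarithm of an SPD matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V * np.log(w)) @ V.T

def log_euclidean_dist(A, B):
    """Log-Euclidean distance on the SPD manifold:
    d(A, B) = || log(A) - log(B) ||_F."""
    return np.linalg.norm(logm_spd(A) - logm_spd(B), ord="fro")

I3 = np.eye(3)
assert np.isclose(log_euclidean_dist(I3, I3), 0.0)
# For the scaled identity e*I vs I: log(e*I) = I, so the distance is
# ||I||_F = sqrt(3) for 3x3 matrices.
assert np.isclose(log_euclidean_dist(np.e * I3, I3), np.sqrt(3.0))
```

The eigendecomposition-based logarithm is what makes the rank-deficiency issue discussed in the introduction concrete: a zero eigenvalue sends `np.log` to minus infinity, so the metric is only defined for strictly positive definite matrices.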
The proposed approach
In this section, we introduce the proposed block-diagonal CovDs representation for image sets. We first describe the process of constructing a block-diagonal SPD matrix from CovDs. Then we extend our approach to the bidirectional setting.
Experiments and results
Our experiments aim to demonstrate the following:
- 1.
The descriptor based on the proposed block-diagonal covariance structure, obtained by partitioning the image, significantly improves the classification performance as well as computational efficiency.
- 2.
The proposed descriptor is effective in conjunction with both the original image as well as its deep feature representation.
- 3.
The classification accuracy gains are particularly significant in the case of metric-based methods.
- 4.
The bidirectional variant (2D2BDCovDs) achieves a further reduction in the size of the block-diagonal descriptor.
Conclusion
In this paper, we proposed a discriminative block-diagonal structure for modelling image sets with SPD matrices. Instead of the original CovDs, the proposed method partitions each image into square blocks and constructs block-diagonal CovDs. In particular, we have derived a criterion of block discriminability for this CovDs representation to find an optimised subset of these blocks, which finally forms the block-diagonal SPD matrices used for classification, namely BDCovDs and 2D2BDCovDs.
Declaration of Competing Interest
The authors declare that they have no conflict of interest.
Acknowledgements
This work was partially supported by the EPSRC Programme Grant (FACER2VM) EP/N007743/1, the EPSRC/DSTL/MURI project EP/R018456/1, the National Natural Science Foundation of China (Grant nos. 61672265, U1836218), and the 111 Project of the Ministry of Education of China (Grant no. B12018).
References (51)
- et al., Kernel Grassmannian distances and discriminant analysis for face recognition from image sets, Pattern Recognit. Lett. (2009)
- et al., Feature selection for pattern classification with Gaussian mixture models: a new objective criterion, Pattern Recognit. Lett. (1996)
- et al., Deformable registration of diffusion tensor MR images with explicit orientation optimization, Med. Image Anal. (2006)
- et al., Choroid segmentation from optical coherence tomography with graph-edge weights learned from deep convolutional neural networks, Neurocomputing (2017)
- et al., Face recognition with image sets using manifold density divergence, IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2005)
- et al., Dirichlet process mixture models on symmetric positive definite matrices for appearance clustering in video surveillance applications, IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2011)
- et al., Image set based face recognition using self-regularized non-negative coding and adaptive distance metric learning, IEEE Trans. Image Process. (2013)
- et al., Beyond Gauss: image-set matching on the Riemannian manifold of PDFs, IEEE International Conference on Computer Vision (2015)
- et al., Deep reconstruction models for image set classification, IEEE Trans. Pattern Anal. Mach. Intell. (2015)
- et al., Face recognition from long-term observations, European Conference on Computer Vision (2002)
- Discriminative learning and recognition of image set classes using canonical correlations, IEEE Trans. Pattern Anal. Mach. Intell.
- Region covariance: a fast descriptor for detection and classification, European Conference on Computer Vision
- Covariance tracking using model update based on Lie algebra, IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2006)
- Tensor sparse coding for region covariances, European Conference on Computer Vision
- Covariance discriminative learning: a natural and efficient approach to image set classification, IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2012)
- Kernel learning for extrinsic classification of manifold features, IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2013)
- Image set classification using holistic multiple order statistics features and localized multi-kernel metric learning, IEEE International Conference on Computer Vision (2013)
- Statistical analysis of tensor fields, International Conference on Medical Image Computing and Computer-Assisted Intervention
- Sparse coding and dictionary learning for symmetric positive definite matrices: a kernel approach, European Conference on Computer Vision
- Kernel methods on the Riemannian manifold of symmetric positive definite matrices, IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2013)
- Image set classification by symmetric positive semi-definite matrices, IEEE Winter Conference on Applications of Computer Vision (2016)
- Log-Euclidean metric learning on symmetric positive definite manifold with application to image set classification, International Conference on Machine Learning
- Dimensionality reduction on SPD manifolds: the emergence of geometry-aware methods, IEEE Trans. Pattern Anal. Mach. Intell.
- Robust recovery of subspace structures by low-rank representation, IEEE Trans. Pattern Anal. Mach. Intell.
- Sparse subspace clustering: algorithm, theory, and applications, IEEE Trans. Pattern Anal. Mach. Intell.
☆ Editor: Sudeep Sarkar