Elsevier

Pattern Recognition Letters

Volume 136, August 2020, Pages 230-236

Discriminative block-diagonal covariance descriptors for image set classification

https://doi.org/10.1016/j.patrec.2020.05.018

Highlights

  • We propose a novel block-diagonal covariance descriptor for image set classification.

  • The proposed descriptor significantly improves the performance as well as efficiency.

  • The proposed descriptor is effective with both shallow features and deep features.

Abstract

Image set classification has recently received much attention due to its various applications in pattern recognition and computer vision. To compare and match image sets, the major challenges are to devise an effective and efficient representation and to define a measure of similarity between image sets. In this paper, we propose a method for representing image sets based on block-diagonal Covariance Descriptors (CovDs). In particular, the proposed image set representation is in the form of non-singular covariance matrices, also known as Symmetric Positive Definite (SPD) matrices, that lie on a Riemannian manifold. By dividing each image of an image set into square blocks of the same size, we compute the corresponding block CovDs instead of the global one. Taking the relative discriminative power of these block CovDs into account, a block-diagonal SPD matrix can be constructed to achieve a better discriminative capability. We extend the proposed approach to work with bidirectional CovDs and achieve a further boost in performance. The resulting block-diagonal SPD matrices combined with Riemannian metrics are shown to provide a powerful basis for image set classification. We perform an extensive evaluation on four datasets for several image set classification tasks. The experimental results demonstrate the effectiveness and efficiency of the proposed method.

Introduction

In recent years, the changing nature of visual data acquisition has attracted increasing interest in classification with image sets [1], [2], [3], [4], [5], [6] in the context of practical applications, such as visual surveillance, face recognition with multi-view images, and dynamic scene classification using long-term observations. In comparison with conventional single-shot image classification, image set classification uses sets of multiple images from the same class both as gallery and as probe samples. These image sets usually cover wide variations of a specific class of objects caused by camera pose changes, non-rigid transformations, or varying illumination conditions. Given this richer information, a more robust performance can be expected when image sets, rather than single-shot images, are used as the input to decision making. Nevertheless, the huge intra-class variability and inter-class ambiguity of image sets have made the effective representation of this information a major issue [5].

Among the previous work on this topic, the prevalent image set classification methods rely on the assumption of specific parametric distributions or geometrical structures. For instance, single Gaussian [7] or Gaussian mixture models [1], [8] have been used to describe the distribution of images in an image set, with Kullback-Leibler divergence adopted as a measure of similarity between different distributions. However, Kim et al. [9] showed that robust performance cannot be guaranteed when the statistical correlation between the gallery and probe sets is weak. Instead of relying on image pixels, a more recent wave of methods exploits some type of image descriptor. The covariance matrix proposed by Tuzel et al. [10] as a region descriptor has received particular attention because of its demonstrable effectiveness in widespread applications, successful examples of which include object recognition [10], human tracking [11], texture categorisation [12], etc. This descriptor has become particularly popular for modelling image sets, because of its efficacy and robustness in capturing data variations [13], [14], [15].

As a second-order statistic, the covariance matrix represents an image set with features from different image samples. Full-rank Covariance Descriptors (CovDs) offer several desirable properties, but they naturally lie on a Riemannian manifold of Symmetric Positive Definite (SPD) matrices [16]. As a consequence, conventional learning methods based on Euclidean geometry are inadequate for analysing SPD matrices, because they neglect the manifold geometry. In an attempt to generalise algorithms from a Euclidean space to Riemannian manifolds, previous studies [13], [17], [18] utilised Riemannian metrics to account for the manifold geometry, with promising results.
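As a rough illustration of the two ideas in this paragraph, the following sketch builds a covariance descriptor from an image set and compares two descriptors with the log-Euclidean metric. It is only a minimal example under stated assumptions: raw pixel intensities are used as features, the log-Euclidean metric stands in for "a Riemannian metric", and the function names (`covariance_descriptor`, `spd_log`, `log_euclidean_distance`) are illustrative, not taken from the paper.

```python
import numpy as np

def covariance_descriptor(image_set):
    """Covariance of vectorised images; image_set has shape (n_images, height, width)."""
    n = image_set.shape[0]
    X = image_set.reshape(n, -1).astype(float)  # one row per image, one column per pixel
    return np.cov(X, rowvar=False)              # (d, d) with d = height * width

def spd_log(S):
    """Matrix logarithm of an SPD matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V * np.log(w)) @ V.T

def log_euclidean_distance(A, B):
    """Frobenius norm of the difference of matrix logarithms."""
    return np.linalg.norm(spd_log(A) - spd_log(B), ord="fro")

# Synthetic data (assumption): 50 images of 4x4 pixels per set, so n > d = 16
rng = np.random.default_rng(0)
set_a = rng.normal(size=(50, 4, 4))
set_b = rng.normal(size=(50, 4, 4)) * 2.0

C_a = covariance_descriptor(set_a)
C_b = covariance_descriptor(set_b)
d_ab = log_euclidean_distance(C_a, C_b)
```

The example deliberately uses more images than pixels so that both descriptors are full rank; with fewer images the logarithm of a zero eigenvalue would be undefined.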

Despite these achievements, there are still some issues left. First, CovDs constructed from image sets are rarely of full rank, since the dimensionality of CovDs is often larger than the number of images in a set. This results in unreliable covariance estimation and renders Riemannian metrics for SPD matrices inapplicable. To avoid the matrix singularity, one popular solution is to regularise the rank-deficient CovDs by adding a small perturbation to the zero eigenvalues of the matrix. However, a recent study [19] pointed out that this regularisation may deteriorate the performance of CovDs. In addition, the computational complexity of analysing high-dimensional SPD matrices is taxing. As a countermeasure, some algorithms [20], [21] have been proposed to map high-dimensional SPD matrices to a low-dimensional space, but the learning of the mapping (formulated as a manifold optimisation problem) is also time-consuming.
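The rank-deficiency problem can be reproduced in a few lines. The sketch below uses synthetic features and a hypothetical helper, `regularise`, which adds a small multiple of the identity. This shifts every eigenvalue rather than only the zero ones, so it is one common variant of the perturbation described above and not necessarily the exact scheme used in the cited work.

```python
import numpy as np

rng = np.random.default_rng(0)
n_images, dim = 10, 64                 # fewer images than feature dimensions
X = rng.normal(size=(n_images, dim))   # synthetic features, one row per image

C = np.cov(X, rowvar=False)            # (64, 64) sample covariance
rank = np.linalg.matrix_rank(C)        # cannot exceed n_images - 1

def regularise(C, eps=1e-3):
    """Add eps times the mean eigenvalue to the diagonal (assumed scheme)."""
    d = C.shape[0]
    return C + eps * (np.trace(C) / d) * np.eye(d)

C_reg = regularise(C)
min_eig_reg = np.linalg.eigvalsh(C_reg).min()   # strictly positive after the shift
```

Because the mean is subtracted before the covariance is formed, a set of n images yields a covariance of rank at most n - 1, whatever the feature dimensionality.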

An exactly block-diagonal structure is highly desired for subspace segmentation methods [22], [23] since it can characterise the sample clusters and subspace segmentation more accurately. Based on the self-expression property in Elhamifar and Vidal [23], an ideal block-diagonal structure can also be used to capture the underlying data of samples by embedding the global structure information and discriminative capability [24]. Therefore, promising classification results can be achieved when combining the block-diagonal structure with the discriminative data representation [25], [26]. However, existing block-diagonal representation studies mainly focus on data in vector form, while barely any attention has been paid to associating block-diagonal structure with Riemannian manifolds. In this paper, we propose a novel approach to constructing discriminative block-diagonal CovDs of image sets for the task of classification. The key innovations of the proposed method include: First, we propose representing an image set with a set of block CovDs instead of the full covariance matrix. The aim is to reduce computation time and address the singularity problem. Second, we provide a strategy for building block-diagonal SPD matrices with optimised subsets of these block CovDs, which are obtained by taking the discriminative information of each image block into account. Last, we extend our approach to the bidirectional setting, which achieves a further size reduction of the block-diagonal SPD matrix. Moreover, motivated by the proven success of deep networks (e.g., Convolutional Neural Networks (CNNs)), we show that discriminative block-diagonal CovDs built from CNN features also outperform the simple combination of CovDs and deep architectures. This indicates that our approach is not limited to shallow features and works well with deep features as well. In general, we map the original CovDs on a high-dimensional manifold to more discriminative SPD matrices on a low-dimensional one.
The key concepts of our approach are illustrated in Fig. 1.
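The first two steps listed above (block CovDs, then a block-diagonal matrix assembled from a subset of them) can be sketched as follows. This is an illustrative sketch only: the images are synthetic, raw pixels are assumed as features, the helper names `block_covds` and `block_diagonal` are made up for this example, and the indices in `selected` are hypothetical placeholders. The paper selects blocks with a discriminability criterion that is not reproduced in this excerpt.

```python
import numpy as np

def block_covds(image_set, block):
    """One covariance matrix per non-overlapping block x block image region."""
    n, height, width = image_set.shape
    assert height % block == 0 and width % block == 0
    covds = []
    for r in range(0, height, block):
        for c in range(0, width, block):
            patch = image_set[:, r:r + block, c:c + block].reshape(n, -1)
            covds.append(np.cov(patch, rowvar=False))
    return covds

def block_diagonal(blocks):
    """Place square matrices along the diagonal of a zero matrix."""
    size = sum(b.shape[0] for b in blocks)
    out = np.zeros((size, size))
    offset = 0
    for b in blocks:
        k = b.shape[0]
        out[offset:offset + k, offset:offset + k] = b
        offset += k
    return out

rng = np.random.default_rng(0)
images = rng.normal(size=(30, 8, 8))       # 30 synthetic images of 8x8 pixels
covds = block_covds(images, block=4)       # four 16x16 block CovDs

selected = [0, 3]                          # hypothetical choice, not the paper's criterion
D = block_diagonal([covds[i] for i in selected])   # 32x32 block-diagonal matrix

global_cov = np.cov(images.reshape(30, -1), rowvar=False)   # 64x64, rank at most 29
```

With 30 images, the global 64x64 covariance is necessarily singular, whereas each 16x16 block covariance can be full rank, which is the singularity argument made in the text.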

The rest of this paper is organised as follows. Section 2 introduces the background of the proposed method. Section 3 presents the proposed method. Section 4 reports the experimental results obtained on a number of image set classification benchmarks. The conclusion is drawn in Section 5.

Section snippets

Preliminaries

This section provides an overview of Riemannian geometry on the manifold of SPD matrices and related Riemannian metrics.
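For orientation, two metrics that are widely used on the SPD manifold are the affine-invariant Riemannian metric (AIRM) and the log-Euclidean metric (LEM). The snippet above does not say which metrics the full section covers, so the standard definitions below are given as background only, for SPD matrices X and Y:

```latex
\delta_{\mathrm{AIRM}}(X, Y) = \left\| \log\!\left( X^{-1/2}\, Y\, X^{-1/2} \right) \right\|_F ,
\qquad
\delta_{\mathrm{LEM}}(X, Y) = \left\| \log(X) - \log(Y) \right\|_F ,
\qquad
\log(X) = U \,\mathrm{diag}(\log \lambda_1, \dots, \log \lambda_d)\, U^{\top}
\quad \text{for } X = U \,\mathrm{diag}(\lambda_1, \dots, \lambda_d)\, U^{\top}.
```

Both definitions require strictly positive eigenvalues, which is why singular covariance matrices cannot be compared with these metrics directly.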

The proposed approach

In this section, we introduce the proposed block-diagonal CovDs representation for image sets. We first describe the process of constructing a block-diagonal SPD matrix from CovDs. Then we extend our approach to the bidirectional setting.
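One general property that makes a block-diagonal SPD matrix convenient is that its matrix logarithm is itself block-diagonal, so the squared log-Euclidean distance between two such matrices is the sum of the squared distances between corresponding blocks. The sketch below checks this numerically on random SPD blocks. It illustrates a standard linear-algebra fact and is not a reproduction of the paper's derivation; the helper names are illustrative.

```python
import numpy as np

def spd_log(S):
    """Matrix logarithm of an SPD matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V * np.log(w)) @ V.T

def random_spd(rng, k):
    """Random well-conditioned SPD matrix of size k x k."""
    A = rng.normal(size=(k, k))
    return A @ A.T + k * np.eye(k)

def block_diag2(P, Q):
    """Block-diagonal matrix with P and Q on the diagonal."""
    k, m = P.shape[0], Q.shape[0]
    out = np.zeros((k + m, k + m))
    out[:k, :k] = P
    out[k:, k:] = Q
    return out

rng = np.random.default_rng(1)
A1, A2 = random_spd(rng, 3), random_spd(rng, 4)
B1, B2 = random_spd(rng, 3), random_spd(rng, 4)

# Distance computed on the full 7x7 block-diagonal matrices
full = np.linalg.norm(spd_log(block_diag2(A1, A2)) - spd_log(block_diag2(B1, B2)))

# Same distance computed block by block, on 3x3 and 4x4 matrices only
per_block = np.sqrt(
    np.linalg.norm(spd_log(A1) - spd_log(B1)) ** 2
    + np.linalg.norm(spd_log(A2) - spd_log(B2)) ** 2
)
```

Working block by block replaces one large eigendecomposition with several small ones, which is one plausible source of the efficiency gains the paper reports.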

Experiments and the experimental results

Our experiments aim to demonstrate the following:

  1. The descriptor based on the proposed block-diagonal covariance structure, obtained by partitioning the image, significantly improves classification performance as well as computational efficiency.

  2. The proposed descriptor is effective in conjunction with both the original image and its deep feature representation.

  3. The classification accuracy gains are particularly significant in the case of metric-based methods.

  4. The bidirectional variant

Conclusion

In this paper, we proposed a discriminative block-diagonal structure for modelling image sets with SPD matrices. Instead of using the original CovDs, the proposed method partitions each image into square blocks and constructs block-diagonal CovDs. In particular, we have derived a criterion of block discriminability for this CovD representation to find an optimised subset of these blocks, which finally forms the block-diagonal SPD matrices used for classification, namely BDCovDs and 2D2BDCovDs. Our

Declaration of Competing Interest

The authors declare that they have no conflict of interest.

Acknowledgements

This work was partially supported by the EPSRC Programme Grant (FACER2VM) EP/N007743/1, the EPSRC/DSTL/MURI project EP/R018456/1, the National Natural Science Foundation of China (Grant nos. 61672265, U1836218), and the 111 Project of the Ministry of Education of China (Grant no. B12018).

References (51)

  • T.-K. Kim et al.

    Discriminative learning and recognition of image set classes using canonical correlations

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2007)
  • O. Tuzel et al.

    Region covariance: a fast descriptor for detection and classification

    European Conference on Computer Vision

    (2006)
  • F. Porikli et al.

    Covariance tracking using model update based on Lie algebra

    Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on

    (2006)
  • R. Sivalingam et al.

    Tensor sparse coding for region covariances

    European Conference on Computer Vision

    (2010)
  • R. Wang et al.

    Covariance discriminative learning: a natural and efficient approach to image set classification

    Computer Vision and Pattern Recognition, 2012 IEEE Computer Society Conference on

    (2012)
  • R. Vemulapalli et al.

    Kernel learning for extrinsic classification of manifold features

    Computer Vision and Pattern Recognition, 2013 IEEE Computer Society Conference on

    (2013)
  • J. Lu et al.

    Image set classification using holistic multiple order statistics features and localized multi-kernel metric learning

    2013 IEEE International Conference on Computer Vision

    (2013)
  • Y. Xie et al.

    Statistical analysis of tensor fields

    International Conference on Medical Image Computing and Computer-Assisted Intervention

    (2010)
  • M.T. Harandi et al.

    Sparse coding and dictionary learning for symmetric positive definite matrices: a kernel approach

    European Conference on Computer Vision

    (2012)
  • S. Jayasumana et al.

    Kernel methods on the Riemannian manifold of symmetric positive definite matrices

    Computer Vision and Pattern Recognition, 2013 IEEE Computer Society Conference on

    (2013)
  • M. Faraki et al.

    Image set classification by symmetric positive semi-definite matrices

    Applications of Computer Vision, 2016 IEEE Winter Conference on

    (2016)
  • Z. Huang et al.

    Log-Euclidean metric learning on symmetric positive definite manifold with application to image set classification

    International Conference on Machine Learning

    (2015)
  • M. Harandi et al.

    Dimensionality reduction on SPD manifolds: the emergence of geometry-aware methods

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2018)
  • G. Liu et al.

    Robust recovery of subspace structures by low-rank representation

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2012)
  • E. Elhamifar et al.

    Sparse subspace clustering: algorithm, theory, and applications

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2013)

Editor: Sudeep Sarkar