Elsevier

Image and Vision Computing

Volume 31, Issue 12, December 2013, Pages 895-904

Integration of multi-feature fusion and dictionary learning for face recognition

https://doi.org/10.1016/j.imavis.2013.10.002

Highlights

  • We propose two strategies for face recognition through multiple features.

  • Our methods integrate multi-feature fusion and dictionary learning.

  • The fusion process and dictionary learning are learned simultaneously.

  • Extensive experiments validate the merits of our methods.

Abstract

Recent research increasingly emphasizes analyzing multiple features to improve face recognition (FR) performance. One popular scheme is to extend the sparse representation based classification framework with various sparsity constraints. Although these methods jointly study multiple features through such constraints, they still process each feature individually and thus overlook the possible high-level relationships among different features. It is reasonable to assume that low-level features of facial images, such as edge information and smoothed/low-frequency images, can be fused into a more compact and more discriminative representation based on their latent high-level relationship. FR on the fused features is expected to outperform FR on the original features, since the fused representation has more favorable properties. Motivated by this, we propose two different strategies that first fuse multiple features and then exploit the dictionary learning (DL) framework for better FR performance. The first strategy is a simple and efficient two-step model, which learns a fusion matrix from training face images to fuse multiple features and then learns class-specific dictionaries based on the fused features. The second is a more effective, but more computationally demanding, model that learns the fusion matrix and the class-specific dictionaries simultaneously within an iterative optimization procedure. In addition, the second model separates the components shared across classes from the class-specific dictionaries to enhance their discrimination power.
The proposed strategies, which integrate the multi-feature fusion process and the dictionary learning framework for FR, realize the following goals: (1) exploiting multiple features of face images for better FR performance; (2) learning a fusion matrix that merges the features into a more compact and more discriminative representation; (3) learning class-specific dictionaries, with consideration of the common patterns, for better classification performance. We perform a series of experiments on publicly available databases to evaluate our methods, and the results demonstrate the effectiveness of the proposed models.

Introduction

Thanks to recent efforts by computer vision researchers, many features have been designed to characterize various aspects of an object. Exploiting multiple features provides more information for face recognition (FR), and the advantages of jointly analyzing multiple features have been demonstrated in the literature [1], [2], [3], [4]. Although it is widely believed that recognition performance can benefit from multiple features, designing a more effective and more efficient way to exploit them, beyond the existing multi-feature approaches, remains an open problem.

In recent years, several FR methods [5], [6], [7] have been developed based on the dictionary learning (DL) framework, and achieved very promising results. These DL-based FR methods are mainly developed in the following two tracks [8]:

  1. Directly making the dictionary discriminative, e.g. by learning a class-specific sub-dictionary for each class;

  2. Making the sparse coefficients discriminative, so that the discrimination power propagates to the dictionary.

Even though DL-based recognition methods achieve promising, even state-of-the-art, performance, they work on only a single feature type, e.g. the original grayscale facial image or a facial outline image, rather than multiple informative features. In other words, they cannot exploit multiple features of one face image, or the possible semantic relationships among them, to enhance FR performance.
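As a rough illustration of the first track, the following sketch learns a class-specific dictionary per class and classifies a query image (as a feature vector) by reconstruction residual. This is not the paper's algorithm: as a crude stand-in for a learned dictionary, each class dictionary here is simply the top singular vectors of that class's training matrix.

```python
import numpy as np

def learn_class_dictionaries(X_by_class, n_atoms):
    """For each class, take the top left singular vectors of its training
    matrix (samples as columns) as a simple class-specific dictionary."""
    dicts = {}
    for label, X in X_by_class.items():
        U, _, _ = np.linalg.svd(X, full_matrices=False)
        dicts[label] = U[:, :n_atoms]  # orthonormal atoms
    return dicts

def classify(x, dicts):
    """Assign x to the class whose dictionary reconstructs it best
    (smallest residual after projecting onto the dictionary's span)."""
    residuals = {c: np.linalg.norm(x - D @ (D.T @ x)) for c, D in dicts.items()}
    return min(residuals, key=residuals.get)
```

The residual-based decision rule is the same one SRC-style classifiers use; a learned (e.g. K-SVD or Fisher-discriminative) dictionary would replace the SVD step.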

Aware of this limitation of DL-based methods, researchers have proposed several methods to handle multiple features [9], [10], [11]. Yuan and Yan propose a multi-task joint sparse representation based classification method (MTJSRC), which treats recognition with multiple features as a multi-task problem, with each feature type as one task [9]. MTJSRC assumes that the coefficients share the same sparsity pattern across all features. However, this assumption is too strict and does not hold in practice. Zhang et al. therefore propose a joint dynamic sparse representation classification method (JDSRC) [10], arguing that the same sparsity pattern is shared among the coefficients at the class level, but not necessarily at the atom level. Yang et al. also address this problem with a relaxed collaborative representation method (RCR), which assumes that the coding vectors of different features should be similar to one another [11].
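MTJSRC-style joint sparsity is commonly encoded with an ℓ2,1 norm on the coefficient matrix, with one column of coefficients per feature type. As a minimal sketch (not tied to the paper's notation), the row-wise ℓ2,1 norm and its proximal operator show why this forces all features to share a support: group soft-thresholding zeroes entire rows at once.

```python
import numpy as np

def l21_norm(A):
    """Sum of row-wise l2 norms. It is small only when whole rows are
    zero, so minimizing it makes the per-feature coefficient vectors
    (columns of A) activate the same dictionary atoms (rows)."""
    return np.linalg.norm(A, axis=1).sum()

def prox_l21(A, t):
    """Proximal operator of t * ||.||_{2,1}: shrink each row toward
    zero, zeroing rows whose norm falls below t (group thresholding)."""
    norms = np.linalg.norm(A, axis=1, keepdims=True)
    scale = np.maximum(1.0 - t / np.maximum(norms, 1e-12), 0.0)
    return A * scale
```

JDSRC relaxes this by requiring the shared pattern only at the class level, and RCR instead penalizes disagreement between the coding vectors directly.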

All three methods above elaborately use multiple features and exploit the sparsity patterns among the coefficients of different features. Although they improve performance, some intrinsic problems remain:

  1. Since the overall dictionary consists of all features of all training images, it grows with the training set and can become so large that computational efficiency suffers;

  2. Simply feeding all features into the computation raises the computational burden and introduces redundant information that does not help, and may even degrade, FR performance;

  3. Although different features are connected through constraints on the coefficients, these methods neglect the internal relationships among the features, which could further enhance FR performance;

  4. A dictionary built from all the training data contains common components shared by different classes; these components can be used interchangeably to reconstruct query images, which compromises classification performance.

To address the above problems, we extend our previous work [12], [13] by proposing two different strategies that integrate the multi-feature fusion process1 and the dictionary learning framework. The first is a two-step model, which first learns a fusion matrix from the training data to fuse different features and then learns class-specific dictionaries. The fusion process exploits the high-level relationship among different features and fuses them into a more compact and more discriminative representation; a dictionary is then learned over the fused features of each class. The second strategy learns the fusion matrix and the class-specific dictionaries simultaneously; it takes more time but produces better performance. Moreover, in this scheme we explicitly separate the common components shared by different classes from the dictionary, making the learned dictionaries more compact and more discriminative. As the experimental results demonstrate, both strategies achieve better performance than other closely related methods.
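The first strategy's fusion matrix is learned from a Fisher-style criterion (as the conclusion states). As a hedged sketch, not the paper's exact formulation: the following learns a projection W that maximizes between-class over within-class scatter of the concatenated feature vectors; the fused features W^T x would then feed the dictionary learning stage.

```python
import numpy as np

def fisher_fusion_matrix(X, y, out_dim):
    """Learn a fusion/projection matrix W (columns = directions) that
    favors large between-class scatter Sb relative to within-class
    scatter Sw, via the generalized eigenproblem Sb w = lambda Sw w.
    X: (n_samples, n_dims) concatenated multi-feature vectors."""
    mu = X.mean(axis=0)
    Sw = np.zeros((X.shape[1], X.shape[1]))
    Sb = np.zeros_like(Sw)
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        Sb += len(Xc) * np.outer(mc - mu, mc - mu)
    # Regularize Sw slightly so it is invertible, then take the
    # leading eigenvectors of Sw^{-1} Sb as the fusion directions.
    M = np.linalg.solve(Sw + 1e-6 * np.eye(X.shape[1]), Sb)
    evals, evecs = np.linalg.eig(M)
    order = np.argsort(-evals.real)
    return evecs.real[:, order[:out_dim]]
```

In the paper's second strategy this matrix would not be fixed after one pass, but refined jointly with the dictionaries inside the iterative optimization.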

The rest of this paper is organized as follows. In Section 2, we briefly introduce the background and review several approaches that motivate ours. We elaborate on the two proposed strategies in Section 3. Extensive experiments on three face recognition datasets are presented in Section 4. Finally, we conclude in Section 5 with discussions.

Section snippets

Tensor Algebra

As we consider generalizing the dictionary learning method to multiple features, we adopt the tensor algebra framework, following the notation and operations of [16], [17]. Higher-order tensors are denoted by boldface Euler script letters, e.g. $\mathcal{X}$. In particular, $\mathbf{X}_{(n)}$ denotes the matrix obtained by flattening (unfolding) the tensor $\mathcal{X}$ along its $n$th mode. Mathematically, element $x_{i_1,i_2,\ldots,i_K}$ of a $K$th-order tensor $\mathcal{X}\in\mathbb{R}^{I_1\times I_2\times\cdots\times I_K}$ maps to element $(i_n, j)$ of the matrix $\mathbf{X}_{(n)}$, where
$$j = 1 + \sum_{\substack{k=1 \\ k\neq n}}^{K} (i_k - 1)\, J_k, \qquad J_k = \prod_{\substack{m=1 \\ m\neq n}}^{k-1} I_m.$$
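The unfolding index formula can be checked numerically. A small sketch, assuming the Kolda-Bader Fortran-order (first-index-fastest) convention; the code uses 0-based indices, so the +1/-1 terms of the 1-based formula vanish:

```python
import numpy as np

def unfold(T, n):
    """Mode-n unfolding X_(n): bring axis n to the front, then flatten
    the remaining axes in Fortran (first-index-fastest) order."""
    return np.moveaxis(T, n, 0).reshape(T.shape[n], -1, order='F')

def unfold_index(idx, shape, n):
    """0-based column j that element T[idx] occupies in X_(n):
    j = sum_{k != n} i_k * J_k, with J_k = prod_{m < k, m != n} I_m."""
    j, J = 0, 1
    for k in range(len(shape)):
        if k == n:
            continue
        j += idx[k] * J
        J *= shape[k]
    return j
```

Checking `unfold(T, n)[idx[n], unfold_index(idx, T.shape, n)] == T[idx]` for every index confirms the mapping.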

Our methodology

As discussed in Section 2.2, these SRC-based multi-feature FR methods focus heavily on imposing constraints on the coefficients while ignoring the semantic relationships among different features, and they suffer from the drawbacks listed in Section 1. Addressing these concerns, we propose two different DL-based multi-feature fusion strategies to improve FR performance. One is an efficient and simple method, where a core dictionary is learned based on the fused more

Experiments

In this section, we evaluate our two methods through a series of experiments on three publicly available datasets: Extended Yale B [29], CMU-PIE [30] and LFW [31]. To fairly demonstrate the effectiveness of our methods, we choose several closely related approaches for comparison: holistic SRC (H-SRC) [18], separate SRC (S-SRC) [10], MTJSRC [9], JDSRC [10] and RCR [11]. H-SRC and S-SRC act as baselines, in which H-SRC concatenates all the features into a huge vector, while

Conclusion and future work

In this paper, we discuss how to exploit multiple features for better FR performance. We show that popular sparse coding based methods focus only on constraining the sparse coefficients to connect different features, and that the SRC-based sparse coding scheme is time-consuming in large-scale settings. To address these problems, we propose two different strategies.

The first one is to learn a fusion matrix based on Fisher criterion from the training data to fuse the different

Acknowledgment

This work is supported by the Natural Science Foundation of China (no. 61071218) and the 973 Program (project no. 2010CB327904).

References (37)

  • S. Kong et al., A brief summary of dictionary learning based approach for classification (2012)

  • X.-T. Yuan et al., Visual classification with multi-task joint sparse representation (2010)

  • H. Zhang et al., Multi-observation visual recognition via joint dynamic sparse representation (2011)

  • M. Yang et al., Relaxed collaborative representation for pattern classification (2012)

  • S. Kong et al., Learning individual-specific dictionaries with fused multiple features for face recognition

  • S. Kong et al., Multiple feature fusion for face recognition

  • S. Gupta et al., Anthropometric 3D face recognition, Int. J. Comput. Vis. (2010)

  • T.G. Kolda et al., Tensor decompositions and applications, SIAM Rev. (2009)
This paper has been recommended for acceptance by Massimo Tistarelli.