Elsevier

Image and Vision Computing

Volume 31, Issue 12, December 2013, Pages 895-904

Integration of multi-feature fusion and dictionary learning for face recognition

https://doi.org/10.1016/j.imavis.2013.10.002

Highlights

  • We propose two strategies for face recognition through multiple features.

  • Our methods integrate multi-feature fusion and dictionary learning.

  • The fusion process and dictionary learning are learned simultaneously.

  • Extensive experiments validate the merits of our methods.

Abstract

Recent research increasingly emphasizes analyzing multiple features to improve face recognition (FR) performance. One popular scheme is to extend the sparse representation based classification framework with various sparsity constraints. Although these methods jointly study multiple features through such constraints, they still process each feature individually and thus overlook the possible high-level relationships among different features. It is reasonable to assume that low-level features of facial images, such as edge information and smoothed/low-frequency images, can be fused into a more compact and more discriminative representation based on their latent high-level relationship. FR on the fused features is expected to outperform FR on the original features, since the fused representation has more favorable properties. Motivated by this, we propose two different strategies that first fuse multiple features and then exploit the dictionary learning (DL) framework for better FR performance. The first strategy is a simple and efficient two-step model, which learns a fusion matrix from training face images to fuse multiple features and then learns class-specific dictionaries based on the fused features. The second is a more effective, but more computationally demanding, model that learns the fusion matrix and the class-specific dictionaries simultaneously within an iterative optimization procedure. In addition, the second model separates the components shared across classes from the class-specific dictionaries to enhance their discrimination power.
The proposed strategies, which integrate the multi-feature fusion process and the dictionary learning framework for FR, realize the following goals: (1) exploiting multiple features of face images for better FR performance; (2) learning a fusion matrix that merges the features into a more compact and more discriminative representation; (3) learning class-specific dictionaries, with consideration of the common patterns, for better classification performance. We perform a series of experiments on publicly available databases to evaluate our methods, and the results demonstrate the effectiveness of the proposed models.

Introduction

Thanks to recent efforts by computer vision researchers, many features have been designed to characterize various aspects of an object. Exploiting multiple features provides more information for face recognition (FR), and the advantages of jointly analyzing multiple features have been demonstrated in the literature [1], [2], [3], [4]. Although it is widely believed that recognition performance can benefit from multiple features, designing a more effective and more efficient way to exploit them, beyond the existing multi-feature approaches, remains an open problem.

In recent years, several FR methods [5], [6], [7] have been developed based on the dictionary learning (DL) framework, and achieved very promising results. These DL-based FR methods are mainly developed in the following two tracks [8]:

  1. Directly making the dictionary discriminative, e.g. by learning a class-specific sub-dictionary for each class;

  2. Making the sparse coefficients discriminative, so that the discrimination power propagates to the dictionary.

Even though DL-based recognition methods achieve promising, even state-of-the-art, performance, they work on only a single feature type, e.g. the original grayscale facial image or a facial outline image, rather than multiple informative features. In other words, they cannot exploit multiple features of one face image, or the possible semantic relationships among them, to enhance FR performance.
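As a rough illustration of the first track, the following sketch learns a class-specific dictionary per class and classifies a query image (as a feature vector) by reconstruction residual. This is not the paper's algorithm: as a crude stand-in for a learned dictionary, each class dictionary here is simply the top singular vectors of that class's training matrix.

```python
import numpy as np

def learn_class_dictionaries(X_by_class, n_atoms):
    """For each class, take the top left singular vectors of its training
    matrix (samples as columns) as a simple class-specific dictionary."""
    dicts = {}
    for label, X in X_by_class.items():
        U, _, _ = np.linalg.svd(X, full_matrices=False)
        dicts[label] = U[:, :n_atoms]  # orthonormal atoms
    return dicts

def classify(x, dicts):
    """Assign x to the class whose dictionary reconstructs it best
    (smallest residual after projecting onto the dictionary's span)."""
    residuals = {c: np.linalg.norm(x - D @ (D.T @ x)) for c, D in dicts.items()}
    return min(residuals, key=residuals.get)
```

The residual-based decision rule is the same one SRC-style classifiers use; a learned (e.g. K-SVD or Fisher-discriminative) dictionary would replace the SVD step.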

Aware of this limitation of DL-based methods, researchers have proposed several methods to handle multiple features [9], [10], [11]. Yuan and Yan propose a multi-task joint sparse representation based classification method (MTJSRC), which treats recognition with multiple features as a multi-task problem, with each feature type as one task [9]. MTJSRC assumes that the coefficients share the same sparsity pattern across all features. However, this assumption is too strict and does not hold in practice. Zhang et al. therefore propose a joint dynamic sparse representation classification method (JDSRC) [10], arguing that the same sparsity pattern is shared among the coefficients at the class level, but not necessarily at the atom level. Yang et al. also address this problem with a relaxed collaborative representation method (RCR), which assumes that the coding vectors of different features should be similar to one another [11].
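MTJSRC-style joint sparsity is commonly encoded with an ℓ2,1 norm on the coefficient matrix, with one column of coefficients per feature type. As a minimal sketch (not tied to the paper's notation), the row-wise ℓ2,1 norm and its proximal operator show why this forces all features to share a support: group soft-thresholding zeroes entire rows at once.

```python
import numpy as np

def l21_norm(A):
    """Sum of row-wise l2 norms. It is small only when whole rows are
    zero, so minimizing it makes the per-feature coefficient vectors
    (columns of A) activate the same dictionary atoms (rows)."""
    return np.linalg.norm(A, axis=1).sum()

def prox_l21(A, t):
    """Proximal operator of t * ||.||_{2,1}: shrink each row toward
    zero, zeroing rows whose norm falls below t (group thresholding)."""
    norms = np.linalg.norm(A, axis=1, keepdims=True)
    scale = np.maximum(1.0 - t / np.maximum(norms, 1e-12), 0.0)
    return A * scale
```

JDSRC relaxes this by requiring the shared pattern only at the class level, and RCR instead penalizes disagreement between the coding vectors directly.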

All three methods above elaborately use multiple features and exploit the sparsity patterns among the coefficients of different features. Although they improve performance, some intrinsic problems remain:

  1. Since the overall dictionary consists of all features of all training images, it grows with the training set and can become so large that computational efficiency suffers;

  2. Simply feeding all features into the computation raises the computational burden and introduces redundant information that does not help, and may even degrade, FR performance;

  3. Although different features are connected through constraints on the coefficients, these methods neglect the internal relationships among the features, which could further enhance FR performance;

  4. A dictionary built from all the training data contains common components shared by different classes; these components can be used interchangeably to reconstruct query images, which compromises classification performance.

To address the above problems, we extend our previous work [12], [13] by proposing two different strategies that integrate the multi-feature fusion process1 and the dictionary learning framework. The first is a two-step model, which first learns a fusion matrix from the training data to fuse different features and then learns class-specific dictionaries. The fusion process exploits the high-level relationship among different features and fuses them into a more compact and more discriminative representation; a dictionary is then learned over the fused features of each class. The second strategy learns the fusion matrix and the class-specific dictionaries simultaneously; it takes more time but produces better performance. Moreover, in this scheme we explicitly separate the common components shared by different classes from the dictionary, making the learned dictionaries more compact and more discriminative. As the experimental results demonstrate, both strategies achieve better performance than other closely related methods.
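The first strategy's fusion matrix is learned from a Fisher-style criterion (as the conclusion states). As a hedged sketch, not the paper's exact formulation: the following learns a projection W that maximizes between-class over within-class scatter of the concatenated feature vectors; the fused features W^T x would then feed the dictionary learning stage.

```python
import numpy as np

def fisher_fusion_matrix(X, y, out_dim):
    """Learn a fusion/projection matrix W (columns = directions) that
    favors large between-class scatter Sb relative to within-class
    scatter Sw, via the generalized eigenproblem Sb w = lambda Sw w.
    X: (n_samples, n_dims) concatenated multi-feature vectors."""
    mu = X.mean(axis=0)
    Sw = np.zeros((X.shape[1], X.shape[1]))
    Sb = np.zeros_like(Sw)
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        Sb += len(Xc) * np.outer(mc - mu, mc - mu)
    # Regularize Sw slightly so it is invertible, then take the
    # leading eigenvectors of Sw^{-1} Sb as the fusion directions.
    M = np.linalg.solve(Sw + 1e-6 * np.eye(X.shape[1]), Sb)
    evals, evecs = np.linalg.eig(M)
    order = np.argsort(-evals.real)
    return evecs.real[:, order[:out_dim]]
```

In the paper's second strategy this matrix would not be fixed after one pass, but refined jointly with the dictionaries inside the iterative optimization.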

The rest of this paper is organized as follows. In Section 2, we briefly introduce the background and review several approaches that motivate ours. We elaborate on the two proposed strategies in Section 3. Extensive experiments on three face recognition datasets are presented in Section 4. Finally, we conclude in Section 5 with discussions.

Section snippets

Tensor Algebra

As we consider generalizing the dictionary learning method to multiple features, we adopt the tensor algebra framework, following the notation and operations of [16], [17]. Higher-order tensors are denoted by boldface Euler script letters, e.g. $\mathcal{X}$. In particular, $\mathbf{X}_{(n)}$ denotes the matrix obtained by flattening (unfolding) the tensor $\mathcal{X}$ along its $n$th mode. Mathematically, element $x_{i_1,i_2,\ldots,i_K}$ of a $K$th-order tensor $\mathcal{X}\in\mathbb{R}^{I_1\times I_2\times\cdots\times I_K}$ maps to element $(i_n, j)$ of the matrix $\mathbf{X}_{(n)}$, where
$$j = 1 + \sum_{\substack{k=1 \\ k\neq n}}^{K} (i_k - 1)\, J_k, \qquad J_k = \prod_{\substack{m=1 \\ m\neq n}}^{k-1} I_m.$$
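The unfolding index formula can be checked numerically. A small sketch, assuming the Kolda-Bader Fortran-order (first-index-fastest) convention; the code uses 0-based indices, so the +1/-1 terms of the 1-based formula vanish:

```python
import numpy as np

def unfold(T, n):
    """Mode-n unfolding X_(n): bring axis n to the front, then flatten
    the remaining axes in Fortran (first-index-fastest) order."""
    return np.moveaxis(T, n, 0).reshape(T.shape[n], -1, order='F')

def unfold_index(idx, shape, n):
    """0-based column j that element T[idx] occupies in X_(n):
    j = sum_{k != n} i_k * J_k, with J_k = prod_{m < k, m != n} I_m."""
    j, J = 0, 1
    for k in range(len(shape)):
        if k == n:
            continue
        j += idx[k] * J
        J *= shape[k]
    return j
```

Checking `unfold(T, n)[idx[n], unfold_index(idx, T.shape, n)] == T[idx]` for every index confirms the mapping.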

Our methodology

As discussed in Section 2.2, these SRC-based multi-feature FR methods focus heavily on imposing constraints on the coefficients while ignoring the semantic relationships among different features, and they suffer from the drawbacks listed in Section 1. Addressing these concerns, we propose two different DL-based multi-feature fusion strategies to improve FR performance. One is an efficient and simple method, where a core dictionary is learned based on the fused more

Experiments

In this section, we evaluate our two methods through a series of experiments on three publicly available datasets: Extended Yale B [29], CMU-PIE [30] and LFW [31]. To fairly demonstrate the effectiveness of our methods, we choose several closely related approaches for comparison: holistic SRC (H-SRC) [18], separate SRC (S-SRC) [10], MTJSRC [9], JDSRC [10] and RCR [11]. H-SRC and S-SRC act as baselines, in which H-SRC concatenates all the features into a huge vector, while

Conclusion and future work

In this paper, we discuss how to exploit multiple features for better FR performance. We show that popular sparse coding based methods focus only on constraining the sparse coefficients to connect different features, and that the SRC-based sparse coding scheme is time-consuming in large-scale settings. To address these problems, we propose two different strategies.

The first one is to learn a fusion matrix based on Fisher criterion from the training data to fuse the different

Acknowledgment

This work is supported by the Natural Science Foundation of China (no. 61071218) and the 973 Program (project no. 2010CB327904).

References (37)

  • S. Kong et al., A brief summary of dictionary learning based approach for classification (2012)

  • X.-T. Yuan et al., Visual classification with multi-task joint sparse representation (2010)

  • H. Zhang et al., Multi-observation visual recognition via joint dynamic sparse representation (2011)

  • M. Yang et al., Relaxed collaborative representation for pattern classification (2012)

  • S. Kong et al., Learning individual-specific dictionaries with fused multiple features for face recognition

  • S. Kong et al., Multiple feature fusion for face recognition

  • S. Gupta et al., Anthropometric 3D face recognition, Int. J. Comput. Vis. (2010)

  • T.G. Kolda et al., Tensor decompositions and applications, SIAM Rev. (2009)
This paper has been recommended for acceptance by Massimo Tistarelli.