Improving face recognition performance using TeCS2 dictionary

https://doi.org/10.1016/j.patrec.2020.12.022

Highlights

  • Combining visual cues with a face recognition algorithm to improve performance.

  • Visual cues such as color, shape, texture, and symmetry help in understanding faces.

  • Extensive experiments show the effectiveness of the proposed framework.

  • The proposed approach outperforms our previous work and other state-of-the-art approaches.

Abstract

The human mind processes the different primitive components of image signals, such as color, shape, texture, and symmetry, in a parallel and complex fashion. Deep neural networks aim to learn all these components from the image in an unsupervised manner. However, learning the primitive features is not formally assured in a deep learning formulation, and adding these features explicitly can improve performance. In face recognition especially, humans intuitively and implicitly employ primitive features such as the color, shape, texture, and symmetry of faces. Inspired by this observation, this paper presents a novel approach to building a learning-based TeCS2 space. This space consists of meta-level features obtained from dictionary learning, which are combined with task-specific deep learning classifiers (such as DenseNet) for face recognition. A confidence-based fusion mechanism is presented to supplement the task-specific deep learning classifier with the proposed TeCS2 features. The effectiveness of the proposed framework is evaluated on four benchmark face recognition datasets: (i) Disguised Faces in the Wild (DFW), (ii) Labeled Faces in the Wild (LFW), (iii) the IIITD Plastic Surgery dataset, and (iv) the Point and Shoot Challenge (PaSC).

Introduction

The success of supervised classifiers has been limited by the unavailability of large volumes of labeled data. Labeling data is an arduous and time-intensive job that, in most cases, requires experts and specialized skills. To overcome this challenge, researchers have attempted diverse transfer learning based solutions that can be broadly grouped into two categories:

1) Data level: The main idea is to augment the labeled and unlabeled data to better learn the original data distribution, explicitly or implicitly. Data augmentation can be performed by repeating the data with small variations [1] or by using synthetic data [2] (see the sketch following this list).

2) Model level: Mapping the data distribution to the labels requires large amounts of labeled data. With insufficient training data, the mapping model cannot learn the best parameters or converge to a global optimum. To bootstrap training, these models are often initialized with weights pre-trained on a different, but related, dataset [3] (also illustrated in the sketch following this list). Transfer learning provides a way to transfer knowledge from the source domain (having large amounts of data) to the target domain (having limited data). There are two broad approaches to transferring knowledge: (i) domain adaptation [4], where the source domain is learnt independently of the target domain and the covariate shift between the source and target domains is later accounted for, and (ii) domain generalization [5], where the source and target domains are learnt together in a multi-task setting.
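To make the two strategies above concrete, the following is a minimal, illustrative sketch (not the paper's implementation), assuming PyTorch/torchvision; the specific transforms, the DenseNet-121 backbone, the frozen-backbone fine-tuning recipe, and the target class count are all assumptions made for illustration.

```python
# Illustrative sketch only: data-level augmentation and model-level
# initialization from pre-trained weights. All parameters are assumptions.
import torch.nn as nn
import torchvision.models as models
import torchvision.transforms as T

# 1) Data level: repeat each labeled image with small random variations.
augment = T.Compose([
    T.RandomHorizontalFlip(p=0.5),                # small geometric change
    T.RandomRotation(degrees=10),                 # small pose-like change
    T.ColorJitter(brightness=0.2, contrast=0.2),  # small photometric change
    T.ToTensor(),
])

# 2) Model level: start from weights pre-trained on a large source dataset
#    (torchvision >= 0.13 API assumed), then fine-tune only the classifier
#    head on the small target dataset.
model = models.densenet121(weights="IMAGENET1K_V1")
for p in model.features.parameters():
    p.requires_grad = False                       # freeze the backbone
num_target_classes = 100                          # assumed target label count
model.classifier = nn.Linear(model.classifier.in_features, num_target_classes)
```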

The human mind often employs primary visual features such as color, shape, texture, or symmetry to perform object recognition [30], [31]. Similar to the goal of domain adaptation, these features are learnt from general-purpose, task-independent domains and are borrowed to perform specific tasks, such as face recognition. We derive a similar intuition and motivation for incorporating color, shape, texture, and symmetry into deep supervised classification, which can be better explained using the example in Fig. 1. The word ‘ball’ automatically enables the human mind to visualize a round object, along with different colors and textures depending on the type of ball. The complex features processed by the human mind on seeing a ball image are emulated by feature descriptors such as SIFT, SURF, LBP, GIST, deep CNN filters, and their variants. Along with these complex features, the human mind also captures primitive visual cues: a tennis ball is green, a basketball has black lines, and a football has a black checked pattern. Adding these visual cues to complex deep neural network features would increase the performance of supervised classifiers (a small illustration of such cues is sketched below). Specifically, automated classification and recognition of unconstrained face images suffers from inherent and diverse challenges such as variations in illumination, pose, expression, disguise, or plastic surgery. Table 1 summarizes some of the existing approaches applied to face recognition under these various challenges. However, a generic face image has many common primitive features, such as shape, color, and texture, that are currently not explicitly used in face recognition.
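As a hedged illustration of such primitive cues (and not the feature extractor used in this work), the sketch below computes a per-channel color histogram and a uniform LBP texture histogram for a face image; the histogram sizes and LBP parameters are assumptions.

```python
# Illustrative extraction of two primitive cues (color and texture) that could
# complement deep CNN features; parameter choices are assumptions.
import numpy as np
from skimage.color import rgb2gray
from skimage.feature import local_binary_pattern

def primitive_cues(rgb_image):
    """rgb_image: HxWx3 uint8 array; returns a concatenated cue vector."""
    # Color cue: per-channel intensity histogram.
    color_hist = np.concatenate(
        [np.histogram(rgb_image[..., c], bins=32, range=(0, 255))[0]
         for c in range(3)]).astype(float)

    # Texture cue: histogram of uniform local binary patterns.
    gray = rgb2gray(rgb_image)
    lbp = local_binary_pattern(gray, P=8, R=1, method="uniform")
    texture_hist, _ = np.histogram(lbp, bins=10, range=(0, 10))

    return np.concatenate([color_hist, texture_hist.astype(float)])
```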

In this research, we build upon our previous work [17], which utilized color, shape, and texture features to aid supervised classification, and propose an improved approach for learning primitive visual cues from a generic dataset using a dictionary learning algorithm, named the TeCS2 (Texture, Color, Shape, Symmetry) dictionary. The proposed algorithm performs supervised face recognition by additionally using features from the TeCS2 dictionary and a confidence-based score fusion. We also study the performance impact of the proposed approach across four different challenges of face recognition: (i) face recognition with disguise using the Disguised Faces in the Wild (DFW) dataset [32], [33], (ii) completely unconstrained face recognition using the Labeled Faces in the Wild (LFW) dataset [34], (iii) low resolution face recognition using the Point and Shoot Challenge (PaSC) dataset [35], and (iv) variations due to plastic surgery using the IIITD Plastic Surgery dataset [36]. In contrast with our previous work, we add diversity to each of the attributes, add a new attribute (symmetry), and propose an adaptive score based fusion strategy. In addition, we show results on more face datasets and compare and contrast the performance with our previous COST framework on each of these datasets. The rest of the paper is organized as follows: Section 2 explains the proposed approach for learning the TeCS2 dictionary, and Section 3 describes the datasets, protocol, and experimental analysis, followed by conclusions and future directions.
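The exact dictionary learning formulation of the TeCS2 space is explained in Section 2 and is not reproduced here; the following is only a minimal sketch, assuming scikit-learn's MiniBatchDictionaryLearning, of how one attribute-specific dictionary (e.g., texture) could be learnt on generic patches and then used to encode a face image into sparse meta-level features. The patch size, number of atoms, and sparsity level are assumptions.

```python
# Hedged sketch of learning one attribute-specific dictionary and using its
# sparse codes as meta-level features; all sizes below are assumptions.
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

generic_patches = np.random.rand(5000, 64)      # placeholder 8x8 texture patches
dict_learner = MiniBatchDictionaryLearning(
    n_components=128,                           # assumed number of dictionary atoms
    alpha=1.0,                                  # sparsity penalty during learning
    transform_algorithm="omp",                  # sparse coding at encoding time
    transform_n_nonzero_coefs=5,                # assumed sparsity of the codes
)
dict_learner.fit(generic_patches)               # learnt on a generic dataset

face_patches = np.random.rand(100, 64)          # placeholder patches of one face
codes = dict_learner.transform(face_patches)    # attribute-specific sparse codes
texture_feature = codes.mean(axis=0)            # pooled meta-level descriptor
```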

Section snippets

Proposed approach

As shown in Fig. 2, the proposed approach aims to learn four different types of representation, one each for the color, shape, texture, and symmetry attributes. The representations are learnt using dictionary learning, and the learnt dictionaries are then used to find the attribute-specific representation of a given input image. The new representation is further used for classification and verification by aiding a state-of-the-art deep learning classifier. Our work uses DenseNet but the proposed
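As a hedged illustration of the confidence-based fusion mentioned earlier (not the exact rule used in the paper), the sketch below lets the deep classifier's confidence gate the contribution of the auxiliary TeCS2-based score; the gating rule and function names are assumptions.

```python
# Hedged sketch of confidence-based score fusion between a deep classifier and
# an auxiliary TeCS2-based classifier; the weighting scheme is an assumption.
import numpy as np

def fuse_scores(deep_probs, tecs2_probs):
    """deep_probs, tecs2_probs: per-class probability vectors for one image."""
    confidence = deep_probs.max()          # confidence of the deep classifier
    weight = 1.0 - confidence              # lean on TeCS2 more when unsure
    fused = (1.0 - weight) * deep_probs + weight * tecs2_probs
    return fused / fused.sum()             # renormalize to a distribution
```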

Experimental results on face datasets

We demonstrate the impact of the proposed framework on multiple face recognition challenges. The description of the benchmark datasets, the experimental protocol, and the performance of the proposed approach are discussed below.

Conclusion and future directions

This paper presents a novel framework for learning a TeCS2 space for visual processing using unsupervised dictionary learning. The proposed representation space is utilized to supplement a task-specific classifier for improved performance. The framework is studied on different face recognition tasks across four datasets: Disguised Faces in the Wild (DFW), Labeled Faces in the Wild (LFW), the IIITD Plastic Surgery dataset, and the Point and Shoot Challenge (PaSC). Experimentally, we show that supplementing

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

M. Vatsa and R. Singh are partially supported through a research grant from MEITY. M. Vatsa is also supported through the Swarnajayanti Fellowship by the Government of India.

References (47)

  • W. Liu et al.

    SphereFace: deep hypersphere embedding for face recognition

    The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

    (2017)
  • S. Sankaranarayanan, Y. Balaji, C. Castillo, R. Chellappa, Generate to adapt: aligning domains using generative...
  • A. Handa et al.

    Understanding real world indoor scenes with synthetic data

    IEEE Conference on Computer Vision and Pattern Recognition

    (2016)
  • Z. Liu et al.

    Deep learning face attributes in the wild

    IEEE International Conference on Computer Vision

    (2015)
  • K. Saenko et al.

    Adapting visual category models to new domains

    European Conference on Computer Vision

    (2010)
  • K. Muandet et al.

    Domain generalization via invariant feature representation

    International Conference on Machine Learning

    (2013)
  • E. Smirnov et al.

    Hard example mining with auxiliary embeddings

    CVPR Workshop on Disguised Faces in the Wild

    (2018)
  • Y. Wen et al.

    A discriminative feature learning approach for deep face recognition

    European Conference on Computer Vision

    (2016)
  • K. He et al.

    Deep residual learning for image recognition

    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

    (2016)
  • A. Bansal et al.

    Deep features for recognizing disguised faces in the wild

    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops

    (2018)
  • R. Ranjan et al.

    An all-in-one convolutional neural network for face analysis

    2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017)

    (2017)
  • N. Kohli et al.

    Face verification with disguise variations via deep disguise recognizer

    The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops

    (2018)
  • D. Yi, Z. Lei, S. Liao, S.Z. Li, Learning face representation from scratch, arXiv preprint arXiv:1411.7923...
  • K. Zhang et al.

    Deep disguised faces recognition

    CVPR Workshop on Disguised Faces in the Wild

    (2018)
  • Y. Liu et al.

    Recurrent scale approximation for object detection in CNN

    IEEE International Conference on Computer Vision

    (2017)
  • A. Suri et al.

    A-link: recognizing disguised faces via active learning based inter-domain knowledge

    10th International Conference on Biometrics Theory, Applications and Systems (BTAS)

    (2019)
  • S. Suri et al.

    On matching faces with alterations due to plastic surgery and disguise

    2018 IEEE 9th International Conference on Biometrics Theory, Applications and Systems (BTAS)

    (2018)
  • H.S. Bhatt et al.

    Recognizing surgically altered face images using multiobjective evolutionary algorithm

    IEEE Trans. Inf. Forensics Secur.

    (2013)
  • I. Gupta et al.

    Scattering transform for matching surgically altered face images

    International Conference on Pattern Recognition

    (2018)
  • M. Ebadi et al.

    Face recognition using patch manifold learning across plastic surgery from a single training exemplar per enrolled person

    Signal Image Video Process.

    (2020)
  • L. Wolf et al.

    Descriptor based methods in the wild

    Workshop on Faces in ‘Real-Life’ Images: Detection, Alignment, and Recognition

    (2008)
  • J. Lu et al.

    Joint feature learning for face recognition

    IEEE Trans. Inf. Forensics Secur.

    (2015)
  • A. Majumdar et al.

    Face verification via class sparsity based supervised encoding

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2017)