Improving face recognition performance using TeCS dictionary
Introduction
The success of supervised classifiers is often limited by the unavailability of large volumes of labeled data. Labeling data is an arduous and time-intensive task that, in most cases, requires expert skill. To overcome this challenge, researchers have attempted diverse transfer-learning-based solutions that can be broadly grouped into two categories:
1) Data level: The main idea is to augment the unlabeled and labeled data to better learn the original data distribution, explicitly or implicitly. Data augmentation can be performed by repeating the data with small variations [1] or by using synthetic data [2].
2) Model level: Mapping the data distribution to the labels requires large amounts of labeled data. With insufficient training data, the mapping model fails to learn the best parameters and converge to a global optimum. To bootstrap training, these models are often initialized with weights pre-trained on a different, but related, dataset [3]. Transfer learning provides the solution of transferring knowledge from the source domain (having large amounts of data) to the target domain (having limited data). There are two broad approaches to transferring knowledge: (i) domain adaptation [4], where the source domain is learnt independently of the target domain and the covariate shift between the source and target domains is accounted for later, and (ii) domain generalization [5], where the source and target domains are learnt together in a multi-task setting.
The human mind often employs primary visual features such as color, shape, texture, or symmetry to perform object recognition [30], [31]. Similar to the goal of domain adaptation, these features are learnt from general-purpose, task-independent domains and are borrowed to perform specific tasks, such as face recognition. We derive similar intuition and motivation for incorporating color, shape, texture, and symmetry into deep supervised classification, which is best explained using the example in Fig. 1. The word ‘ball’ automatically enables the human mind to visualize a round object, along with different colors and textures depending on the type of ball. The complex features the human mind processes on seeing a ball image are emulated by feature descriptors such as SIFT, SURF, LBP, GIST, deep CNN filters, and their variants. Along with these complex features, the human mind also captures primitive visual cues: a tennis ball is green, a basketball has black lines, and a football has black checked patterns. Adding these visual cues to complex deep neural network features would increase the performance of supervised classifiers. Specifically, automated classification and recognition of unconstrained face images suffers from inherent and diverse challenges such as variations in illumination, pose, expression, disguise, or plastic surgery. Table 1 summarizes some of the existing approaches applied to face recognition under these challenges. However, a generic face image has many common primitive features, such as shape, color, and texture, that are currently not explicitly used in face recognition.
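To make the notion of primitive visual cues concrete, the following is a minimal sketch of how color, texture, and symmetry cues could be computed from an image. The specific choices (8-bin histograms, a variance-based texture proxy, a mirror-difference symmetry score) are illustrative assumptions, not the descriptors used in the paper.

```python
import numpy as np

def primitive_cues(img):
    """Extract simple color, texture, and symmetry cues from an RGB image.

    img: H x W x 3 float array with values in [0, 1]. All design choices
    here (bin counts, the variance-based texture proxy) are illustrative.
    """
    # Color cue: an 8-bin intensity histogram per channel (24 values).
    color = np.concatenate([
        np.histogram(img[..., c], bins=8, range=(0, 1), density=True)[0]
        for c in range(3)
    ])
    gray = img.mean(axis=2)
    # Texture cue: variance of local differences as a crude roughness measure.
    texture = np.array([np.diff(gray, axis=0).var(),
                        np.diff(gray, axis=1).var()])
    # Symmetry cue: similarity between the image and its horizontal mirror
    # (1.0 means perfectly left-right symmetric).
    symmetry = np.array([1.0 - np.abs(gray - gray[:, ::-1]).mean()])
    return np.concatenate([color, texture, symmetry])

cues = primitive_cues(np.random.rand(32, 32, 3))  # a 27-dimensional cue vector
```

Such a hand-crafted vector is only a stand-in for the dictionary-learnt attribute representations described later; it illustrates why these cues are cheap to compute yet complementary to deep features.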
In this research, we build upon our previous work [17], which utilized color, shape, and texture features to aid supervised classification, and propose an improved approach for learning primitive visual cues from a generic dataset using a dictionary learning algorithm, named the TeCS (Texture, Color, Shape, Symmetry) dictionary. The proposed algorithm performs supervised face recognition by additionally using features from the TeCS dictionary and a confidence-based score fusion. We also study the performance impact of the proposed approach across four different challenges of face recognition: (i) face recognition with disguise using the Disguised Faces in the Wild (DFW) dataset [32], [33], (ii) completely unconstrained face recognition using the Labeled Faces in the Wild (LFW) dataset [34], (iii) low-resolution face recognition using the Point and Shoot Challenge (PaSC) dataset [35], and (iv) variations due to plastic surgery using the IIITD Plastic Surgery dataset [36]. In contrast to our previous work, we add diversity to each of the attributes, add a new attribute (symmetry), and propose an adaptive score-based fusion strategy. In addition, we show results on more face datasets and compare the performance with our previous COST framework on each of them. The rest of the paper is organized as follows: Section 2 explains the proposed approach for learning the TeCS dictionary. Section 3 explains the datasets, protocol, and experimental analysis, followed by the conclusion and future directions.
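The confidence-based score fusion mentioned above can be sketched as a convex combination whose weight adapts to each classifier's confidence. This plain weighting rule is an assumption for illustration; the paper's adaptive fusion strategy is defined in Section 2.

```python
def fuse_scores(deep_score, tecs_score, deep_conf, tecs_conf):
    """Confidence-weighted fusion of a deep-classifier match score and a
    TeCS-dictionary match score (all values assumed in [0, 1]).

    The weight is proportional to each source's confidence, so a source
    that is more certain contributes more to the fused decision score.
    This is a stand-in for the paper's adaptive fusion rule.
    """
    w = deep_conf / (deep_conf + tecs_conf)
    return w * deep_score + (1.0 - w) * tecs_score

# Example: a confident deep score (0.8, conf 0.9) dominates a less
# confident dictionary score (0.6, conf 0.3).
fused = fuse_scores(0.8, 0.6, 0.9, 0.3)
```

The key design point is that neither score stream is trusted unconditionally: when the deep classifier is unsure (e.g. under heavy disguise), the primitive-cue score receives more weight.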
Proposed approach
As shown in Fig. 2, the proposed approach aims to learn four different types of representation, one each for the color, shape, texture, and symmetry attributes. The representations are learnt using dictionary learning, and the learnt dictionaries are then used to find the attribute-specific representation of a given input image. The new representation is further used for classification and verification by aiding a state-of-the-art deep learning classifier. Our work uses DenseNet, but the proposed …
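The attribute-specific coding step above, which represents an input against a learnt dictionary, can be sketched with greedy matching pursuit over unit-norm atoms. The solver and dimensions here are illustrative assumptions; the paper's dictionary learning formulation may use a different sparse-coding algorithm.

```python
import numpy as np

def sparse_code(x, D, n_nonzero=5):
    """Greedy matching pursuit: represent x as a sparse combination of
    dictionary atoms (rows of D, assumed unit-norm).

    Returns a coefficient vector with at most n_nonzero active atoms;
    the reconstruction is D.T @ code. A stand-in for the attribute-
    specific coding step, not the paper's exact solver.
    """
    residual = x.astype(float).copy()
    code = np.zeros(D.shape[0])
    for _ in range(n_nonzero):
        corr = D @ residual               # correlation with each atom
        k = np.argmax(np.abs(corr))       # best-matching atom
        code[k] += corr[k]                # accumulate its coefficient
        residual -= corr[k] * D[k]        # remove its contribution
    return code

# Toy setup: 64 random unit-norm atoms over a 16-dimensional feature space.
rng = np.random.default_rng(0)
D = rng.normal(size=(64, 16))
D /= np.linalg.norm(D, axis=1, keepdims=True)
x = rng.normal(size=16)
code = sparse_code(x, D)
```

The resulting sparse code serves as the attribute representation: one such code per dictionary (texture, color, shape, symmetry) is produced for an input face and then combined with the deep classifier's output.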
Experimental results on face datasets
We demonstrate the impact of the proposed framework on multiple face recognition challenges. The description of the benchmark datasets, the experimental protocol, and the performance of the proposed approach are discussed below.
Conclusion and future directions
This paper presents a novel framework for learning a TeCS space for visual processing using unsupervised dictionary learning. The proposed representation space is utilized to supplement a task-specific classifier for improved performance. The framework is studied on different face recognition tasks across four datasets: Disguised Faces in the Wild (DFW), Labeled Faces in the Wild (LFW), the IIITD Plastic Surgery dataset, and the Point and Shoot Challenge (PaSC) dataset. Experimentally, we show that supplementing …
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
M. Vatsa and R. Singh are partially supported through the research grant from MEITY. M. Vatsa is also supported through Swarnajayanti Fellowship by the Government of India.
References (47)
- et al., SphereFace: deep hypersphere embedding for face recognition, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
- S. Sankaranarayanan, Y. Balaji, C. Castillo, R. Chellappa, Generate to adapt: aligning domains using generative...
- et al., Understanding real world indoor scenes with synthetic data, IEEE Conference on Computer Vision and Pattern Recognition, 2016.
- et al., Deep learning face attributes in the wild, IEEE International Conference on Computer Vision, 2015.
- et al., Adapting visual category models to new domains, European Conference on Computer Vision, 2010.
- et al., Domain generalization via invariant feature representation, International Conference on Machine Learning, 2013.
- et al., Hard example mining with auxiliary embeddings, CVPR Workshop on Disguised Faces in the Wild, 2018.
- et al., A discriminative feature learning approach for deep face recognition, European Conference on Computer Vision, 2016.
- et al., Deep residual learning for image recognition, IEEE Conference on Computer Vision and Pattern Recognition, 2016.
- et al., Deep features for recognizing disguised faces in the wild, IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2018.
- An all-in-one convolutional neural network for face analysis, 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG), 2017.
- Face verification with disguise variations via deep disguise recognizer, IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops.
- Deep disguised faces recognition, CVPR Workshop on Disguised Faces in the Wild.
- Recurrent scale approximation for object detection in CNN, IEEE International Conference on Computer Vision.
- A-link: recognizing disguised faces via active learning based inter-domain knowledge, 10th International Conference on Biometrics Theory, Applications and Systems.
- On matching faces with alterations due to plastic surgery and disguise, 9th IEEE International Conference on Biometrics Theory, Applications and Systems (BTAS), 2018.
- Recognizing surgically altered face images using multiobjective evolutionary algorithm, IEEE Trans. Inf. Forensics Secur.
- Scattering transform for matching surgically altered face images, International Conference on Pattern Recognition.
- Face recognition using patch manifold learning across plastic surgery from a single training exemplar per enrolled person, Signal Image Video Process.
- Descriptor based methods in the wild, Workshop on Faces in ‘Real-Life’ Images: Detection, Alignment, and Recognition.
- Joint feature learning for face recognition, IEEE Trans. Inf. Forensics Secur.
- Face verification via class sparsity based supervised encoding, IEEE Trans. Pattern Anal. Mach. Intell.
Cited by (3)
- Virtual special issue on advances in digital security: Biometrics and forensics, Pattern Recognition Letters, 2022.
- Deep learning-based face detection and recognition on drones, Journal of Ambient Intelligence and Humanized Computing, 2024.
- Disguise Resilient Face Verification, IEEE Transactions on Circuits and Systems for Video Technology, 2022.