Neurocomputing

Volume 363, 21 October 2019, Pages 35-45

Brief papers
Deep class-skewed learning for face recognition

https://doi.org/10.1016/j.neucom.2019.04.085

Abstract

Face datasets often exhibit highly skewed class distributions, i.e., rich classes contain abundant instances, while poor classes have only a few images. To mitigate this issue, we explore deep class-skewed learning from two aspects in this paper: feature augmentation and feature normalization. To deal with the imbalanced distribution problem, we put forward a novel feature augmentation method termed Large Margin Feature Augmentation (LMFA) to augment hard features and equalize the class distribution, leading to balanced classification boundaries between rich and poor classes. Considering the distribution gap between training and testing features, a novel feature normalization method called Transferable Domain Normalization (TDN) is proposed to normalize domain-specific features to obey an identical Gaussian distribution and enhance feature generalization. Extensive experiments are conducted on five popular face recognition datasets, including LFW, YTF, CFP, AgeDB and MegaFace. We achieve remarkable results on par with or better than the state-of-the-art methods, demonstrating the effectiveness of the proposed class-balanced feature learning.

Introduction

Recently, we have witnessed the great success of applying convolutional neural networks (CNNs) to face recognition [1], [2], [3], [4], [5], [6]. In order to train high-efficiency recognition models, it is necessary to collect abundant training data, design advanced network architectures and construct discriminative metric learning. Different from large-scale datasets like ImageNet [7], where the instance number of each class is equally distributed, many face datasets naturally exhibit imbalance in their class distribution. For instance, widely-used face training datasets like CASIA-WebFace [8] and MS-Celeb-1M [9] are collected from the ranked images of search engines (e.g., Google, Bing and Baidu) given specifically queried identities. Due to the ranking algorithms used in search engines, a small number of well-known persons (usually rich classes in datasets) have sufficient face images with consecutive face variations, while a large number of little-known persons (usually poor classes in datasets) contain only a handful of face images with discrete face variations. In testing datasets like LFW [10] and CFP [11], the numbers of positive and negative face pairs are highly skewed, since it is easier to obtain face images with different identities (negative) than faces with matched identities (positive) during data collection. Such class-imbalance problems in face recognition provide perfect testbeds for studying generic imbalance learning algorithms. Indeed, without handling the class-imbalance issue, rich classes tend to exert a greater impact on learning general and robust features than the poor ones, resulting in poor performance.

In this paper, we investigate more effective methods for deep class-skewed learning and show their important applications to face recognition on ubiquitously class-imbalanced datasets. Different from previous works [12], [13], [14], the class-skewed learning methods in this paper fall into two aspects: feature augmentation and feature normalization. An overview of the proposed methods is shown in Fig. 1.

The first aspect is feature augmentation, which helps to balance feature distributions between poor and rich classes. We observe that poor classes often contain very few samples with sparse, discrete face variations, while rich classes usually possess plentiful instances with dense, consecutive face variations. The skewed variability of class-imbalanced data makes genuine clusters from poor classes prone to overlapping with imposter clusters from rich classes. Such inter-class cluster overlap may confuse the underlying classification boundaries formed between poor and rich classes. To reduce this skewed variability, we propose a novel feature augmentation approach called Large Margin Feature Augmentation (LMFA) that augments cross-boundary features to ameliorate inter-class cluster invasion. In view of the biased instance distribution, LMFA generates new features for each class in inverse proportion to its class frequency to form dense and consecutive boundaries between poor and rich classes. Inspired by the large margin used in [4], [15], [16], LMFA augments large-margin features to impose margin constraints on the feature manifold for discriminative feature learning.
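To make the inverse-frequency idea concrete, the following toy sketch equalizes per-class feature counts by synthesizing jittered copies of poor-class features. All names and the noise scale are hypothetical; the paper's actual LMFA additionally imposes a large-margin constraint, which this minimal version omits.

```python
import numpy as np

rng = np.random.default_rng(0)

def augment_inverse_frequency(features, labels, target_per_class=None):
    """Synthesize extra features for under-represented classes by jittering
    existing ones, so every class reaches the same count (toy sketch only)."""
    classes, counts = np.unique(labels, return_counts=True)
    if target_per_class is None:
        target_per_class = counts.max()          # grow poor classes to the richest
    new_feats, new_labels = [], []
    for c, n in zip(classes, counts):
        deficit = target_per_class - n           # inverse to class frequency
        if deficit <= 0:
            continue
        base = features[labels == c]
        picks = base[rng.integers(0, n, size=deficit)]
        noise = 0.05 * rng.standard_normal(picks.shape)   # small perturbation
        new_feats.append(picks + noise)
        new_labels.append(np.full(deficit, c))
    if new_feats:
        features = np.concatenate([features] + new_feats)
        labels = np.concatenate([labels] + new_labels)
    return features, labels

# toy imbalanced data: rich class 0 (100 samples), poor class 1 (10 samples)
X = rng.standard_normal((110, 8))
y = np.array([0] * 100 + [1] * 10)
X_aug, y_aug = augment_inverse_frequency(X, y)
print(np.bincount(y_aug))   # both classes now hold 100 features
```

Because the poor class receives many more synthetic features than the rich one, the subsequent classifier sees a balanced feature distribution.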

The second aspect is feature normalization, which contributes to learning domain-disentangled features for training and testing data. The open-set protocol is harder for face recognition because the training and testing classes are mutually exclusive, which usually requires discriminative feature learning to minimize intra-class distances and maximize inter-class differences. However, those disjoint classes create a feature distribution gap between the training and testing domains. Consequently, most available face models cannot effectively transfer the discrimination ability of training features to testing features under the open-set protocol, which results in a performance deterioration of face recognition. Motivated by this analysis, we propose a novel feature normalization method named Transferable Domain Normalization (TDN) to narrow the statistical gap between the training and testing data. Our objective is then to transform both training and testing features to obey an identical Gaussian distribution with zero mean and unit variance. Such feature normalization introduces a tight constraint for global feature space regularization to learn more balanced class boundaries between poor and rich classes.
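A minimal sketch of domain-wise normalization toward a zero-mean, unit-variance distribution is given below. It standardizes each feature dimension with the statistics of its own domain; the exact formulation of TDN may differ, but the statistical effect, both domains sharing approximately N(0, 1) marginals, is the same.

```python
import numpy as np

def domain_normalize(features, eps=1e-8):
    """Shift and scale each feature dimension using the given domain's own
    statistics, so the normalized features roughly follow N(0, 1)."""
    mu = features.mean(axis=0, keepdims=True)
    sigma = features.std(axis=0, keepdims=True)
    return (features - mu) / (sigma + eps)

rng = np.random.default_rng(1)
train_feats = 3.0 + 2.0 * rng.standard_normal((1000, 16))  # shifted train domain
test_feats = -1.0 + 0.5 * rng.standard_normal((500, 16))   # shifted test domain

train_n = domain_normalize(train_feats)
test_n = domain_normalize(test_feats)
# after normalization both domains share ~zero mean and ~unit variance
print(float(train_n.mean()), float(test_n.std()))
```

Note that this transformation has no learned parameters, matching the parameter-free property claimed for TDN.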

The major contributions of this paper are summarized as follows:

  • We theoretically reveal the deficiency of training models with the softmax loss on class-imbalanced datasets: rich classes tend to exhibit intra-class compactness, while poor classes concentrate on learning inter-class separation.

  • We propose a novel feature augmentation method called Large Margin Feature Augmentation (LMFA) which effectively addresses the class-imbalance problem in face recognition. With minimal extra effort, LMFA contributes to learning good face features with large margins between classes and achieves a face verification accuracy of 99.15% on LFW.

  • We propose a novel feature normalization technique termed Transferable Domain Normalization (TDN) to learn domain-invariant face features between training and testing datasets under the open-set protocol. Without learnable parameters or complex computation, TDN effectively transfers the discriminative capability from training features to testing features and obtains a superior face verification accuracy of 99.45% on LFW.

  • We demonstrate the effectiveness of our proposed methods with extensive experiments on five face datasets (LFW, YTF, CFP, AgeDB and MegaFace). The experimental results show the superior or competitive performance of LMFA and TDN relative to the state-of-the-art methods.

The remainder of the paper is organized as follows: In Section 2, related works on deep imbalanced learning and deep face recognition are discussed. In Section 3, we briefly analyze the class-imbalance problem in theory, and then describe the details of the proposed LMFA and TDN methods for deep class-skewed learning. In Section 4, the experimental results are reported. In Section 5, the conclusions are presented.

Related work

Deep imbalanced learning: Prior works on class-skewed learning fall mainly into three groups: data re-sampling [17], [18], generative learning [19], [20] and cost-sensitive learning [21], [22]. (1) The first group focuses on equalizing the statistical distribution, learning equally good feature representations for all classes by under-sampling rich classes or over-sampling poor classes (or both). However, such re-sampling schemes have well-known inherent deficiencies.
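As a reference point for the re-sampling group, the toy sketch below over-samples poor classes with replacement until all classes match the largest one. The function name and data are illustrative only; it also shows the scheme's main deficiency: the extra samples are mere duplicates and carry no new information.

```python
import numpy as np

rng = np.random.default_rng(0)

def oversample_to_balance(samples, labels):
    """Replicate poor-class samples (with replacement) until every class
    matches the largest one. Equalizes class counts, but only by duplication,
    which can encourage overfitting on the repeated samples."""
    classes, counts = np.unique(labels, return_counts=True)
    target = counts.max()
    idx = []
    for c, n in zip(classes, counts):
        where = np.flatnonzero(labels == c)
        idx.extend(where)                                  # keep originals
        idx.extend(rng.choice(where, size=target - n))     # duplicate extras
    idx = np.array(idx)
    return samples[idx], labels[idx]

X = np.arange(30).reshape(15, 2).astype(float)
y = np.array([0] * 12 + [1] * 3)                # rich class 0, poor class 1
Xb, yb = oversample_to_balance(X, y)
print(np.bincount(yb))                          # both classes now count 12
```

Feature-level augmentation such as LMFA differs precisely in that it synthesizes new points rather than repeating old ones.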

Problem elaboration

In order to better understand how training optimization is influenced by imbalanced training data, we briefly review face feature learning with the standard softmax loss in a binary classification problem. Suppose that we have two classes, where one rich class has $N_r$ face features $F_r=\{F_{r,i}\}_{i=1}^{N_r}$ and another poor class has $N_p$ face features $F_p=\{F_{p,j}\}_{j=1}^{N_p}$. The binary softmax loss is calculated as follows:
$$L=-\frac{1}{N_r+N_p}\sum_{i=1}^{N_r}\log\frac{e^{W_rF_{r,i}}}{e^{W_rF_{r,i}}+e^{W_pF_{r,i}}}-\frac{1}{N_r+N_p}\sum_{j=1}^{N_p}\log\frac{e^{W_pF_{p,j}}}{e^{W_rF_{p,j}}+e^{W_pF_{p,j}}},$$
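The binary softmax loss can be evaluated numerically on hypothetical toy data (logits taken as inner products $W_cF$); with $N_r \gg N_p$, the rich-class terms clearly dominate the averaged sum, which is the imbalance effect analyzed here.

```python
import numpy as np

def binary_softmax_loss(F_r, F_p, W_r, W_p):
    """Binary softmax loss over rich features F_r and poor features F_p:
    each rich feature should score higher under W_r than under W_p,
    and vice versa for the poor class."""
    n = len(F_r) + len(F_p)
    loss = 0.0
    for f in F_r:                                   # rich-class terms
        zr, zp = W_r @ f, W_p @ f
        loss -= np.log(np.exp(zr) / (np.exp(zr) + np.exp(zp)))
    for f in F_p:                                   # poor-class terms
        zr, zp = W_r @ f, W_p @ f
        loss -= np.log(np.exp(zp) / (np.exp(zr) + np.exp(zp)))
    return loss / n

rng = np.random.default_rng(0)
W_r, W_p = rng.standard_normal(4), rng.standard_normal(4)
# 100 rich features near W_r, only 5 poor features near W_p
F_r = [W_r + 0.1 * rng.standard_normal(4) for _ in range(100)]
F_p = [W_p + 0.1 * rng.standard_normal(4) for _ in range(5)]
print(binary_softmax_loss(F_r, F_p, W_r, W_p))
```

Since 100 of the 105 averaged terms come from the rich class, the gradient of this loss is driven almost entirely by rich-class features, which is why the rich class ends up with compact clusters while the poor class mainly learns separation.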

Experiments

We study the face recognition task on both small-scale and large-scale imbalanced datasets, namely CASIA-WebFace [8] and MS-Celeb-1M [9]. As shown in Fig. 4a, the training data of CASIA-WebFace are highly class-skewed: only 7.76% of the 10K classes have more than 100 images, while 38.41% of them have no more than 20 images. Such data imbalance makes it difficult to learn equally robust features. In Fig. 4b, the data distribution of MS-Celeb-1M exhibits more severe class-imbalance than

Conclusion

In this paper, we contribute to improving deep class-skewed learning performance on face recognition through feature augmentation and feature normalization. Without loss of face identification information, we propose LMFA to augment large-margin face features, equalizing the feature distribution and learning class-balanced classification boundaries between rich and poor classes. Then we further propose a feature normalization method called TDN to learn domain-transferable features between

Declaration of interests

None.

Acknowledgment

This work is supported by the Chinese National Natural Science Foundation (61532018).

Pingyu Wang is currently a Ph.D. candidate at the Beijing Key Laboratory of Network System and Network Culture, Beijing University of Posts and Telecommunications, Beijing, China. His research interests include attribute classification, face recognition, person re-identification and computer vision.

References (40)

  • Y. Guo et al., One-shot face recognition by promoting underrepresented classes (2017)
  • Z. Ding et al., One-shot face recognition via generative learning, Proceedings of the 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018) (2018)
  • X. Qi et al., Face recognition via centralized coordinate learning (2018)
  • Y. Sun et al., Deep learning face representation from predicting 10,000 classes, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2014)
  • Y. Sun et al., Deep learning face representation by joint identification-verification, Proceedings of the Advances in Neural Information Processing Systems (2014)
  • Y. Wen et al., A discriminative feature learning approach for deep face recognition, Proceedings of the European Conference on Computer Vision (2016)
  • W. Liu et al., SphereFace: deep hypersphere embedding for face recognition, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017)
  • H. Wang et al., CosFace: large margin cosine loss for deep face recognition, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2018)
  • J. Deng et al., ArcFace: additive angular margin loss for deep face recognition, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2019)
  • J. Deng et al., ImageNet: a large-scale hierarchical image database, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2009) (2009)
  • D. Yi et al., Learning face representation from scratch (2014)
  • Y. Guo et al., MS-Celeb-1M: a dataset and benchmark for large-scale face recognition, Proceedings of the European Conference on Computer Vision (2016)
  • G.B. Huang et al., Labeled faces in the wild: a database for studying face recognition in unconstrained environments, Proceedings of the Workshop on Faces in 'Real-Life' Images: Detection, Alignment, and Recognition (2008)
  • S. Sengupta et al., Frontal to profile face verification in the wild, Proceedings of the 2016 IEEE Winter Conference on Applications of Computer Vision (WACV) (2016)
  • I. Masi et al., Do we really need to collect millions of faces for effective face recognition?, Proceedings of the European Conference on Computer Vision (2016)
  • X. Zhang et al., Range loss for deep face recognition with long-tailed training data, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2017)
  • Y. Wu et al., Deep face recognition with center invariant loss, Proceedings of the Thematic Workshops of ACM Multimedia 2017 (2017)
  • W. Liu et al., Large-margin softmax loss for convolutional neural networks, Proceedings of the ICML (2016)
  • F. Wang et al., Additive margin softmax for face verification, IEEE Signal Processing Letters (2018)
  • C. Drummond et al., C4.5, class imbalance, and cost sensitivity: why under-sampling beats over-sampling, Proceedings of the Workshop on Learning from Imbalanced Datasets II (2003)
Fei Su is a female professor in the Multimedia Communication and Pattern Recognition Lab, School of Information and Telecommunication, Beijing University of Posts and Telecommunications. She received the Ph.D. degree in Communication and Electrical Systems from BUPT in 2000. She was a visiting scholar at the Electrical and Computer Engineering Department, Carnegie Mellon University, from 2008 to 2009. Her current interests include pattern recognition, image and video processing, and biometrics. She has authored and co-authored more than 70 journal and conference papers and several textbooks.

    Zhicheng Zhao is an associate professor of Beijing University of Posts and Telecommunications. He was a visiting scholar at School of Computer Science, Carnegie Mellon University from 2015 to 2016. His research interests are computer vision, image and video semantic understanding and retrieval. He has authored and coauthored more than 60 journal and conference papers.

    Yandong Guo received his Ph.D. in electrical and computer engineering from Purdue University at West Lafayette under the supervision of Prof. Charles Bouman and Prof. Jan Allebach in 2013. He received his B.S.E.E. and M.S.E.E. degree from Beijing University of Posts and Telecommunications in 2005 and 2008 respectively. He was a researcher at Microsoft Research, Redmond from 2013 to 2018. He is currently the chief scientist and vice president at XPeng Motors, taking charge of the AI center. He is also a visiting professor at Beijing University of Posts and Telecommunications, and visiting professor at University of Electronic Science and Technology. Dr. Yandong Guo’s research focuses on computer vision and artificial intelligence. The results of his research have been applied in Microsoft Bing image search, cloud AI service, knowledge graph, HP multi-functional printers, GE CT machine, and many other AI products with billions of users. In the year 2016, Dr. Yandong Guo led the perception team for the connected car project at Microsoft. He is the committee member/technical reviewer for many conferences including CVPR, ICCV, ECCV, ICML, NIPS, ICIP, ICASSP, Electronic Imaging, IJCAI, ACM MM, etc., and reviewer for transactions including T-IP, T-PAMI, T-MM, T-CSVT, etc.

    Yanyun Zhao is a female associate professor in the School of Information and Communication Engineering, Beijing University of Posts and Telecommunications. She received the Ph.D. degree from Beijing University of Posts and Telecommunications in 2009. Her research interests include pattern recognition, image and video processing. She has authored and coauthored more than 60 journal and conference papers and some textbooks.

Bojin Zhuang is a senior research fellow at Ping An Technology (Shenzhen) Co., Ltd. His research interests are computer vision, natural language processing and optimization theory.