Neurocomputing

Volume 363, 21 October 2019, Pages 35-45

Brief papers
Deep class-skewed learning for face recognition

https://doi.org/10.1016/j.neucom.2019.04.085

Abstract

Face datasets often exhibit highly skewed class distributions, i.e., rich classes contain abundant instances, while poor classes have only a few images. To mitigate this issue, we explore deep class-skewed learning from two aspects in this paper: feature augmentation and feature normalization. To deal with the imbalanced distribution problem, we put forward a novel feature augmentation method termed Large Margin Feature Augmentation (LMFA) to augment hard features and equalize the class distribution, leading to balanced classification boundaries between rich and poor classes. Considering the distribution gap between training and testing features, a novel feature normalization method called Transferable Domain Normalization (TDN) is proposed to normalize domain-specific features to obey an identical Gaussian distribution and enhance feature generalization. Extensive experiments are conducted on five popular face recognition datasets, including LFW, YTF, CFP, AgeDB and MegaFace. We achieve remarkable results on par with or better than the state-of-the-art methods, demonstrating the effectiveness of the proposed class-balanced feature learning.

Introduction

Recently, we have witnessed the great success of applying convolutional neural networks (CNNs) to face recognition [1], [2], [3], [4], [5], [6]. In order to train high-efficiency recognition models, it is necessary to collect abundant training data, design advanced network architectures and construct discriminative metric learning. Different from large-scale datasets like ImageNet [7], where the instance number of each class is equally distributed, many face datasets naturally exhibit imbalance in their class distribution. For instance, widely-used face training datasets like CASIA-WebFace [8] and MS-Celeb-1M [9] are collected from the ranked images of search engines (e.g., Google, Bing and Baidu) given specifically queried identities. Due to the ranking algorithms used in search engines, a small number of well-known persons (usually rich classes in datasets) have sufficient face images with consecutive face variations, while a large number of little-known persons (usually poor classes in datasets) contain only a handful of face images with discrete face variations. In testing datasets like LFW [10] and CFP [11], the numbers of positive and negative face pairs are highly skewed, since it is easier to obtain face images with different identities (negative) than faces with matched identities (positive) during data collection. Such class-imbalance problems in face recognition provide perfect testbeds for studying generic imbalance learning algorithms. Indeed, without handling the class-imbalance issue, rich classes tend to exert a greater impact on learning general and robust features than the poor ones, resulting in poor performance.

In this paper, we investigate more effective methods for deep class-skewed learning and show their important applications to face recognition on ubiquitously class-imbalanced datasets. Different from previous works [12], [13], [14], the class-skewed learning methods in this paper fall into two aspects: feature augmentation and feature normalization. An overview of the proposed methods is shown in Fig. 1.

The first aspect is feature augmentation, which helps to balance feature distributions between poor and rich classes. We observe that poor classes often contain very few samples with sparse, discrete face variations, while rich classes usually possess plentiful instances with dense, consecutive face variations. The skewed variability of class-imbalanced data makes genuine clusters from poor classes prone to overlapping with imposter clusters from rich classes. Such inter-class cluster overlap may confuse the underlying classification boundaries formed between poor and rich classes. To reduce this skewed variability, we propose a novel feature augmentation approach called Large Margin Feature Augmentation (LMFA) that augments cross-boundary features to ameliorate inter-class cluster invasion. In view of the biased instance distribution, LMFA generates new features for each class in inverse proportion to its class frequency to form dense and consecutive boundaries between poor and rich classes. Inspired by the large margin used in [4], [15], [16], LMFA augments large-margin features to impose margin constraints on the feature manifold for discriminative feature learning.
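To make the inverse-frequency idea concrete, the following toy sketch equalizes per-class feature counts by synthesizing jittered copies of poor-class features. All names and the noise scale are hypothetical; the paper's actual LMFA additionally imposes a large-margin constraint, which this minimal version omits.

```python
import numpy as np

rng = np.random.default_rng(0)

def augment_inverse_frequency(features, labels, target_per_class=None):
    """Synthesize extra features for under-represented classes by jittering
    existing ones, so every class reaches the same count (toy sketch only)."""
    classes, counts = np.unique(labels, return_counts=True)
    if target_per_class is None:
        target_per_class = counts.max()          # grow poor classes to the richest
    new_feats, new_labels = [], []
    for c, n in zip(classes, counts):
        deficit = target_per_class - n           # inverse to class frequency
        if deficit <= 0:
            continue
        base = features[labels == c]
        picks = base[rng.integers(0, n, size=deficit)]
        noise = 0.05 * rng.standard_normal(picks.shape)   # small perturbation
        new_feats.append(picks + noise)
        new_labels.append(np.full(deficit, c))
    if new_feats:
        features = np.concatenate([features] + new_feats)
        labels = np.concatenate([labels] + new_labels)
    return features, labels

# toy imbalanced data: rich class 0 (100 samples), poor class 1 (10 samples)
X = rng.standard_normal((110, 8))
y = np.array([0] * 100 + [1] * 10)
X_aug, y_aug = augment_inverse_frequency(X, y)
print(np.bincount(y_aug))   # both classes now hold 100 features
```

Because the poor class receives many more synthetic features than the rich one, the subsequent classifier sees a balanced feature distribution.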

The second aspect is feature normalization, which contributes to learning domain-disentangled features for training and testing data. The open-set protocol is harder for face recognition because the training and testing classes are mutually exclusive, which usually requires discriminative feature learning to minimize intra-class distances and maximize inter-class differences. However, those disjoint classes create a feature distribution gap between the training and testing domains. Consequently, most available face models cannot effectively transfer the discrimination ability of training features to testing features under the open-set protocol, which results in a performance deterioration of face recognition. Motivated by this analysis, we propose a novel feature normalization method named Transferable Domain Normalization (TDN) to narrow the statistical gap between the training and testing data. Our objective is then to transform both training and testing features to obey an identical Gaussian distribution with zero mean and unit variance. Such feature normalization introduces a tight constraint for global feature space regularization to learn more balanced class boundaries between poor and rich classes.
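A minimal sketch of domain-wise normalization toward a zero-mean, unit-variance distribution is given below. It standardizes each feature dimension with the statistics of its own domain; the exact formulation of TDN may differ, but the statistical effect, both domains sharing approximately N(0, 1) marginals, is the same.

```python
import numpy as np

def domain_normalize(features, eps=1e-8):
    """Shift and scale each feature dimension using the given domain's own
    statistics, so the normalized features roughly follow N(0, 1)."""
    mu = features.mean(axis=0, keepdims=True)
    sigma = features.std(axis=0, keepdims=True)
    return (features - mu) / (sigma + eps)

rng = np.random.default_rng(1)
train_feats = 3.0 + 2.0 * rng.standard_normal((1000, 16))  # shifted train domain
test_feats = -1.0 + 0.5 * rng.standard_normal((500, 16))   # shifted test domain

train_n = domain_normalize(train_feats)
test_n = domain_normalize(test_feats)
# after normalization both domains share ~zero mean and ~unit variance
print(float(train_n.mean()), float(test_n.std()))
```

Note that this transformation has no learned parameters, matching the parameter-free property claimed for TDN.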

The major contributions of this paper are summarized as follows:

  • We theoretically reveal the deficiency of training models with the softmax loss on class-imbalanced datasets: rich classes tend to exhibit intra-class compactness, while poor classes concentrate on learning inter-class separation.

  • We propose a novel feature augmentation method called Large Margin Feature Augmentation (LMFA) which effectively addresses the class-imbalance problem in face recognition. With minimal extra effort, LMFA contributes to learning good face features with large margins between classes and achieves a face verification accuracy of 99.15% on LFW.

  • We propose a novel feature normalization technique termed Transferable Domain Normalization (TDN) to learn domain-invariant face features between training and testing datasets under the open-set protocol. Without learnable parameters or complex computation, TDN effectively transfers the discriminative capability from training features to testing features and obtains a superior face verification accuracy of 99.45% on LFW.

  • We demonstrate the effectiveness of our proposed methods with extensive experiments on five face datasets (LFW, YTF, CFP, AgeDB and MegaFace). The experimental results show the superior or competitive performance of LMFA and TDN relative to the state-of-the-art methods.

The remainder of the paper is organized as follows: In Section 2, related works on deep imbalanced learning and deep face recognition are discussed. In Section 3, we briefly analyze the class-imbalance problem in theory, and then describe the details of the proposed LMFA and TDN methods for deep class-skewed learning. In Section 4, the experimental results are reported. In Section 5, the conclusions are presented.

Related work

Deep imbalanced learning: Prior works on class-skewed learning fall mainly into three groups: data re-sampling [17], [18], generative learning [19], [20] and cost-sensitive learning [21], [22]. (1) The first group focuses on equalizing the statistical distribution, learning equally good feature representations for all classes by under-sampling rich classes or over-sampling poor classes (or both). However, such re-sampling schemes have well-known inherent deficiencies.
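As a reference point for the re-sampling group, the toy sketch below over-samples poor classes with replacement until all classes match the largest one. The function name and data are illustrative only; it also shows the scheme's main deficiency: the extra samples are mere duplicates and carry no new information.

```python
import numpy as np

rng = np.random.default_rng(0)

def oversample_to_balance(samples, labels):
    """Replicate poor-class samples (with replacement) until every class
    matches the largest one. Equalizes class counts, but only by duplication,
    which can encourage overfitting on the repeated samples."""
    classes, counts = np.unique(labels, return_counts=True)
    target = counts.max()
    idx = []
    for c, n in zip(classes, counts):
        where = np.flatnonzero(labels == c)
        idx.extend(where)                                  # keep originals
        idx.extend(rng.choice(where, size=target - n))     # duplicate extras
    idx = np.array(idx)
    return samples[idx], labels[idx]

X = np.arange(30).reshape(15, 2).astype(float)
y = np.array([0] * 12 + [1] * 3)                # rich class 0, poor class 1
Xb, yb = oversample_to_balance(X, y)
print(np.bincount(yb))                          # both classes now count 12
```

Feature-level augmentation such as LMFA differs precisely in that it synthesizes new points rather than repeating old ones.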

Problem elaboration

In order to better understand how training optimization is influenced by imbalanced training data, we briefly review face feature learning with the standard softmax loss in a binary classification problem. Suppose that we have two classes, where one rich class has $N_r$ face features $F_r=\{F_{r,i}\}_{i=1}^{N_r}$ and another poor class has $N_p$ face features $F_p=\{F_{p,j}\}_{j=1}^{N_p}$. The binary softmax loss is calculated as follows:
$$L=-\frac{1}{N_r+N_p}\sum_{i=1}^{N_r}\log\frac{e^{W_rF_{r,i}}}{e^{W_rF_{r,i}}+e^{W_pF_{r,i}}}-\frac{1}{N_r+N_p}\sum_{j=1}^{N_p}\log\frac{e^{W_pF_{p,j}}}{e^{W_rF_{p,j}}+e^{W_pF_{p,j}}},$$
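The binary softmax loss can be evaluated numerically on hypothetical toy data (logits taken as inner products $W_cF$); with $N_r \gg N_p$, the rich-class terms clearly dominate the averaged sum, which is the imbalance effect analyzed here.

```python
import numpy as np

def binary_softmax_loss(F_r, F_p, W_r, W_p):
    """Binary softmax loss over rich features F_r and poor features F_p:
    each rich feature should score higher under W_r than under W_p,
    and vice versa for the poor class."""
    n = len(F_r) + len(F_p)
    loss = 0.0
    for f in F_r:                                   # rich-class terms
        zr, zp = W_r @ f, W_p @ f
        loss -= np.log(np.exp(zr) / (np.exp(zr) + np.exp(zp)))
    for f in F_p:                                   # poor-class terms
        zr, zp = W_r @ f, W_p @ f
        loss -= np.log(np.exp(zp) / (np.exp(zr) + np.exp(zp)))
    return loss / n

rng = np.random.default_rng(0)
W_r, W_p = rng.standard_normal(4), rng.standard_normal(4)
# 100 rich features near W_r, only 5 poor features near W_p
F_r = [W_r + 0.1 * rng.standard_normal(4) for _ in range(100)]
F_p = [W_p + 0.1 * rng.standard_normal(4) for _ in range(5)]
print(binary_softmax_loss(F_r, F_p, W_r, W_p))
```

Since 100 of the 105 averaged terms come from the rich class, the gradient of this loss is driven almost entirely by rich-class features, which is why the rich class ends up with compact clusters while the poor class mainly learns separation.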

Experiments

We study the face recognition task on both small-scale and large-scale imbalanced datasets, namely CASIA-WebFace [8] and MS-Celeb-1M [9]. As shown in Fig. 4a, the training data of CASIA-WebFace are highly class-skewed: only 7.76% of the 10K classes have more than 100 images, while 38.41% of them have no more than 20 images. Such data imbalance makes it difficult to learn equally robust features. In Fig. 4b, the data distribution of MS-Celeb-1M exhibits more severe class-imbalance than

Conclusion

In this paper, we contribute to improving deep class-skewed learning performance on face recognition through feature augmentation and feature normalization. Without loss of face identification information, we propose LMFA to augment large-margin face features, equalizing the feature distribution and learning class-balanced classification boundaries between rich and poor classes. Then we further propose a feature normalization method called TDN to learn domain-transferable features between

Declaration of interests

None.

Acknowledgment

This work is supported by the Chinese National Natural Science Foundation (61532018).

Pingyu Wang is currently a Ph.D. candidate at the Beijing Key Laboratory of Network System and Network Culture, Beijing University of Posts and Telecommunications, Beijing, China. His research interests include attribute classification, face recognition, person re-identification and computer vision.

References (40)

  • Y. Guo et al., One-shot face recognition by promoting underrepresented classes (2017)
  • Z. Ding et al., One-shot face recognition via generative learning, Proceedings of the 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018) (2018)
  • X. Qi et al., Face recognition via centralized coordinate learning (2018)
  • Y. Sun et al., Deep learning face representation from predicting 10,000 classes, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2014)
  • Y. Sun et al., Deep learning face representation by joint identification-verification, Proceedings of the Advances in Neural Information Processing Systems (2014)
  • Y. Wen et al., A discriminative feature learning approach for deep face recognition, Proceedings of the European Conference on Computer Vision (2016)
  • W. Liu et al., SphereFace: deep hypersphere embedding for face recognition, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017)
  • H. Wang et al., CosFace: large margin cosine loss for deep face recognition, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2018)
  • J. Deng et al., ArcFace: additive angular margin loss for deep face recognition, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2019)
  • J. Deng et al., ImageNet: a large-scale hierarchical image database, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2009) (2009)
  • D. Yi et al., Learning face representation from scratch (2014)
  • Y. Guo et al., MS-Celeb-1M: a dataset and benchmark for large-scale face recognition, Proceedings of the European Conference on Computer Vision (2016)
  • G.B. Huang et al., Labeled faces in the wild: a database for studying face recognition in unconstrained environments, Proceedings of the Workshop on Faces in 'Real-Life' Images: Detection, Alignment, and Recognition (2008)
  • S. Sengupta et al., Frontal to profile face verification in the wild, Proceedings of the 2016 IEEE Winter Conference on Applications of Computer Vision (WACV) (2016)
  • I. Masi et al., Do we really need to collect millions of faces for effective face recognition?, Proceedings of the European Conference on Computer Vision (2016)
  • X. Zhang et al., Range loss for deep face recognition with long-tailed training data, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2017)
  • Y. Wu et al., Deep face recognition with center invariant loss, Proceedings of the Thematic Workshops of ACM Multimedia 2017 (2017)
  • W. Liu et al., Large-margin softmax loss for convolutional neural networks, Proceedings of the ICML (2016)
  • F. Wang et al., Additive margin softmax for face verification, IEEE Signal Processing Letters (2018)
  • C. Drummond et al., C4.5, class imbalance, and cost sensitivity: why under-sampling beats over-sampling, Proceedings of the Workshop on Learning from Imbalanced Datasets II (2003)
Fei Su is a female professor in the Multimedia Communication and Pattern Recognition Lab, School of Information and Telecommunication, Beijing University of Posts and Telecommunications. She received the Ph.D. degree in Communication and Electrical Systems from BUPT in 2000. She was a visiting scholar at the Electrical and Computer Engineering Department, Carnegie Mellon University, from 2008 to 2009. Her current interests include pattern recognition, image and video processing, and biometrics. She has authored and co-authored more than 70 journal and conference papers and several textbooks.

    Zhicheng Zhao is an associate professor of Beijing University of Posts and Telecommunications. He was a visiting scholar at School of Computer Science, Carnegie Mellon University from 2015 to 2016. His research interests are computer vision, image and video semantic understanding and retrieval. He has authored and coauthored more than 60 journal and conference papers.

    Yandong Guo received his Ph.D. in electrical and computer engineering from Purdue University at West Lafayette under the supervision of Prof. Charles Bouman and Prof. Jan Allebach in 2013. He received his B.S.E.E. and M.S.E.E. degree from Beijing University of Posts and Telecommunications in 2005 and 2008 respectively. He was a researcher at Microsoft Research, Redmond from 2013 to 2018. He is currently the chief scientist and vice president at XPeng Motors, taking charge of the AI center. He is also a visiting professor at Beijing University of Posts and Telecommunications, and visiting professor at University of Electronic Science and Technology. Dr. Yandong Guo’s research focuses on computer vision and artificial intelligence. The results of his research have been applied in Microsoft Bing image search, cloud AI service, knowledge graph, HP multi-functional printers, GE CT machine, and many other AI products with billions of users. In the year 2016, Dr. Yandong Guo led the perception team for the connected car project at Microsoft. He is the committee member/technical reviewer for many conferences including CVPR, ICCV, ECCV, ICML, NIPS, ICIP, ICASSP, Electronic Imaging, IJCAI, ACM MM, etc., and reviewer for transactions including T-IP, T-PAMI, T-MM, T-CSVT, etc.

    Yanyun Zhao is a female associate professor in the School of Information and Communication Engineering, Beijing University of Posts and Telecommunications. She received the Ph.D. degree from Beijing University of Posts and Telecommunications in 2009. Her research interests include pattern recognition, image and video processing. She has authored and coauthored more than 60 journal and conference papers and some textbooks.

Bojin Zhuang is a senior research fellow at Ping An Technology (Shenzhen) Co., Ltd. His research interests are computer vision, natural language processing and optimization theory.