Neurocomputing

Volume 230, 22 March 2017, Pages 184-196

Data augmentation for face recognition

https://doi.org/10.1016/j.neucom.2016.12.025

Highlights

  • We present five data augmentation methods specific to face images.

  • The landmark perturbation method automatically generates different kinds of transformed face images.

  • Different hairstyles and glasses can be automatically synthesized for a face image.

  • Face images with different poses and illuminations can be generated from a 3D face model.

Abstract

Recently, Deep Convolutional Neural Networks (DCNNs) have shown outstanding performance in face recognition. However, the supervised training of a DCNN requires a large number of labeled samples, which are expensive and time-consuming to collect. In this paper, we propose five data augmentation methods dedicated to face images, including landmark perturbation and four synthesis methods (hairstyles, glasses, poses, illuminations). The proposed methods effectively enlarge the training dataset, which alleviates the impact of misalignment, pose variation, illumination changes and partial occlusions, as well as overfitting during training. The performance of each data augmentation method is tested on the Multi-PIE database. Furthermore, comparisons among these methods are conducted on the LFW, YTF and IJB-A databases. Experimental results show that the proposed methods can greatly improve face recognition performance.

Introduction

Face recognition in unconstrained environments has become increasingly prevalent in many applications, such as identity verification, intelligent visual surveillance and automated immigration clearance systems. The classical pipeline of a modern face recognition system typically consists of face detection, face alignment, feature representation, and classification. Among them, feature representation is the most fundamental step, and a discriminative feature can substantially improve performance. Many approaches to face representation have been proposed. Hand-crafted features, such as LBP [1] and SIFT [2], were used in early work to extract an image's appearance features. Later, encoding-based features were developed to learn discriminative features from data; for example, the Fisher vector [3] uses unsupervised learning techniques to learn an encoding dictionary from training data. Recently, convolutional neural networks (CNNs) have provided a supervised or unsupervised framework for robust feature learning and have demonstrated state-of-the-art performance [4], [5].

Since LeNet-5 [6] was first proposed by LeCun et al., many CNN variants have been designed and have become prevalent in image classification [7], [8] and object detection [9]. They have also brought a revolution in face recognition, even surpassing human performance [10], [11], [5]. For example, DeepID3 [10], FaceNet [11] and BAIDU [5] have reached over 99% face verification accuracy on the widely used Labeled Faces in the Wild (LFW) database [12].

To achieve better performance, networks have become much deeper and wider [13]. Because such networks contain a large number of parameters, directly training a deep network from scratch requires a large amount of labeled face images, and training with limited data easily leads to overfitting: with a large network and limited training data, the test error keeps increasing after several epochs even though the training error continues to decrease [14]. To address this problem, a number of strategies have been proposed: fine-tuning models trained on other large public databases (e.g., ImageNet [15]), adopting various regularization methods (e.g., Dropout [14], Maxout [16], and DropConnect [17]), and collecting more training data [18], [4], [11]. At present, collecting more training data is the most direct way to improve performance, since a model trained on more data has stronger generalization ability. Many state-of-the-art methods are based on large-scale training datasets. For instance, DeepFace [4] was trained on 4 million photos of 4000 people, and FaceNet [11] was trained on 200 million photos of 8 million people.

By taking advantage of social networks on the Internet, a large number of images, including faces, objects and scenes, can easily be crawled by search engines. Access to large amounts of data meets the needs of deep learning, but annotating the data is tedious, laborious, and time-consuming work that may even require volunteers with specific expert knowledge. As a dataset grows, mistakes such as wrong labels, redundancy and duplication become inevitable. Needless to say, building a large-scale, correctly labeled database is too difficult and expensive for research groups, particularly in academia. Therefore, data augmentation methods have emerged to generate large amounts of training data using label-preserving transformations, such as flipping and cropping [7], [19], color casting [20], blur [21], etc. Experiments in [19] showed that flipping and cropping reduced the top-1 error rate by over 2% in ILSVRC-2013. Color casting, blur and contrast transformations equip the trained model with strong generalization ability to unseen but similar noise patterns in the training data [7], [20], [21].
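To make these generic transformations concrete, the following is a minimal numpy sketch of flipping, random cropping and per-channel color casting; it is an illustration rather than code from the cited works, and the crop size and noise scale are assumed values.

```python
import numpy as np

def flip_crop_color(image, crop_size=224, cast_std=10.0):
    """Common label-preserving transformations: random horizontal flip,
    random crop, and a per-channel color cast (illustrative parameters).
    `image` is an HxWx3 uint8 array larger than `crop_size`."""
    h, w, _ = image.shape
    if np.random.rand() < 0.5:                       # horizontal flip
        image = image[:, ::-1]
    top = np.random.randint(0, h - crop_size + 1)    # random crop offsets
    left = np.random.randint(0, w - crop_size + 1)
    crop = image[top:top + crop_size, left:left + crop_size].astype(np.float32)
    crop += np.random.normal(0.0, cast_std, size=3)  # shift each RGB channel
    return np.clip(crop, 0, 255).astype(np.uint8)
```

Applying such a function several times per image yields distinct but equally labeled training samples at negligible cost.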

However, the above-mentioned methods, although effective for improving neural-network-based image classification systems under different circumstances, are still not sufficient for face images. Face images have their own particularities, and the main challenges for face recognition include pose, illumination, occlusion, etc. Commonly used data augmentation methods, which only apply simple transformations, cannot handle these problems. Hence, face-specific data augmentation methods have been proposed. Jiang et al. [22] proposed an efficient 3D reconstruction method to generate face images with different poses, illuminations and expressions. Mohammadzade and Hatzinakos [23] proposed an expression-subspace projection method to synthesize new expression images for each person. Seyyedsalehi et al. [24] generated face images with different expressions using a nonlinear manifold separator neural network (NMSNN). Most previous methods are only suitable for constrained environments and generate a fixed set of face image types.

Pose, illumination and occlusion variations are common problems in face recognition; they influence not only face image pre-processing, such as face alignment, but also feature extraction. Meanwhile, the training datasets for face recognition are limited, and each person has only a few types of images. Even though DCNNs have a powerful representation ability, they still need different kinds of face images per subject to learn face variations, so the limited training data are far from sufficient for training a robust feature representation model and seriously decrease recognition accuracy in these situations. In this paper, we propose five data augmentation methods dedicated to these factors: landmark perturbation (LP), hairstyle synthesis (HS), glasses synthesis (GS), pose synthesis (PS) and illumination synthesis (IS). These methods aim to alleviate the impact of misalignment, pose variation, illumination changes and partial occlusions, and they can be widely applied in unconstrained environments. The LP method, which randomly perturbs landmark positions before face normalization, makes the feature extraction model robust to misalignment (e.g., translation, rotation, scaling and shear); a sketch of this idea is given below. HS and GS generate different hairstyles and glasses for a given face image, which enlarges the training set and makes the model robust to similar occlusions. Our 3D face reconstruction, in contrast to [22], is able to reconstruct a 3D face model from an image with a large pose; once the 3D face model is reconstructed, we use it to imitate different poses and illuminations, which makes the DCNN model robust to such variations. Each data augmentation method is verified on the Multi-PIE database, and the different methods are compared on the Labeled Faces in the Wild database (LFW) [12], the YouTube Faces database (YTF) [25] and the IARPA Janus Benchmark A database (IJB-A) [26]. Experimental results show that the proposed data augmentation methods can greatly improve face recognition performance.
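As an illustration only, here is a minimal Python sketch of landmark perturbation under stated assumptions: 5-point landmarks, OpenCV for the similarity transform, and template coordinates and a noise scale that are hypothetical rather than taken from the paper.

```python
import numpy as np
import cv2

# Hypothetical 5-point template (eye centers, nose tip, mouth corners)
# for a 96x112 crop; coordinates and sigma are illustrative choices.
TEMPLATE = np.float32([[30.3, 51.7], [65.5, 51.5], [48.0, 71.7],
                       [33.5, 92.4], [62.7, 92.2]])

def landmark_perturbation(image, landmarks, sigma=2.0, out_size=(96, 112)):
    """Jitter the detected landmarks before alignment so that the
    normalized crop exhibits small translation/rotation/scale/shear."""
    noisy = (np.float32(landmarks)
             + np.random.normal(0.0, sigma, (5, 2))).astype(np.float32)
    # Similarity transform mapping the perturbed landmarks to the template.
    M, _ = cv2.estimateAffinePartial2D(noisy, TEMPLATE)
    return cv2.warpAffine(image, M, out_size)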

The rest of this paper is organized as follows. Section 2 reviews related work. Our data augmentation approaches are introduced in Section 3, experimental results are presented in Section 4, and conclusions are drawn in Section 5.

Related work

At present, only a few face datasets are publicly available, e.g., the CASIA-WebFace dataset [27] with 10,575 subjects and 494,414 images, and the CACD dataset [28] with 2000 subjects and 163,446 images. Compared to the datasets used by Internet giants such as Google [11], whose dataset contains 200 million images of 8 million unique identities, the publicly accessible face datasets are relatively small and insufficient for training large DCNN models.

Thus, a number of data augmentation methods have been proposed.

Data augmentation

Because the training dataset is limited and each person has only a few types of images, the data are not sufficient to train a deep and robust DCNN. A reasonable way to enlarge the training dataset is data augmentation. As revealed in previous works [7], [20], [21], data augmentation equips the trained DCNN model with strong generalization ability to unseen but similar noise patterns in the training data. In this section, we introduce five data augmentation methods specific to face images.
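As one concrete example of how an occlusion-style synthesis such as GS can be imitated (the paper's actual pipeline may differ), the following hypothetical Python sketch alpha-blends an RGBA glasses template onto a face crop at a position that would be derived from the eye landmarks.

```python
import numpy as np

def overlay_glasses(face, glasses_rgba, top_left):
    """Paste a glasses template onto a face crop by alpha blending.
    `face` is an HxWx3 uint8 crop, `glasses_rgba` an RGBA template image,
    and `top_left` = (row, col) places its upper-left corner."""
    h, w = glasses_rgba.shape[:2]
    y, x = top_left
    roi = face[y:y + h, x:x + w].astype(np.float32)
    alpha = glasses_rgba[:, :, 3:4].astype(np.float32) / 255.0
    blended = alpha * glasses_rgba[:, :, :3] + (1.0 - alpha) * roi
    face[y:y + h, x:x + w] = blended.astype(np.uint8)
    return face
```

Repeating this with different templates produces several occluded variants of each training face.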

Experiment setup

GoogLeNet [8] is a typical DCNN architecture whose excellent performance was demonstrated in the ILSVRC-2014 contest. It has 27 layers and introduces the inception module to approximate an optimal local sparse structure. Features extracted from a trained GoogLeNet model are not only discriminative but also low-dimensional and sparse. Due to these merits, we adopt GoogLeNet as the default network for training the models for the different data augmentation methods throughout our experiments.
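For readers unfamiliar with the inception module, here is a minimal PyTorch sketch of its structure; this is an illustration under stated assumptions, not the network used in the paper, and the channel counts are constructor arguments.

```python
import torch
import torch.nn as nn

def conv_relu(c_in, c_out, k, pad=0):
    return nn.Sequential(nn.Conv2d(c_in, c_out, k, padding=pad),
                         nn.ReLU(inplace=True))

class Inception(nn.Module):
    """GoogLeNet-style inception module: four parallel branches whose
    outputs are concatenated along the channel dimension."""
    def __init__(self, c_in, c1, c3r, c3, c5r, c5, cp):
        super().__init__()
        self.b1 = conv_relu(c_in, c1, 1)                  # 1x1
        self.b3 = nn.Sequential(conv_relu(c_in, c3r, 1),  # 1x1 -> 3x3
                                conv_relu(c3r, c3, 3, pad=1))
        self.b5 = nn.Sequential(conv_relu(c_in, c5r, 1),  # 1x1 -> 5x5
                                conv_relu(c5r, c5, 5, pad=2))
        self.bp = nn.Sequential(nn.MaxPool2d(3, 1, 1),    # pool -> 1x1
                                conv_relu(c_in, cp, 1))

    def forward(self, x):
        return torch.cat([self.b1(x), self.b3(x), self.b5(x), self.bp(x)], 1)
```

For example, `Inception(192, 64, 96, 128, 16, 32, 32)` maps a 192-channel feature map to a 64+128+32+32 = 256-channel output at the same spatial resolution.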

The CASIA-WebFace dataset [27] was used as the training set in our experiments.

Conclusion

This paper presents five data augmentation methods for improving face recognition performance, which aim at increasing the effective size of the training set. Compared with previous data augmentation methods, our methods are dedicated to face images and are more efficient in various situations. Experimental results on the Multi-PIE database confirm the effectiveness of each method, and results on the popular LFW, YTF and IJB-A databases show that our methods can significantly improve the performance of face recognition.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (NSFC) (Grant no. 61472386), the Strategic Priority Research Program of the Chinese Academy of Sciences (Grant XDA 06040103), and the Chongqing Research Program of Basic Research and Frontier Technology (No. cstc2016jcyjA0011). The authors would like to thank You-Ji Feng and Cheng Cheng for valuable discussions.

References

  • K. Simonyan, O.M. Parkhi, A. Vedaldi, A. Zisserman, Fisher vector faces in the wild, in: Proceedings of the British...
  • Y. Taigman, M. Yang, M. Ranzato, L. Wolf, Deepface: closing the gap to human-level performance in face verification,...
  • J. Liu, Y. Deng, C. Huang, Targeting Ultimate Accuracy: Face Recognition via Deep Embedding, arXiv preprint...
  • Y. LeCun et al., Backpropagation applied to handwritten zip code recognition, Neural Comput. (1989)
  • A. Krizhevsky, I. Sutskever, G.E. Hinton, Imagenet classification with deep convolutional neural networks, in:...
  • C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, A. Rabinovich, Going deeper with...
  • S. Ren, K. He, R. Girshick, J. Sun, Faster r-cnn: towards real-time object detection with region proposal networks, in:...
  • Y. Sun, D. Liang, X. Wang, X. Tang, Deepid3: Face Recognition with Very Deep Neural Networks, arXiv preprint...
  • F. Schroff, D. Kalenichenko, J. Philbin, Facenet: a unified embedding for face recognition and clustering, in:...
  • G.B. Huang, M. Ramesh, T. Berg, E. Learned-Miller, Labeled faces in the wild: a database for studying face recognition...
  • K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proceedings of the IEEE Conference...
  • G.E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, R.R. Salakhutdinov, Improving Neural Networks by Preventing...
  • J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, L. Fei-Fei, Imagenet: A large-scale hierarchical image database, in:...
  • I.J. Goodfellow, D. Warde-Farley, M. Mirza, A. Courville, Y. Bengio, Maxout networks, in: Proceedings of the...
  • L. Wan, M. Zeiler, S. Zhang, Y.L. Cun, R. Fergus, Regularization of neural networks using dropconnect, in: Proceedings...
  • Y. Sun, X. Wang, X. Tang, Deep learning face representation from predicting 10,000 classes, in: Proceedings of the IEEE...
  • A.G. Howard, Some Improvements on Deep Convolutional Neural Network Based Image Classification, arXiv preprint...
  • R. Wu, S. Yan, Y. Shan, Q. Dang, G. Sun, Deep Image: Scaling up Image Recognition, arXiv preprint...

Jiang-Jing Lv received the B.S. degree in information and computing science from University of Science and Technology of Hunan, Hunan, China, in 2012. He is currently pursuing a Ph.D. degree in pattern recognition at Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences, Chongqing, China. His research interests include face recognition and deep learning.

Xiao-Hu Shao received the B.E. degree in Telecommunication Engineering from China University of Geosciences in 2009 and the M.E. degree in Signal and Information Processing from University of Electronic Science and Technology of China in 2012. He was a research trainee at the Chongqing Institute of Green and Intelligent Technology (CIGIT), Chinese Academy of Sciences, from 2012 to 2015, and is pursuing a Ph.D. degree at CIGIT under the supervision of Professor Xi Zhou. His research interests include object detection, 3D face reconstruction and face recognition.

Jia-Shui Huang received the M.S. and Ph.D. degrees in Computer Science from Zhejiang University, Zhejiang, China, in 2006 and 2010, respectively. He is currently an associate professor at Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences. His research interests include computer vision and machine learning, with a focus on face recognition and deep learning.

Xiang-Dong Zhou is an associate professor at the Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences. He received the B.S. degree in Applied Mathematics and the M.S. degree in Management Science and Engineering from National University of Defense Technology, Changsha, China, and the Ph.D. degree in pattern recognition and artificial intelligence from the Institute of Automation, Chinese Academy of Sciences, Beijing, China, in 1998, 2003 and 2009, respectively. He was a postdoctoral fellow at Tokyo University of Agriculture and Technology from March 2009 to March 2011. From May 2011 to October 2013, he was a research assistant and later an associate professor at the Institute of Software, Chinese Academy of Sciences. His research interests include machine learning and pattern recognition.

Xi Zhou received the B.S. and M.S. degrees in electronic science and technology from University of Science and Technology of China, Hefei, China, and the Ph.D. degree in electrical and computer engineering from University of Illinois at Urbana-Champaign, Champaign, IL, USA, in 2010. He is a Professor with the Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences, Chongqing, China, and the Founding Lead of its Intelligent Multimedia Research Center. He has authored or co-authored more than 40 technical papers with more than 600 Google Scholar citations. His research interests include pattern recognition, machine learning, computer vision and multimedia. Dr. Zhou received the Best Paper Award from the International Conference on Image Processing in 2007, the Best Student Paper Award from the International Conference on Pattern Recognition in 2008, and the Best Paper Award from ACM Multimedia in 2013.
