Cross-dataset face analysis based on multi-task learning

Zhou, Caixia; Zhi, Ruicong; Hu, Xin

doi:10.1007/s10489-022-03173-4

Published: 05 October 2022

Applied Intelligence, Volume 53, pages 12971–12984 (2023)

Abstract

Facial attributes are fundamental for studying deep structured information. Single-task face analysis achieves strong performance, but analyzing multiple attributes jointly raises challenges in both network design and cross-dataset learning. In this paper, we propose a cross-dataset face analysis network based on multi-task learning (CFA-Net), which performs landmark, head pose, age, gender, facial expression, and Action Unit (AU) analysis. First, we balance the shared and task-specific structures to design an efficient and accurate network. To maintain the performance of each task, we use classification-based, regression-based, ranking-based, or deep label distribution learning-based methods to extract task-specific features. Second, face analysis trained on a single dataset places strict requirements on that dataset. Even if the dataset currently meets the demand, it scales poorly as tasks are added. Our training set is therefore a mixture of multiple datasets, each of which covers the labels of one or several tasks. Each sample thus carries labels for one or several tasks, and we adopt a sample-dependent loss strategy that penalizes only the tasks with available ground truth. The proposed CFA-Net occupies only 1.58 GB of GPU memory and takes 0.021 s to process one image. In summary, CFA-Net is fast, uses little memory, and performs well on every subtask, even better than models trained on a single task.
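The sample-dependent loss described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the task names, the equal task weights, the squared-error stand-in loss, and the per-task averaging are all assumptions made for illustration. The only idea taken from the abstract is that a task contributes to the loss of a sample only when that sample has ground truth for it.

```python
from typing import Dict, Optional, Sequence

# Hypothetical task names and weights, chosen only for illustration.
TASK_WEIGHTS: Dict[str, float] = {"age": 1.0, "gender": 1.0, "pose": 1.0}


def squared_error(pred: float, target: float) -> float:
    """Stand-in per-task loss (an assumption; real tasks would differ)."""
    return (pred - target) ** 2


def sample_dependent_loss(
    preds: Sequence[Dict[str, float]],
    labels: Sequence[Dict[str, Optional[float]]],
) -> float:
    """Sum weighted per-task losses, using only samples labelled for a task.

    A label that is absent or None is treated as unavailable ground truth
    and contributes nothing to the total.
    """
    total = 0.0
    for task, weight in TASK_WEIGHTS.items():
        losses = [
            squared_error(p[task], y[task])
            for p, y in zip(preds, labels)
            if y.get(task) is not None
        ]
        if losses:  # skip tasks with no labelled sample in this batch
            total += weight * sum(losses) / len(losses)
    return total


# A mixed batch: the first sample comes from an age-only dataset,
# the second from a dataset that labels gender but not pose.
preds = [
    {"age": 30.0, "gender": 1.0, "pose": 0.0},
    {"age": 20.0, "gender": 0.0, "pose": 5.0},
]
labels = [
    {"age": 32.0},
    {"gender": 0.0, "pose": None},
]
loss = sample_dependent_loss(preds, labels)
```

In this batch only the age error of the first sample and the gender error of the second are penalized. The predictions for unlabelled tasks, such as the pose output of either sample, have no effect on the loss, which is what lets datasets with different label sets be mixed in one training run.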

Acknowledgements

This work is supported by the National Natural Science Foundation of China [grant number: 61673052], the National Research and Development Major Project [grant number: 2017YFD0400100], and the Fundamental Research Fund for the Central Universities of China [grant numbers: FRF-TP-20-10B, FRF-GF-19-010A, FRF-IDRY-19-011].

Author information

Authors and Affiliations

School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing, 100083, People’s Republic of China
Caixia Zhou, Ruicong Zhi & Xin Hu
Beijing Key Laboratory of Knowledge Engineering for Materials Science, Beijing, 100083, People’s Republic of China
Caixia Zhou, Ruicong Zhi & Xin Hu

Corresponding author

Correspondence to Ruicong Zhi.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zhou, C., Zhi, R. & Hu, X. Cross-dataset face analysis based on multi-task learning. Appl Intell 53, 12971–12984 (2023). https://doi.org/10.1007/s10489-022-03173-4

Accepted: 04 January 2022
Published: 05 October 2022
Issue Date: May 2023
DOI: https://doi.org/10.1007/s10489-022-03173-4
