Abstract
Facial attributes are fundamental for studying deep structured information. Single-task face analysis reaches great performance, while analysis of multiple attributes meets challenges, including the network design and cross-dataset learning. In this paper, we propose cross-dataset face analysis based on multi-task learning (CFA-Net), which accomplishes landmark, head pose, age, gender, facial expression, and Action Unit (AU) analysis. Firstly, we balance between the shared and the task-specific structure to design an efficient and accurate network. To guarantee the excellent performance of each task, we utilize classification-based, regression-based, ranking-based, or deep label distribution learning-based methods to extract specific features for diverse tasks. Then, face analysis trained on a single dataset has strict requirements for this dataset. Even if this dataset currently meets the demand, the scalability is poor when tasks increase. Therefore, our training set is a mixture of multiple datasets, and each dataset covers one or several task related labels. Each sample possesses one or several tasks’ labels, and we adopt a sample-dependent loss strategy, which only penalizes available ground truth. The proposed CFA-Net only occupies 1.58G GPU memory and costs 0.021s to address one image. In summary, the proposed CFA-Net behaves fast, occupies less memory, and performs well in every subtask, even better than those under single-task training.
Similar content being viewed by others
References
Agbo-Ajala O, Viriri S (2020) Deep learning approach for facial age classification: a survey of the state-of-the-art. Artif Intell Rev, 1–35
Cao J, Li Y, Zhang Z (2018) Partially shared multi-task convolutional neural network with local constraint for face attribute learning. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4290–4299
Cao W, Mirjalili V, Raschka S (2019) Consistent rank logits for ordinal regression with convolutional neural networks, arXiv:190107884.6
Chen B, Guan W, Li P, Ikeda N, Hirasawa K, Lu H (2021) Residual multi-task learning for facial landmark localization and expression recognition. Pattern Recogn 115:107893
Chen S, Zhang C, Dong M, Le J, Rao M (2017) Using ranking-cnn for age estimation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5183–5192
Chen Z, Badrinarayanan V, Lee CY, Rabinovich A (2018) Gradnorm: gradient normalization for adaptive loss balancing in deep multitask networks. In: International conference on machine learning (PMLR), pp 794–803
Fanelli G, Dantone M, Gall J, Fossati A, Van Gool L (2013) Random forests for real time 3d face analysis. Int J Comput Vis 101(3):437–458
Feng ZH, Kittler J, Awais M, Huber P, Wu XJ (2018) Wing loss for robust facial landmark localisation with convolutional neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2235–2245
Gao BB, Zhou HY, Wu J, Geng X (2018) Age estimation using expectation of label distribution learning. In: IJCAI, pp 712–718
Han H, Jain AK, Wang F, Shan S, Chen X (2017) Heterogeneous face attribute estimation: a deep multi-task learning approach. IEEE Trans Pattern Anal Mach Intell 40(11):2597–2609
Hand EM, Chellappa R (2017) Attributes for improved attributes: a multi-task network utilizing implicit and explicit relationships for facial attribute classification. In: Proceedings of the Thirty-First AAAI conference on artificial intelligence, pp 4068– 4074
Hossein Farzaneh A, Qi X (2020) Discriminant distribution-agnostic loss for facial expression recognition in the wild. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops, pp 406–407
Huang Z, Zhang J, Shan H (2021) When age-invariant face recognition meets face age synthesis: a multi-task learning framework. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 7282–7291
Kendall A, Gal Y, Cipolla R (2018) Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7482–7491
Koestinger M, Wohlhart P, Roth PM, Bischof H (2011) Annotated facial landmarks in the wild: a large-scale, real-world database for facial landmark localization. In 2011 IEEE international conference on computer vision workshops (ICCV workshops). IEEE, pp 2144–2151
Kokkinos I (2017) Ubernet: training a universal convolutional neural network for low-, mid-, and high-level vision using diverse datasets and limited memory. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 6129–6138
Kollias D, Zafeiriou S (2018) Aff-wild2: extending the aff-wild database for affect recognition. arXiv:181107770
Kutvonen K, et al. (2020) Multi-task learning in computer vision
Li S, Deng W, Du J (2017) Reliable crowdsourcing and deep locality-preserving learning for expression recognition in the wild. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2852–2861
Li W, Abtahi F, Zhu Z, Yin L (2017) Eac-net: a region-based deep enhancing and cropp.ing app.roach for facial action unit detection. In: 2017 12th IEEE international conference on automatic face & gesture recognition (FG 2017). IEEE, pp 103–110
Li Y, Lu Y, Li J, Lu G (2019) Separate loss for basic and compound facial expression recognition in the wild. In: Asian conference on machine learning, pp 897–911
Liu S, Johns E, Davison AJ (2019) End-to-end multi-task learning with attention. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1871–1880
Liu Z, Luo P, Wang X, Tang X (2015) Deep learning face attributes in the wild. In: Proceedings of the IEEE international conference on computer vision, pp 3730–3738
Liu Z, Chen Z, Bai J, Li S, Lian S (2019) Facial pose estimation by deep learning from label distributions. In: Proceedings of the IEEE international conference on computer vision workshops, pp 0–0
Lu J, Goswami V, Rohrbach M, Parikh D, Lee S (2020) 12-in-1: Multi-task vision and language representation learning. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 10437–10446
Ma C, Chen L, Yong J (2019) Au r-cnn: encoding expert prior knowledge into r-cnn for action unit detection. Neurocomputing 355:35–47
Meyerson E, Miikkulainen R (2018) Pseudo-task augmentation: from deep multitask learning to intratask sharing—and back. In: International conference on machine learning (PMLR), pp 3511–3520
Mollahosseini A, Hasani B, Mahoor MH (2017) Affectnet: a database for facial expression, valence, and arousal computing in the wild. IEEE Trans Affect Comput 10(1):18–31
Niu X, Han H, Yang S, Huang Y, Shan S (2019) Local relationship learning with person-specific shape regularization for facial action unit detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 11917–11926
Niu Z, Zhou M, Wang L, Gao X, Hua G (2016) Ordinal regression with multiple output cnn for age estimation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4920–4928
Ranjan R, Patel VM, Chellappa R (2017) Hyperface: a deep multi-task learning framework for face detection, landmark localization, pose estimation, and gender recognition. IEEE Trans Pattern Anal Mach Intell 41(1):121–135
Ranjan R, Sankaranarayanan S, Castillo CD, Chellappa R (2017) An all-in-one convolutional neural network for face analysis. In: 2017 12th IEEE International conference on automatic face & gesture recognition (FG 2017). IEEE, pp 17–24
Rothe R, Timofte R, Van Gool L (2015) Dex: deep expectation of app.arent age from a single image. In: Proceedings of the IEEE international conference on computer vision workshops, pp 10–15
Ruder S (2017) An overview of multi-task learning in deep neural networks, arXiv:170605098
Ruiz N, Chong E, Rehg JM (2018) Fine-grained head pose estimation without keypoints. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pp 2074–2083
Sagonas C, Antonakos E, Tzimiropoulos G, Zafeiriou S, Pantic M (2016) 300 faces in-the-wild challenge: database and results. Image Vis Comput 47:3–18
Shao Z, Liu Z, Cai J, Ma L (2018) Deep adaptive attention for joint facial action unit detection and face alignment. In: Proceedings of the European Conference on Computer Vision (ECCV), pp 705–720
Shao Z, Liu Z, Cai J, Wu Y, Ma L (2019) Facial action unit detection using attention and relation learning. IEEE Transactions on Affective Computing
Wang K, Peng X, Yang J, Lu S, Qiao Y (2020) Supp.ressing uncertainties for large-scale facial expression recognition. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 6897–6906
Wang K, Peng X, Yang J, Meng D, Qiao Y (2020) Region attention networks for pose and occlusion robust facial expression recognition. IEEE Trans Image Process 29:4057–4069
Wang L, Wang S, Qi J, Suzuki K (2021) A multi-task mean teacher for semi-supervised facial affective behavior analysis. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 3603–3608
Wang S, Yin S, Hao L, Liang G (2021) Multi-task face analyses through adversarial learning. Pattern Recogn 114:107837
Wang X, Bo L, Fuxin L (2019) Adaptive wing loss for robust face alignment via heatmap regression. In: Proceedings of the IEEE international conference on computer vision, pp 6971–6981
Wen Y, Zhang K, Li Z, Qiao Y (2016) A discriminative feature learning approach for deep face recognition. In: European conference on computer vision. Springer, pp 499–515
Yan Y, Duffner S, Phutane P, Berthelier A, Naturel X, Blanc C, Garcia C, Chateau T (2020) Fine-grained facial landmark detection exploiting intermediate feature representations. Comput Vis Image Underst 200:103036
Yang TY, Chen YT, Lin YY, Chuang YY (2019) Fsa-net: learning fine-grained structure aggregation for head pose estimation from a single image. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1087–1096
Yu T, Kumar S, Gupta A, Levine S, Hausman K, Finn C (2020) Gradient surgery for multi-task learning, arXiv:200106782
Yue X, Li J, Wu J, Chang J, Wan J, Ma J (2021) Multi-task adversarial autoencoder network for face alignment in the wild. Neurocomputing 437:261–273
Zhang H, Wang M, Liu Y, Yuan Y (2020) Fdn: feature decoupling network for head pose estimation. In: AAAI, pp 12789– 12796
Zhang K, Zhang Z, Li Z, Qiao Y (2016) Joint face detection and alignment using multitask cascaded convolutional networks. IEEE Signal Process Lett 23(10):1499–1503
Zhang X, Yin L, Cohn JF, Canavan S, Reale M, Horowitz A, Liu P, Girard JM (2014) Bp4d-spontaneous: a high-resolution spontaneous 3d dynamic facial expression database. Image Vis Comput 32(10):692–706
Zhang Y, Sun L (2018) Exploring correlations in multiple facial attributes through graph attention network, arXiv:181009162
Zhang Y, Fu K, Wang J, Cheng P (2020) Learning from discrete gaussian label distribution and spatial channel-aware residual attention for head pose estimation. Neurocomputing 407:259–269
Zhao K, Chu WS, Zhang H (2016) Deep region and multi-label learning for facial action unit detection. In: Proceedings of the IEEE Conference on computer vision and pattern recognition, pp 3391–3399
Zhu X, Lei Z, Liu X, Shi H, Li SZ (2016) Face alignment across large poses: a 3d solution. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 146–155
Acknowledgements
This work is supported by the National Natural Science Foundation of China [grant numbers: 61673052], the National Research and Development Major Project [grant numbers: 2017YFD0400100], the Fundamental Research Fund for the Central Universities of China [grant numbers: FRF-TP-20-10B, FRF-GF-19-010A, FRF-IDRY-19-011].
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Zhou, C., Zhi, R. & Hu, X. Cross-dataset face analysis based on multi-task learning. Appl Intell 53, 12971–12984 (2023). https://doi.org/10.1007/s10489-022-03173-4
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10489-022-03173-4