Cross-dataset face analysis based on multi-task learning

Applied Intelligence

Abstract

Facial attributes are fundamental for studying deep structured information. Single-task face analysis achieves strong performance, but analyzing multiple attributes jointly raises challenges in network design and cross-dataset learning. In this paper, we propose cross-dataset face analysis based on multi-task learning (CFA-Net), which performs landmark, head pose, age, gender, facial expression, and Action Unit (AU) analysis. First, we balance the shared and the task-specific structure to design an efficient and accurate network. To maintain the performance of each task, we use classification-based, regression-based, ranking-based, or deep label distribution learning-based methods to extract task-specific features. Second, face analysis trained on a single dataset places strict requirements on that dataset. Even if the dataset currently meets the demand, it scales poorly as tasks are added. Therefore, our training set is a mixture of multiple datasets, each covering the labels of one or several tasks. Because each sample carries labels for only one or several tasks, we adopt a sample-dependent loss strategy that penalizes only the tasks with available ground truth. The proposed CFA-Net occupies only 1.58 GB of GPU memory and takes 0.021 s to process one image. In summary, CFA-Net is fast, uses little memory, and performs well on every subtask, even better than models obtained under single-task training.
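The sample-dependent loss strategy can be illustrated with a small sketch. This is not the authors' implementation: the task names, the squared-error loss, and the per-task averaging over labeled samples are all assumptions made for illustration. The only property taken from the abstract is that a sample contributes no penalty for a task whose ground truth it lacks.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical per-task loss; the paper uses different loss types per task.
def squared_error(pred: float, target: float) -> float:
    return (pred - target) ** 2


def sample_dependent_loss(
    predictions: List[Dict[str, float]],
    labels: List[Dict[str, Optional[float]]],
    task_losses: Dict[str, Callable[[float, float], float]],
) -> Dict[str, float]:
    """Average each task's loss over only the samples labeled for that task."""
    totals = {task: 0.0 for task in task_losses}
    counts = {task: 0 for task in task_losses}
    for pred, label in zip(predictions, labels):
        for task, loss_fn in task_losses.items():
            target = label.get(task)
            if target is None:
                # No ground truth for this task on this sample: no penalty.
                continue
            totals[task] += loss_fn(pred[task], target)
            counts[task] += 1
    # Assumed normalization: divide by the number of labeled samples per task.
    return {
        task: (totals[task] / counts[task] if counts[task] else 0.0)
        for task in task_losses
    }


# A mixed batch: the first sample comes from a dataset with age labels only,
# the second from a dataset with head-pose (yaw) labels only.
predictions = [{"age": 30.0, "yaw": 10.0}, {"age": 40.0, "yaw": -5.0}]
labels = [{"age": 32.0}, {"age": None, "yaw": -5.0}]
task_losses = {"age": squared_error, "yaw": squared_error}

losses = sample_dependent_loss(predictions, labels, task_losses)
total_loss = sum(losses.values())
```

In this toy batch the age prediction of the second sample and the yaw prediction of the first sample are ignored, so a dataset that lacks a task's labels never pushes that task's head in any direction.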


Acknowledgements

This work is supported by the National Natural Science Foundation of China [grant number: 61673052], the National Research and Development Major Project [grant number: 2017YFD0400100], and the Fundamental Research Fund for the Central Universities of China [grant numbers: FRF-TP-20-10B, FRF-GF-19-010A, FRF-IDRY-19-011].

Author information

Corresponding author

Correspondence to Ruicong Zhi.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zhou, C., Zhi, R. & Hu, X. Cross-dataset face analysis based on multi-task learning. Appl Intell 53, 12971–12984 (2023). https://doi.org/10.1007/s10489-022-03173-4

  • DOI: https://doi.org/10.1007/s10489-022-03173-4
