
Ensemble of Multi-task Learning Networks for Facial Expression Recognition In-the-Wild with Learning from Synthetic Data

  • Conference paper
  • In: Computer Vision – ECCV 2022 Workshops (ECCV 2022)

Abstract

Facial expression recognition in-the-wild is essential for various interactive computing applications, and learning from synthetic data has become an important topic within this task. In this paper, we propose a multi-task learning-based facial expression recognition approach in which the emotion and appearance perspectives of facial images are jointly learned. We also present experimental results on the validation and test sets of the LSD challenge introduced in the 4th Affective Behavior Analysis in-the-Wild (ABAW) competition. Our method achieved a mean F1 score of 71.82 on the validation set and 35.87 on the test set, ranking third on the final leaderboard.

J.-Y. Jeong, Y.-G. Hong, S. Hong, and J. Oh contributed equally to this work.
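The full text is paywalled here, so as a rough illustration of the approach the abstract outlines, the following is a minimal sketch of a multi-task network with a shared backbone and two heads: an expression classifier (the emotion perspective) and an auxiliary appearance embedding (the appearance perspective), combined with simple logit averaging over an ensemble at inference time. The ResNet-18 backbone, the six expression classes, the MSE appearance loss, the weight `alpha`, and all names are assumptions made for illustration, not details taken from the paper.

```python
# Minimal sketch (PyTorch) of a multi-task FER network plus a logit-averaging
# ensemble. All design choices below (ResNet-18 backbone, 6 expression
# classes, MSE appearance loss, alpha weight) are illustrative assumptions,
# not the architecture reported in the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models


class MultiTaskFER(nn.Module):
    def __init__(self, num_expressions: int = 6, appearance_dim: int = 512):
        super().__init__()
        backbone = models.resnet18(weights=None)   # hypothetical backbone choice
        feat_dim = backbone.fc.in_features
        backbone.fc = nn.Identity()                # strip classifier -> shared features
        self.backbone = backbone
        self.expr_head = nn.Linear(feat_dim, num_expressions)    # emotion task
        self.appear_head = nn.Linear(feat_dim, appearance_dim)   # appearance task

    def forward(self, x):
        feats = self.backbone(x)
        return self.expr_head(feats), self.appear_head(feats)


def joint_loss(expr_logits, expr_labels, appear_emb, appear_target, alpha=0.5):
    """Weighted sum of the two task losses; alpha is a hypothetical weight."""
    return F.cross_entropy(expr_logits, expr_labels) + \
        alpha * F.mse_loss(appear_emb, appear_target)


@torch.no_grad()
def ensemble_predict(members, x):
    """Average expression logits across ensemble members, then take argmax."""
    logits = torch.stack([m(x)[0] for m in members]).mean(dim=0)
    return logits.argmax(dim=1)


if __name__ == "__main__":
    nets = [MultiTaskFER().eval() for _ in range(3)]  # toy 3-member ensemble
    faces = torch.randn(4, 3, 224, 224)               # dummy batch of face crops
    print(ensemble_predict(nets, faces))              # 4 predicted class indices
```

Averaging logits (or softmax probabilities) is one common way to combine ensemble members; the paper's actual fusion rule, backbone, and loss design may differ.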



Acknowledgement

This work was supported by the NRF grant funded by the Korea government (MSIT) (No. 2021R1F1A1059665), by the Basic Research Program through the NRF grant funded by the Korea Government (MSIT) (No. 2020R1A4A1017775), and by the Korea Institute for Advancement of Technology (KIAT) grant funded by the Korea Government (MOTIE) (P0017123, The Competency Development Program for Industry Specialist).

Author information

Correspondence to Jin-Woo Jeong.



Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Jeong, JY. et al. (2023). Ensemble of Multi-task Learning Networks for Facial Expression Recognition In-the-Wild with Learning from Synthetic Data. In: Karlinsky, L., Michaeli, T., Nishino, K. (eds) Computer Vision – ECCV 2022 Workshops. ECCV 2022. Lecture Notes in Computer Science, vol 13806. Springer, Cham. https://doi.org/10.1007/978-3-031-25075-0_5


  • DOI: https://doi.org/10.1007/978-3-031-25075-0_5


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-25074-3

  • Online ISBN: 978-3-031-25075-0

  • eBook Packages: Computer Science, Computer Science (R0)
