
Affective Behaviour Analysis Using Pretrained Model with Facial Prior

  • Conference paper
  • In: Computer Vision – ECCV 2022 Workshops (ECCV 2022)
  • Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13806)

Abstract

Affective behavior analysis has attracted researchers’ attention due to its broad applications. However, obtaining accurate annotations for massive face images is labor-intensive. We therefore propose to exploit prior facial information via a Masked Auto-Encoder (MAE) pretrained on unlabeled face images. Furthermore, we combine the MAE-pretrained Vision Transformer (ViT) with an AffectNet-pretrained CNN to perform multi-task emotion recognition. We observe that expression and action unit (AU) scores are pure and intact features for valence-arousal (VA) regression. We therefore use the AffectNet-pretrained CNN to extract expression scores and concatenate them with the expression and AU scores from the ViT to obtain the final VA features. Moreover, we propose a co-training framework with two parallel MAE-pretrained ViTs for the expression recognition task. To make the two views independent, we randomly mask most patches during training. Jensen-Shannon (JS) divergence is then applied to make the predictions of the two views as consistent as possible. The results on ABAW4 show that our methods are effective: our team reached 2nd place in the multi-task learning (MTL) challenge and 4th place in the learning from synthetic data (LSD) challenge. Code is available at https://github.com/JackYFL/EMMA_CoTEX_ABAW4.
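As a rough illustration of the co-training consistency objective described in the abstract, the sketch below computes a Jensen-Shannon divergence loss between the expression predictions of two randomly masked views. This is a minimal PyTorch-style sketch under assumed tensor shapes; the function name, arguments, and shapes are illustrative and not taken from the authors’ released code.

```python
import torch
import torch.nn.functional as F

def js_consistency_loss(logits_a: torch.Tensor, logits_b: torch.Tensor) -> torch.Tensor:
    """JS-divergence consistency between two co-trained views.

    logits_a, logits_b: (batch, num_classes) expression logits produced by the
    two MAE-pretrained ViTs on differently (heavily) masked versions of the
    same face image.
    """
    p = F.softmax(logits_a, dim=-1)
    q = F.softmax(logits_b, dim=-1)
    m = 0.5 * (p + q)  # mixture distribution
    # JS(p, q) = 0.5 * KL(p || m) + 0.5 * KL(q || m)
    # F.kl_div(input, target) expects input as log-probabilities and computes KL(target || input)
    kl_pm = F.kl_div(m.log(), p, reduction="batchmean")
    kl_qm = F.kl_div(m.log(), q, reduction="batchmean")
    return 0.5 * (kl_pm + kl_qm)
```

Averaging the two KL terms against the mixture distribution makes the loss symmetric and bounded, which is why JS divergence is a common choice for encouraging agreement between the predictions of co-trained views.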

Y. Li, H. Sun, and Z. Liu contributed equally to this work. This research was supported in part by the National Key R&D Program of China (grant 2018AAA0102501) and the National Natural Science Foundation of China (grant 62176249).



Author information


Corresponding author

Correspondence to Hu Han.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Li, Y., Sun, H., Liu, Z., Han, H., Shan, S. (2023). Affective Behaviour Analysis Using Pretrained Model with Facial Prior. In: Karlinsky, L., Michaeli, T., Nishino, K. (eds) Computer Vision – ECCV 2022 Workshops. ECCV 2022. Lecture Notes in Computer Science, vol 13806. Springer, Cham. https://doi.org/10.1007/978-3-031-25075-0_2


  • DOI: https://doi.org/10.1007/978-3-031-25075-0_2


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-25074-3

  • Online ISBN: 978-3-031-25075-0

  • eBook Packages: Computer Science, Computer Science (R0)
