Abstract
Affective behavior analysis has attracted researchers' attention due to its broad applications. However, obtaining accurate annotations for massive face images is labor-intensive. We therefore propose to exploit prior facial information via a Masked Auto-Encoder (MAE) pretrained on unlabeled face images. Furthermore, we combine an MAE-pretrained Vision Transformer (ViT) and an AffectNet-pretrained CNN to perform multi-task emotion recognition. We observe that expression and action unit (AU) scores are pure and compact features for valence-arousal (VA) regression. Accordingly, we use the AffectNet-pretrained CNN to extract expression scores and concatenate them with the expression and AU scores from the ViT to obtain the final VA features. We also propose a co-training framework with two parallel MAE-pretrained ViTs for the expression recognition task. To keep the two views independent, we randomly mask most patches during training, and we apply the Jensen-Shannon (JS) divergence to make the predictions of the two views as consistent as possible. The results on ABAW4 show that our methods are effective: our team reached 2nd place in the multi-task learning (MTL) challenge and 4th place in the learning from synthetic data (LSD) challenge. Code is available at https://github.com/JackYFL/EMMA_CoTEX_ABAW4.
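The consistency objective described above can be illustrated with a minimal sketch of the Jensen-Shannon divergence between the softmax predictions of the two co-trained views. This is a generic NumPy illustration, not the authors' implementation; the example distributions are hypothetical.

```python
import numpy as np

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two discrete distributions.

    Symmetric and bounded in [0, log 2]; small when p and q agree.
    """
    p = np.asarray(p, dtype=float) + eps  # eps avoids log(0)
    q = np.asarray(q, dtype=float) + eps
    p /= p.sum()
    q /= q.sum()
    m = 0.5 * (p + q)  # mixture distribution
    kl = lambda a, b: np.sum(a * np.log(a / b))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Hypothetical softmax outputs of the two parallel ViT views
# over three expression classes; the JS term penalizes disagreement.
view1 = np.array([0.7, 0.2, 0.1])
view2 = np.array([0.6, 0.3, 0.1])
loss = js_divergence(view1, view2)
```

In training, this term would be added to each view's supervised classification loss so that the two randomly masked views are pushed toward consistent predictions.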
Y. Li, H. Sun, and Z. Liu contributed equally to this work. This research was supported in part by the National Key R&D Program of China (grant 2018AAA0102501) and the National Natural Science Foundation of China (grant 62176249).
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Li, Y., Sun, H., Liu, Z., Han, H., Shan, S. (2023). Affective Behaviour Analysis Using Pretrained Model with Facial Prior. In: Karlinsky, L., Michaeli, T., Nishino, K. (eds) Computer Vision – ECCV 2022 Workshops. ECCV 2022. Lecture Notes in Computer Science, vol 13806. Springer, Cham. https://doi.org/10.1007/978-3-031-25075-0_2
DOI: https://doi.org/10.1007/978-3-031-25075-0_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-25074-3
Online ISBN: 978-3-031-25075-0