Abstract
Affective behavior analysis has attracted researchers' attention due to its broad applications. However, obtaining accurate annotations for massive face images is labor-intensive. We therefore propose to exploit prior facial information via a Masked Auto-Encoder (MAE) pretrained on unlabeled face images. Furthermore, we combine an MAE-pretrained Vision Transformer (ViT) and an AffectNet-pretrained CNN to perform multi-task emotion recognition. We observe that expression and action unit (AU) scores are pure and compact features for valence-arousal (VA) regression. Accordingly, we use the AffectNet-pretrained CNN to extract expression scores and concatenate them with the expression and AU scores from the ViT to obtain the final VA features. We also propose a co-training framework with two parallel MAE-pretrained ViTs for the expression recognition task. To keep the two views independent, we randomly mask most patches during training, and we apply the Jensen-Shannon (JS) divergence to make the predictions of the two views as consistent as possible. The results on ABAW4 show that our methods are effective: our team reached 2nd place in the multi-task learning (MTL) challenge and 4th place in the learning from synthetic data (LSD) challenge. Code is available at https://github.com/JackYFL/EMMA_CoTEX_ABAW4.
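The consistency objective described above can be illustrated with a minimal sketch of the Jensen-Shannon divergence between the softmax predictions of the two co-trained views. This is a generic NumPy illustration, not the authors' implementation; the example distributions are hypothetical.

```python
import numpy as np

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two discrete distributions.

    Symmetric and bounded in [0, log 2]; small when p and q agree.
    """
    p = np.asarray(p, dtype=float) + eps  # eps avoids log(0)
    q = np.asarray(q, dtype=float) + eps
    p /= p.sum()
    q /= q.sum()
    m = 0.5 * (p + q)  # mixture distribution
    kl = lambda a, b: np.sum(a * np.log(a / b))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Hypothetical softmax outputs of the two parallel ViT views
# over three expression classes; the JS term penalizes disagreement.
view1 = np.array([0.7, 0.2, 0.1])
view2 = np.array([0.6, 0.3, 0.1])
loss = js_divergence(view1, view2)
```

In training, this term would be added to each view's supervised classification loss so that the two randomly masked views are pushed toward consistent predictions.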
Y. Li, H. Sun, and Z. Liu contributed equally to this work. This research was supported in part by the National Key R&D Program of China (grant 2018AAA0102501) and the National Natural Science Foundation of China (grant 62176249).
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Li, Y., Sun, H., Liu, Z., Han, H., Shan, S. (2023). Affective Behaviour Analysis Using Pretrained Model with Facial Prior. In: Karlinsky, L., Michaeli, T., Nishino, K. (eds) Computer Vision – ECCV 2022 Workshops. ECCV 2022. Lecture Notes in Computer Science, vol 13806. Springer, Cham. https://doi.org/10.1007/978-3-031-25075-0_2
DOI: https://doi.org/10.1007/978-3-031-25075-0_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-25074-3
Online ISBN: 978-3-031-25075-0