Abstract
Facial Expression Recognition (FER) has received increasing attention in the computer vision community. Facial images pose two challenging issues for FER: large inter-class similarity and large intra-class discrepancy. To address these challenges and obtain better performance, we propose a Local-Global Cross-Fusion Transformer network. The method seeks a more discriminative facial representation by jointly considering the features of multiple local facial regions and global face features. To extract features from the critical local areas of the face, a local feature decomposition module based on facial landmarks is designed. In addition, a local-global cross-fusion Transformer uses a cross-attention mechanism to strengthen the synergistic correlation between local and global features, maximizing the focus on key regions while preserving the connections among local regions. Extensive experiments on three mainstream expression recognition datasets, RAF-DB, FERPlus, and AffectNet, show that the method outperforms many existing expression recognition methods and significantly improves recognition accuracy.
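The core fusion idea described above, local tokens attending to global tokens and vice versa, can be sketched with plain NumPy. This is a minimal illustration of single-head cross-attention, not the authors' implementation: the learned query/key/value projections are omitted (keys double as values), and the token counts (5 landmark-based local regions, a 7x7 global feature map) are assumptions for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys_values):
    """Single-head cross-attention: `queries` attend to `keys_values`.

    Simplified sketch: keys and values are the same tensor, and the
    learned projection matrices of a real Transformer are omitted.
    """
    d_k = queries.shape[-1]
    scores = queries @ keys_values.T / np.sqrt(d_k)   # (Nq, Nkv)
    weights = softmax(scores, axis=-1)                # each row sums to 1
    return weights @ keys_values                      # (Nq, d)

rng = np.random.default_rng(0)
d = 64
local_tokens = rng.standard_normal((5, d))    # e.g. eyes, brows, nose, mouth regions
global_tokens = rng.standard_normal((49, d))  # e.g. a flattened 7x7 backbone feature map

# Cross-fusion: local queries attend to global keys/values, and vice versa,
# so each stream is enriched with context from the other.
local_enriched = cross_attention(local_tokens, global_tokens)
global_enriched = cross_attention(global_tokens, local_tokens)

# Pool each enriched stream and concatenate into one fused representation.
fused = np.concatenate([local_enriched.mean(axis=0), global_enriched.mean(axis=0)])
```

In a full model, `fused` (or the enriched token sequences) would feed a classification head; the bidirectional attention is what lets key local regions be emphasized without discarding their relationships to the whole face.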
Acknowledgements
This work is supported by the Higher Education Stability Support Program Project (Grant No. GXWD20220811173317002) and Shenzhen Science and Technology Program (Grant No. RCBS20210609103709020).
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Liu, Y., Li, Z., Zhang, Y., Wen, J. (2024). Local-Global Cross-Fusion Transformer Network for Facial Expression Recognition. In: Song, X., Feng, R., Chen, Y., Li, J., Min, G. (eds) Web and Big Data. APWeb-WAIM 2023. Lecture Notes in Computer Science, vol 14332. Springer, Singapore. https://doi.org/10.1007/978-981-97-2390-4_18
DOI: https://doi.org/10.1007/978-981-97-2390-4_18
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-2389-8
Online ISBN: 978-981-97-2390-4
eBook Packages: Computer Science (R0)