Abstract
The increasing capability of facial expression recognition networks under disturbing factors often comes with a large computational burden, which limits practical applications. In this paper, we propose a lightweight multi-level information fusion network with a distillation loss, which is more lightweight than competing methods without sacrificing accuracy. The multi-level information fusion block uses fewer parameters to attend to information at multiple levels with greater detail awareness, and the channel attention in this block lets the network concentrate on sensitive information when processing facial images with disturbing factors. In addition, the distillation loss makes the network less susceptible to errors of the teacher network. The proposed method has the fewest parameters (0.98 million) and the lowest GFLOPs (0.142) among the compared state-of-the-art methods, while achieving 88.95%, 64.77%, 60.63%, and 62.28% on the RAF-DB, AffectNet-7, AffectNet-8, and SFEW datasets, respectively. Extensive experimental results demonstrate the effectiveness of the method. The code is available at https://github.com/Zzy9797/MLIFNet.
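The paper's exact distillation loss is not reproduced on this page. As background, losses of this kind typically modify the classic knowledge-distillation objective (Hinton et al.), which combines hard-label cross-entropy with a temperature-softened KL divergence between teacher and student outputs. The sketch below shows that classic formulation only; the function name, temperature T, and weight alpha are illustrative assumptions, not the authors' settings.

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-scaled softmax; subtracting the max keeps exp() stable."""
    z = np.asarray(logits, dtype=float) / T
    e = np.exp(z - z.max())
    return e / e.sum()

def kd_loss(student_logits, teacher_logits, label, T=4.0, alpha=0.5):
    """Classic knowledge-distillation loss: a weighted sum of
    cross-entropy with the hard label and the KL divergence between
    temperature-softened teacher and student distributions.
    The T*T factor keeps gradient magnitudes comparable across temperatures."""
    p_s = softmax(student_logits, T)
    p_t = softmax(teacher_logits, T)
    kl = np.sum(p_t * (np.log(p_t) - np.log(p_s))) * T * T
    ce = -np.log(softmax(student_logits)[label])
    return alpha * ce + (1.0 - alpha) * kl

# When teacher and student agree, only the hard-label term remains;
# disagreement adds a positive KL penalty on top of it.
l_agree = kd_loss([2.0, 0.0], [2.0, 0.0], label=0)
l_disagree = kd_loss([2.0, 0.0], [0.0, 2.0], label=0)
```

A robust variant like the one the abstract alludes to would down-weight or gate the KL term where the teacher is likely wrong, so the student does not inherit the teacher's mistakes.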
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Zhang, Y., Tian, X., Zhang, Z., Xu, X. (2023). Lightweight Multi-level Information Fusion Network for Facial Expression Recognition. In: Dang-Nguyen, DT., et al. MultiMedia Modeling. MMM 2023. Lecture Notes in Computer Science, vol 13834. Springer, Cham. https://doi.org/10.1007/978-3-031-27818-1_13
Print ISBN: 978-3-031-27817-4
Online ISBN: 978-3-031-27818-1