Abstract
The increasing capability of facial expression recognition networks under disturbing factors often comes with a large computational burden, which limits practical applications. In this paper, we propose a lightweight multi-level information fusion network with a distillation loss, which is more lightweight than competing methods without sacrificing accuracy. The multi-level information fusion block uses fewer parameters to attend to information at multiple levels with greater detail awareness, and the channel attention in this block lets the network concentrate on sensitive information when processing facial images with disturbing factors. In addition, the distillation loss makes the network less susceptible to errors of the teacher network. The proposed method has the fewest parameters (0.98 million) and the lowest GFLOPs (0.142) among the compared state-of-the-art methods, while achieving 88.95%, 64.77%, 60.63%, and 62.28% on the RAF-DB, AffectNet-7, AffectNet-8, and SFEW datasets, respectively. Extensive experimental results demonstrate the effectiveness of the method. The code is available at https://github.com/Zzy9797/MLIFNet.
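The paper's exact distillation loss is not reproduced on this page. As background, losses of this kind typically modify the classic knowledge-distillation objective (Hinton et al.), which combines hard-label cross-entropy with a temperature-softened KL divergence between teacher and student outputs. The sketch below shows that classic formulation only; the function name, temperature T, and weight alpha are illustrative assumptions, not the authors' settings.

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-scaled softmax; subtracting the max keeps exp() stable."""
    z = np.asarray(logits, dtype=float) / T
    e = np.exp(z - z.max())
    return e / e.sum()

def kd_loss(student_logits, teacher_logits, label, T=4.0, alpha=0.5):
    """Classic knowledge-distillation loss: a weighted sum of
    cross-entropy with the hard label and the KL divergence between
    temperature-softened teacher and student distributions.
    The T*T factor keeps gradient magnitudes comparable across temperatures."""
    p_s = softmax(student_logits, T)
    p_t = softmax(teacher_logits, T)
    kl = np.sum(p_t * (np.log(p_t) - np.log(p_s))) * T * T
    ce = -np.log(softmax(student_logits)[label])
    return alpha * ce + (1.0 - alpha) * kl

# When teacher and student agree, only the hard-label term remains;
# disagreement adds a positive KL penalty on top of it.
l_agree = kd_loss([2.0, 0.0], [2.0, 0.0], label=0)
l_disagree = kd_loss([2.0, 0.0], [0.0, 2.0], label=0)
```

A robust variant like the one the abstract alludes to would down-weight or gate the KL term where the teacher is likely wrong, so the student does not inherit the teacher's mistakes.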
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Zhang, Y., Tian, X., Zhang, Z., Xu, X. (2023). Lightweight Multi-level Information Fusion Network for Facial Expression Recognition. In: Dang-Nguyen, DT., et al. MultiMedia Modeling. MMM 2023. Lecture Notes in Computer Science, vol 13834. Springer, Cham. https://doi.org/10.1007/978-3-031-27818-1_13
Print ISBN: 978-3-031-27817-4
Online ISBN: 978-3-031-27818-1