Abstract
Facial Emotion Recognition (FER) has gained significant attention in recent years due to its potential applications in various fields such as automotive, mental health, and education. Despite the impressive results of Deep Learning (DL) in these areas, a critical shortfall of these systems is the lack of explainability. This paper presents a systematic analysis using Gradient-weighted Class Activation Mapping (Grad-CAM) combined with Guided Backpropagation to investigate the features learned by DL models in FER and their alignment with established emotional theories. We apply this methodology to Convolutional Neural Networks (CNNs) trained on three different emotional datasets: FER-2013, RAF-DB, and AffectNet. Our findings indicate that machine-learned features vary within an emotional state and are not necessarily aligned with expert understanding in emotion psychology. This raises questions about the reliability and ethical implications of using FER systems in sensitive areas, where accurate interpretation of emotions is critical. In response, our study proposes exploring Neuro-Symbolic AI approaches as a potential pathway to more effectively grasp the complexity of emotion psychology and address these concerns. This approach paves the way for the development of new FER model architectures, potentially fostering the emergence of more nuanced emotional concepts.
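To make the analysis method concrete, the core Grad-CAM computation the paper relies on can be sketched as follows. This is an illustrative numpy sketch, not the authors' implementation: given a convolutional layer's activations and the gradients of a class score with respect to them, the channel weights are obtained by global average pooling of the gradients, and the heatmap is the ReLU of the weighted sum of feature maps (in Guided Grad-CAM, this map is then multiplied element-wise with the guided-backpropagation saliency).

```python
import numpy as np

def grad_cam(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Compute a Grad-CAM heatmap for one class.

    activations: (K, H, W) feature maps A^k of the chosen conv layer
    gradients:   (K, H, W) gradients dY_c/dA^k of the class score
    returns:     (H, W) heatmap, non-negative, scaled to [0, 1]
    """
    # alpha_k: global-average-pool the gradients over spatial dims
    weights = gradients.mean(axis=(1, 2))             # shape (K,)
    # weighted combination of the feature maps, then ReLU
    cam = np.tensordot(weights, activations, axes=1)  # shape (H, W)
    cam = np.maximum(cam, 0.0)
    # normalize for visualization (guard against an all-zero map)
    if cam.max() > 0:
        cam /= cam.max()
    return cam
```

In practice the activations and gradients would come from forward/backward hooks on the last convolutional layer of the CNN, and the resulting map is upsampled to the input resolution before overlaying it on the face image.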
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Gebele, J., Brune, P., Schwab, F., von Mammen, S. (2025). Interpreting Emotions Through the Grad-CAM Lens: Insights and Implications in CNN-Based Facial Emotion Recognition. In: Antonacopoulos, A., Chaudhuri, S., Chellappa, R., Liu, CL., Bhattacharya, S., Pal, U. (eds) Pattern Recognition. ICPR 2024. Lecture Notes in Computer Science, vol 15313. Springer, Cham. https://doi.org/10.1007/978-3-031-78201-5_27
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-78200-8
Online ISBN: 978-3-031-78201-5