Abstract
Since electronic devices have become an integral part of everyday life, there is a growing need to make communication between humans and machines resemble communication between two people as closely as possible. Because interpersonal relationships are built on feelings and empathy, training machines to understand emotions and to respond in accordance with the emotional state of the user has become an attractive area of technology development. To gain a more comprehensive understanding of a person's emotional state, different modalities such as audio, text, and video are used simultaneously and processed with a graph neural network (GNN), an approach that has recently gained popularity because it is well suited to tracking a conversation. However, small IoT devices typically have constrained computational capabilities, limited memory, and tight power budgets, so running such a complex multimodal algorithm in real time may be difficult. In this research, we examine the use of binarization and 8-bit floating-point arithmetic to compress COGMEN, a state-of-the-art GNN-based model. Experiments on the IEMOCAP dataset show that, for the multimodal emotion recognition task, such constrained models can provide significant data savings while maintaining relatively high performance.
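To make the compression strategy concrete, the sketch below illustrates in PyTorch what weight binarization with a straight-through estimator (in the spirit of Courbariaux et al.) and an emulated 8-bit floating-point quantization of activations might look like for a single linear layer. This is only an illustrative approximation under stated assumptions, not the COGMEN compression pipeline described in the paper; all layer sizes, function names, and format parameters are hypothetical.

```python
# Illustrative sketch (not the authors' code): binarize the weights of a linear
# layer with a straight-through estimator and roughly emulate an FP8-style
# (4-bit exponent / 3-bit mantissa) quantization of the activations.
import torch
import torch.nn as nn


class BinarizeSTE(torch.autograd.Function):
    """Sign binarization with a straight-through gradient estimator."""

    @staticmethod
    def forward(ctx, w):
        ctx.save_for_backward(w)
        return torch.sign(w)

    @staticmethod
    def backward(ctx, grad_out):
        (w,) = ctx.saved_tensors
        # Pass the gradient through only where |w| <= 1 (hard-tanh STE).
        return grad_out * (w.abs() <= 1).float()


class BinaryLinear(nn.Linear):
    """Linear layer whose weights are binarized to scaled +/-1 on the fly."""

    def forward(self, x):
        alpha = self.weight.abs().mean()            # per-tensor scale factor
        w_bin = BinarizeSTE.apply(self.weight) * alpha
        return nn.functional.linear(x, w_bin, self.bias)


def fake_quant_fp8(x, exp_bits=4, man_bits=3):
    """Rough emulation of an FP8-like format: round the mantissa and clamp the exponent."""
    mantissa, exponent = torch.frexp(x)
    mantissa = torch.round(mantissa * 2 ** man_bits) / 2 ** man_bits
    max_exp = 2 ** (exp_bits - 1)
    exponent = exponent.clamp(-max_exp, max_exp - 1)
    return torch.ldexp(mantissa, exponent)


if __name__ == "__main__":
    # Feature and batch sizes are illustrative, e.g. fused audio/text/video features.
    layer = BinaryLinear(in_features=100, out_features=64)
    features = torch.randn(8, 100)
    out = fake_quant_fp8(layer(features))
    print(out.shape)  # torch.Size([8, 64])
```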
References
De Rivera, J., Grinkis, C.: Emotions as social relationships. Motiv. Emot. 10, 351–369 (1986)
Frijda, N.H.: The Emotions. Cambridge University Press (1986)
Delić, V., et al.: Speech technology progress based on new machine learning paradigm. Comput. Intell. Neurosci. 2019, 1–19 (2019)
Yang, C., et al.: Emotion-dependent language featuring depression. J. Behav. Therapy Exp. Psych. 81, 101883 (2023)
Mahlke, S., Minge, M.: Emotions and EMG measures of facial muscles in interactive contexts. Cogn. Emot. 6, 169–200 (2006)
Simić, N., et al.: Enhancing emotion recognition through federated learning: a multimodal approach with convolutional neural networks. Appl. Sci. 14(4), 1325 (2024)
Hebb, D.O.: Emotion in man and animal: an analysis of the intuitive processes of recognition. Psychol. Rev. 53(2), 88 (1946)
Simić, N., et al.: Speaker recognition using constrained convolutional neural networks in emotional speech. Entropy 24(3), 414 (2022)
Cowie, R., et al.: Emotion recognition in human-computer interaction. IEEE Signal Process. Mag. 18(1), 32–80 (2001)
Joshi, A., Bhat, A., Jain, A., Singh, A.V., Modi, A.: COGMEN: COntextualized GNN based multimodal emotion recognitioN. In: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Association for Computational Linguistics, Seattle, USA, pp. 4148–4164 (2022)
Liang, F., Qian, C., Yu, W., Griffith, D., Golmie, N.: Survey of graph neural networks and applications. Wirel. Commun. Mob. Comput. 2022(1), 9261537 (2022)
Bajovic, D., et al.: MARVEL: multimodal extreme scale data analytics for smart cities environments. In: Proceedings of the 2021 International Balkan Conference on Communications and Networking, BalkanCom, Novi Sad, Serbia, pp. 143–147 (2021)
Choi, Y., El-Khamy, M., Lee, J.: Universal deep neural network compression. IEEE J. Sel. Top. Sig. Process. 14(4), 715–726 (2020)
Ajay, B.S., Rao, M.: Binary neural network based real time emotion detection on an edge computing device to detect passenger anomaly. In: Proceedings of the 2021 34th International Conference on VLSI Design and 2021 20th International Conference on Embedded Systems (VLSID), Guwahati, India, pp. 175–180 (2021)
Muhammad, G., Hossain, M.S.: Emotion recognition for cognitive edge computing using deep learning. IEEE Internet Things J. 8(23), 16894–16901 (2021)
Liu, S., Ha, D.S., Shen, F., Yi, Y.: Efficient neural networks for edge devices. Comput. Electr. Eng. 92(107121), 1–24 (2021)
Wu, L., Cui, P., Pei, J., Zhao, L.: Graph Neural Networks: Foundations, Frontiers, and Applications. Springer (2022)
Ghosal, D., Majumder, N., Poria, S., Chhaya, N., Gelbukh, A.: DialogueGCN: a graph convolutional neural network for emotion recognition in conversation. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP, Association for Computational Linguistics, Hong Kong, China, pp. 154–164 (2019)
Zhang, C., Song, D., Huang, C., Swami, A., Chawla, N.V.: Heterogeneous graph neural network. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 793–803 (2019)
Liang, Y., Meng, F., Zhang, Y., Chen, Y., Xu, J., Zhou, J.: Infusing multi-source knowledge with heterogeneous graph neural network for emotional conversation generation. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, no. 15, pp. 13343–13352 (2021)
Neill, J.O.: An overview of neural network compression. arXiv preprint arXiv:2006.03669 (2020)
Schlichtkrull, M., Kipf, T.N., Bloem, P., van den Berg, R., Titov, I., Welling, M.: Modeling relational data with graph convolutional networks. In: Gangemi, A., et al. (eds.) The Semantic Web. ESWC 2018. Lecture Notes in Computer Science, vol. 10843. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-93417-4_38
Shi, Y., Huang, Z., Feng, S., Zhong, H., Wang, W., Sun, Y.: Masked label prediction: unified message passing model for semi-supervised classification. In: Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, IJCAI-21, Montreal, Canada, pp. 1548–1554 (2021)
Busso, C., et al.: IEMOCAP: interactive emotional dyadic motion capture database. Lang. Resour. Eval. 42, 335–359 (2008)
Zadeh, A.B., Liang, P.P., Poria, S., Cambria, E., Morency, L.P.: Multimodal language analysis in the wild: CMU-MOSEI dataset and interpretable dynamic fusion graph. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, vol. 1, Long Papers, pp. 2236–2246 (2018)
Courbariaux, M., Hubara, I., Soudry, D., El-Yaniv, R., Bengio, Y.: Binarized neural networks: training deep neural networks with weights and activations constrained to +1 or –1. arXiv preprint arXiv:1602.02830 (2016)
Kahan, W.: IEEE standard 754 for binary floating-point arithmetic. Lecture Notes on the Status of IEEE 754, University of California, Berkeley (1996)
Wang, H., et al.: Binarized graph neural network. World Wide Web 24, 825–848 (2021)
Huang, L., et al.: EPQuant: a Graph Neural Network compression approach based on product quantization. Neurocomputing 503, 49–61 (2022)
Liang, T., Glossner, J., Wang, L., Shi, S., Zhang, X.: Pruning and quantization for deep neural network acceleration: a survey. Neurocomputing 461, 370–403 (2021)
Acknowledgments
This study was funded by the European Union (Multilingual and Cross-cultural interactions for context-aware, and bias-controlled dialogue systems for safety-critical applications (ELOQUENCE) project, Grant agreement No. 101135916). Views and opinions expressed are, however, those of the author(s) only and do not necessarily reflect those of the European Union or the European Commission. Neither the European Union nor the granting authority can be held responsible for them.
Also, this research was supported by the Science Fund of the Republic of Serbia, Grant No. 7449, Multimodal multilingual human-machine speech communication, AI-SPEAK.
Disclosure of Interests.
The authors have no competing interests to declare that are relevant to the content of this article.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Đurkić, T., Simić, N., Suzić, S., Bajović, D., Perić, Z., Delić, V. (2025). Multimodal Emotion Recognition Using Compressed Graph Neural Networks. In: Karpov, A., Delić, V. (eds) Speech and Computer. SPECOM 2024. Lecture Notes in Computer Science, vol. 15300. Springer, Cham. https://doi.org/10.1007/978-3-031-78014-1_9
DOI: https://doi.org/10.1007/978-3-031-78014-1_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-78013-4
Online ISBN: 978-3-031-78014-1