Abstract
Integrating empathy into Human-Computer Interaction (HCI) is essential for enhancing user experiences, yet current HCI systems often overlook users’ emotional states, limiting interaction quality. This research examines the integration of Multimodal Emotion Recognition (MER) into empathic, generative conversational agents, combining facial, body, and speech emotion recognition with sentiment analysis. The outputs of these recognizers are fused and supplied to Large Language Models (LLMs) so that the agent can continuously understand and respond to users empathically. The paper highlights the advantages of this multimodal approach over traditional unimodal systems in recognizing complex human emotions, and it provides a structured background on the topics involved. Its contributions include an overview of deep learning in HCI, a review of methods for emotion recognition and conversational agents, and the proposal of an HCI architecture that fuses facial, body, and speech emotion recognition with sentiment analysis and feeds the fused result to an LLM, yielding an empathic conversational agent. This research thereby contributes to the field of HCI an architecture to guide the development of more realistic and meaningful interactions through MER and a conversational agent.
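As a rough illustration of the pipeline the abstract describes, the sketch below shows one way fused multimodal emotion estimates could condition an LLM-based agent. It is a minimal late-fusion sketch in Python under assumed conventions; all names here (ModalityOutput, late_fusion, empathic_prompt, the emotion set, and the per-modality weights) are hypothetical, since no implementation appears in this page.

```python
# Hypothetical sketch of fusing per-modality emotion scores and
# conditioning an LLM prompt on the result. Names and the emotion
# taxonomy are illustrative assumptions, not the authors' code.
from dataclasses import dataclass

EMOTIONS = ["anger", "disgust", "fear", "happiness",
            "neutral", "sadness", "surprise"]

@dataclass
class ModalityOutput:
    name: str                 # e.g. "face", "body", "speech", "text"
    scores: dict[str, float]  # emotion -> probability from that recognizer
    weight: float             # assumed reliability of this modality

def late_fusion(outputs: list[ModalityOutput]) -> dict[str, float]:
    """Weighted average of per-modality emotion distributions (late fusion)."""
    total = sum(o.weight for o in outputs) or 1.0
    fused = {e: 0.0 for e in EMOTIONS}
    for o in outputs:
        for e in EMOTIONS:
            fused[e] += o.weight * o.scores.get(e, 0.0) / total
    return fused

def empathic_prompt(user_utterance: str, fused: dict[str, float]) -> str:
    """Condition the LLM on the fused emotional state via its prompt."""
    dominant = max(fused, key=fused.get)
    return (f"The user appears to feel {dominant} "
            f"(confidence {fused[dominant]:.2f}). "
            f"Reply empathically to: {user_utterance}")
```

In the proposed architecture, such fused estimates would be refreshed continuously during the conversation so that the agent's responses track the user's emotional state over time.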
Acknowledgments
This work was supported by national funds through the Portuguese Foundation for Science and Technology (FCT), I.P., under project UIDB/04524/2020, and was partially supported by Portuguese national funds through FITEC - Programa Interface, with reference CIT “INOV - INESC Inovação - Financiamento Base”.
Ethics declarations
Disclosure of Interests
The authors have no competing interests.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Pereira, R., Mendes, C., Costa, N., Frazão, L., Fernández-Caballero, A., Pereira, A. (2024). Human-Computer Interaction Approach with Empathic Conversational Agent and Computer Vision. In: Ferrández Vicente, J.M., Val Calvo, M., Adeli, H. (eds) Artificial Intelligence for Neuroscience and Emotional Systems. IWINAC 2024. Lecture Notes in Computer Science, vol 14674. Springer, Cham. https://doi.org/10.1007/978-3-031-61140-7_41
DOI: https://doi.org/10.1007/978-3-031-61140-7_41
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-61139-1
Online ISBN: 978-3-031-61140-7
eBook Packages: Computer Science, Computer Science (R0)