DGNN: Dependency Graph Neural Network for Multimodal Emotion Recognition in Conversation

Zhang, Zhen; Wang, Xin; Yuan, Lifeng; Miao, Gongxun; Liu, Mengqiu; Yun, Wenhao; Wu, Guohua

doi:10.1007/978-981-99-8138-0_8

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1963)

Included in the following conference series:

International Conference on Neural Information Processing

Abstract

Modeling conversational dependency plays a crucial role in emotion recognition in conversation (ERC). Existing methods often directly connect multimodal information and then build a graph neural network over a fixed number of past and future utterances. The former leaves the modalities without interaction, and the latter is less consistent with the logic of a conversation. To model conversational dependency better, we propose a Dependency Graph Neural Network (DGNN) for ERC. First, we present a cross-modal fusion transformer that models the dependency between different modalities of the same utterance. Then, we design a directed graph neural network based on an adaptive window that models the dependency between different utterances. Extensive experiments on two benchmark datasets demonstrate the superiority of the proposed model.
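
The contrast the abstract draws between a fixed window and an adaptive window can be illustrated with a small sketch of how the directed edges between utterances might be built. This is not the authors' implementation: the abstract does not specify the adaptive rule, so the one used here (each utterance looks back only as far as the same speaker's previous utterance) is an assumption chosen for illustration, and the function names `fixed_window_edges` and `adaptive_window_edges` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Edge = Tuple[int, int]  # (source utterance index, target utterance index)


def fixed_window_edges(num_utterances: int, past: int, future: int) -> List[Edge]:
    """Baseline: link each utterance to a fixed number of past and future neighbours."""
    edges: List[Edge] = []
    for target in range(num_utterances):
        first = max(0, target - past)
        last = min(num_utterances - 1, target + future)
        for source in range(first, last + 1):
            if source != target:
                edges.append((source, target))
    return edges


def adaptive_window_edges(speakers: Sequence[str]) -> List[Edge]:
    """Illustrative adaptive window (an assumption, not the paper's rule).

    Each utterance receives edges from every utterance since the same
    speaker last spoke, or since the conversation start if the speaker
    has not spoken before. Edges only point forward in time.
    """
    edges: List[Edge] = []
    last_turn: Dict[str, int] = {}  # speaker -> index of their latest utterance
    for target, speaker in enumerate(speakers):
        window_start = last_turn.get(speaker, 0)
        for source in range(window_start, target):
            edges.append((source, target))
        last_turn[speaker] = target
    return edges


if __name__ == "__main__":
    conversation = ["A", "B", "A", "B", "B"]
    print(fixed_window_edges(len(conversation), past=1, future=1))
    print(adaptive_window_edges(conversation))
```

In this sketch the window length varies with who is speaking rather than being a constant, which is the general idea behind an adaptive window; the actual criterion used by DGNN is described in the full chapter.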


Acknowledgements

This research was supported by the “Pioneer” and “Leading Goose” R&D Program of Zhejiang (Grant Nos. 2023C03203, 2023C03180, 2022C03174).

Author information

Authors and Affiliations

School of Cyberspace Security, Hangzhou Dianzi University, Hangzhou, 310018, China
Zhen Zhang, Xin Wang, Lifeng Yuan, Gongxun Miao, Mengqiu Liu, Wenhao Yun & Guohua Wu
Data Security Governance Zhejiang Engineering Research Center, Hangzhou Dianzi University, Hangzhou, 310018, China
Guohua Wu

Corresponding author

Correspondence to Lifeng Yuan.

Editor information

Editors and Affiliations

School of Automation, Central South University, Changsha, China
Biao Luo
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Long Cheng
Institute of Cyber-Systems and Control, Zhejiang University, Hangzhou, China
Zheng-Guang Wu
School of Automation, Guangdong University of Technology, Guangzhou, China
Hongyi Li
School of Electrical Engineering and Telecommunications, UNSW Sydney, Sydney, NSW, Australia
Chaojie Li

About this paper

Cite this paper

Zhang, Z. et al. (2024). DGNN: Dependency Graph Neural Network for Multimodal Emotion Recognition in Conversation. In: Luo, B., Cheng, L., Wu, ZG., Li, H., Li, C. (eds) Neural Information Processing. ICONIP 2023. Communications in Computer and Information Science, vol 1963. Springer, Singapore. https://doi.org/10.1007/978-981-99-8138-0_8

DOI: https://doi.org/10.1007/978-981-99-8138-0_8
Published: 26 November 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8137-3
Online ISBN: 978-981-99-8138-0
eBook Packages: Computer Science, Computer Science (R0)
