
Affect-GCN: a multimodal graph convolutional network for multi-emotion with intensity recognition and sentiment analysis in dialogues


Abstract

Emotion classification along with sentiment analysis in dialogues is a complex task that has recently gained immense popularity. When communicating their thoughts and feelings, humans often express multiple emotions of varying intensities. The task is complicated and fascinating since the emotions in a dialogue utterance can be independent of, or depend on, the preceding utterances. Additional cues such as audio and video, along with text, facilitate the identification of the correct emotions with the corresponding intensity and the appropriate sentiments in a dialogue. In this work, we focus on the task of predicting multiple emotions and their corresponding intensity, along with sentiments, in a given utterance of a dialogue. With the release of the MEISD dataset, the task of simultaneously predicting the sentiments along with multiple emotions and their intensity for a given utterance of a conversation, using knowledge from textual, audio and visual cues, has gained significance in conversational systems. We design an Affect-GCN framework that uses an RNN-GCN network as an utterance encoder, followed by Multimodal Factorized Bilinear (MFB) pooling for an enhanced representation of the different modalities. The proposed Affect-GCN framework shows an improvement of 0.7 in terms of the Jaccard index for multi-label emotion classification and an increase of 0.3 for intensity prediction. Experimental analysis shows that our proposed Affect-GCN framework outperforms the existing approaches and several baselines on the tasks of multi-label emotion classification, intensity prediction and sentiment analysis in dialogues.
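The abstract names Multimodal Factorized Bilinear (MFB) pooling as the fusion step that follows the RNN-GCN utterance encoder. The sketch below is a minimal PyTorch illustration of MFB-style fusion of two modality vectors, followed by a multi-label emotion head and a Jaccard score; it is not the authors' implementation. The dimensions, factor size, number of emotion labels, and the 0.5 decision threshold are illustrative assumptions, and the RNN-GCN encoder that precedes this step is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MFBFusion(nn.Module):
    """Multimodal Factorized Bilinear pooling (Yu et al., 2017): project both
    modality vectors into a shared high-dimensional space, take their
    element-wise product, sum-pool over the factor dimension, then apply
    power and l2 normalisation."""
    def __init__(self, dim_x, dim_y, out_dim=256, factor=5, dropout=0.1):
        super().__init__()
        self.out_dim, self.factor = out_dim, factor
        self.proj_x = nn.Linear(dim_x, out_dim * factor)
        self.proj_y = nn.Linear(dim_y, out_dim * factor)
        self.drop = nn.Dropout(dropout)

    def forward(self, x, y):  # x: (B, dim_x), y: (B, dim_y)
        joint = self.drop(self.proj_x(x) * self.proj_y(y))               # (B, out_dim*factor)
        joint = joint.view(-1, self.out_dim, self.factor).sum(dim=2)     # sum pooling
        joint = torch.sign(joint) * torch.sqrt(torch.abs(joint) + 1e-8)  # power normalisation
        return F.normalize(joint, dim=-1)                                # l2 normalisation

# Toy usage: fuse text and audio utterance vectors, then score emotion labels.
fuse = MFBFusion(dim_x=100, dim_y=73)
text_feat, audio_feat = torch.randn(4, 100), torch.randn(4, 73)
fused = fuse(text_feat, audio_feat)                 # (4, 256)
probs = torch.sigmoid(nn.Linear(256, 10)(fused))    # 10 emotion labels (illustrative), independent sigmoids

# Jaccard index for multi-label predictions (0.5 threshold is an assumption).
pred = (probs > 0.5).float()
gold = torch.randint(0, 2, (4, 10)).float()
inter = (pred * gold).sum(dim=1)
union = ((pred + gold) > 0).float().sum(dim=1).clamp(min=1.0)
print((inter / union).mean().item())
```

Sum-pooling over the factor dimension keeps the fused vector compact while still modelling multiplicative interactions between the modalities, which is what the abstract refers to as an enhanced multimodal representation.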




Author information


Corresponding author

Correspondence to Mauajama Firdaus.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Mauajama Firdaus and Gopendra Vikram Singh contributed equally to this work.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Firdaus, M., Singh, G.V., Ekbal, A. et al. Affect-GCN: a multimodal graph convolutional network for multi-emotion with intensity recognition and sentiment analysis in dialogues. Multimed Tools Appl 82, 43251–43272 (2023). https://doi.org/10.1007/s11042-023-14885-1

