Abstract
Emotion classification and sentiment analysis in dialogues is a complex task that has recently gained immense popularity. When communicating their thoughts and feelings, humans often express several emotions of varying intensities. The task is challenging and fascinating because the emotions in a dialogue utterance can be independent of or dependent on the preceding utterances. Additional cues such as audio and video, alongside text, facilitate the identification of the correct emotions with their corresponding intensities and the appropriate sentiments in a dialogue. In this work, we focus on predicting multiple emotions with their corresponding intensities, along with sentiments, for a given utterance of a dialogue. With the release of the MEISD dataset, the task of simultaneously predicting sentiments and multiple emotions with intensity from a given utterance of a conversation, utilizing knowledge from textual, audio and visual cues, has gained significance in conversational systems. We design an Affect-GCN framework that uses an RNN-GCN network as an utterance encoder, followed by Multimodal Factorized Bilinear (MFB) pooling for enhanced representation of the different modalities. The proposed Affect-GCN framework shows an improvement of 0.7 in terms of the Jaccard index for multi-label emotion classification and an increase of 0.3 for intensity prediction. Experimental analysis shows that our proposed Affect-GCN framework outperforms existing approaches and several baselines for multi-label emotion classification, intensity prediction and sentiment analysis in dialogues.
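The MFB pooling step named in the abstract can be sketched as follows. This is a minimal illustrative version, not the paper's implementation: the projection matrices are randomly initialized here purely for demonstration (in a trained model they are learned parameters), and the factor count `k` and output dimension `o` are assumed values.

```python
import numpy as np

def mfb_pool(x, y, k=5, o=16, seed=0):
    """Sketch of Multimodal Factorized Bilinear (MFB) pooling.

    x, y : 1-D feature vectors from two modalities (e.g. text and audio).
    k    : number of factors summed per output dimension.
    o    : output (fused) dimension.
    The projections U, V stand in for learned parameters.
    """
    rng = np.random.default_rng(seed)
    U = rng.standard_normal((x.shape[0], k * o))  # projection for modality x
    V = rng.standard_normal((y.shape[0], k * o))  # projection for modality y
    joint = (x @ U) * (y @ V)                     # element-wise bilinear fusion
    pooled = joint.reshape(o, k).sum(axis=1)      # sum-pool over the k factors
    pooled = np.sign(pooled) * np.sqrt(np.abs(pooled))  # signed power normalization
    return pooled / (np.linalg.norm(pooled) + 1e-12)    # l2 normalization

fused = mfb_pool(np.ones(10), np.ones(12))
print(fused.shape)  # (16,)
```

The factorization keeps the expressiveness of a bilinear interaction between modalities while the sum-pooling over `k` factors keeps the parameter count linear in the output dimension.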
Mauajama Firdaus and Gopendra Vikram Singh contributed equally to this work.
Firdaus, M., Singh, G.V., Ekbal, A. et al. Affect-GCN: a multimodal graph convolutional network for multi-emotion with intensity recognition and sentiment analysis in dialogues. Multimed Tools Appl 82, 43251–43272 (2023). https://doi.org/10.1007/s11042-023-14885-1