Abstract
Emotion recognition in conversation aims to identify the emotion of each constituent utterance in a conversation from a set of pre-defined emotions. The task has recently become an active research frontier in natural language processing because of the growth of open conversational data and its applications in opinion mining. However, most existing methods cannot effectively capture long-range contextual information within an utterance and across a conversation. To alleviate this problem, we propose a novel framework, a hierarchical attention network with residual gated recurrent units. First, we adopt the pre-trained BERT-Large model to obtain a context-dependent representation for each token of each utterance in a conversation. Then, a hierarchical attention network captures long-range contextual information about the conversation structure. In addition, to better model the position of each utterance in a conversation, we add position embeddings to the input of the multi-head attention. Experiments on two textual dialogue emotion datasets demonstrate that our model significantly outperforms state-of-the-art baseline methods.
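The step of adding position embeddings to the attention input can be illustrated with a minimal sketch. It is not the implementation used in this work: it assumes fixed sinusoidal encodings in the style of Vaswani et al. (the abstract does not say whether the embeddings are fixed or learned), a single attention head with identity projections instead of learned multi-head projections, and small hand-written vectors standing in for BERT-derived utterance representations. The function names `sinusoidal_positions`, `self_attention`, and `contextualize` are hypothetical.

```python
import math


def sinusoidal_positions(num_utterances, dim):
    """Fixed sinusoidal position encodings (an assumption; the paper may learn them)."""
    table = []
    for pos in range(num_utterances):
        row = []
        for i in range(dim):
            angle = pos / (10000 ** (2 * (i // 2) / dim))
            # Even dimensions use sine, odd dimensions use cosine.
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Single-head scaled dot-product self-attention with identity projections."""
    dim = len(vectors[0])
    outputs, weights = [], []
    for query in vectors:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        w = softmax(scores)
        weights.append(w)
        # Each output is an attention-weighted average of all utterance vectors.
        outputs.append(
            [sum(w[j] * vectors[j][d] for j in range(len(vectors))) for d in range(dim)]
        )
    return outputs, weights


def contextualize(utterance_vectors):
    """Add position encodings to utterance vectors, then apply self-attention."""
    n, dim = len(utterance_vectors), len(utterance_vectors[0])
    positions = sinusoidal_positions(n, dim)
    positioned = [
        [u + p for u, p in zip(vec, pos_row)]
        for vec, pos_row in zip(utterance_vectors, positions)
    ]
    return self_attention(positioned)


# Toy conversation: three utterances, the first two with identical content.
conversation = [
    [0.1, 0.2, 0.3, 0.4],
    [0.1, 0.2, 0.3, 0.4],
    [0.5, 0.1, 0.0, 0.2],
]
attended, attention_weights = contextualize(conversation)
```

In this toy conversation the first two utterances have identical content vectors, yet they receive different attended representations because their position encodings differ. This is the effect the position embedding is intended to provide: attention alone is insensitive to the order of its inputs.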
Acknowledgements
This work is partially supported by the National Key Research and Development Program of China (No. 2018YFC0830603) and the Natural Science Foundation of China (No. 61632011).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Cite this article
Ma, H., Wang, J., Qian, L. et al. HAN-ReGRU: hierarchical attention network with residual gated recurrent unit for emotion recognition in conversation. Neural Comput & Applic 33, 2685–2703 (2021). https://doi.org/10.1007/s00521-020-05063-7