HAN-ReGRU: hierarchical attention network with residual gated recurrent unit for emotion recognition in conversation

  • Original Article
  • Published in Neural Computing and Applications

Abstract

Emotion recognition in conversation aims to identify the emotion of each constituent utterance in a conversation from a set of pre-defined emotions. The task has recently become a popular research frontier in natural language processing because of the growth of openly available conversational data and its applications in opinion mining. However, most existing methods cannot effectively capture long-range contextual information within an utterance and across a conversation. To alleviate this problem, we propose a novel hierarchical attention network with residual gated recurrent unit (HAN-ReGRU) framework. First, we adopt the pre-trained BERT-Large model to obtain a context-dependent representation for each token of each utterance in a conversation. Then, a hierarchical attention network captures long-range contextual information over the conversation structure. In addition, to better model the position information of utterances in a conversation, we add position embeddings to the input of the multi-head attention. Experiments on two textual dialogue emotion datasets demonstrate that our model significantly outperforms state-of-the-art baseline methods.
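
The pipeline described above (BERT-Large token features, an utterance-level encoder with a residual gated recurrent unit and attention, and a conversation-level multi-head attention layer with position embeddings) can be sketched in PyTorch as follows. This is a minimal illustration under our own assumptions: the module names (UtteranceEncoder, ConversationEncoder, EmotionClassifier), the hidden sizes, and the exact placement of the residual connections are hypothetical and do not reproduce the authors' implementation.

```python
# Minimal sketch of a hierarchical model in the spirit of HAN-ReGRU:
# token-level BERT features -> residual GRU + attention per utterance ->
# position-aware multi-head attention over the conversation.
# All names and sizes are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn


class UtteranceEncoder(nn.Module):
    """Encode one utterance (a sequence of token vectors) into a single vector."""

    def __init__(self, d_in: int, d_model: int):
        super().__init__()
        self.gru = nn.GRU(d_in, d_model, batch_first=True, bidirectional=True)
        self.proj = nn.Linear(d_in, 2 * d_model)   # align dims for the residual path
        self.attn = nn.Linear(2 * d_model, 1)      # additive attention scores

    def forward(self, tokens):                      # tokens: (seq_len, d_in)
        x = tokens.unsqueeze(0)                     # (1, seq_len, d_in)
        h, _ = self.gru(x)                          # (1, seq_len, 2*d_model)
        h = h + self.proj(x)                        # residual connection around the GRU
        weights = torch.softmax(self.attn(h), dim=1)
        return (weights * h).sum(dim=1).squeeze(0)  # attention-pooled utterance vector


class ConversationEncoder(nn.Module):
    """Contextualize utterance vectors with position-aware multi-head attention."""

    def __init__(self, d_utt: int, num_heads: int = 4, max_len: int = 128):
        super().__init__()
        self.pos_emb = nn.Embedding(max_len, d_utt)   # learned position embeddings
        self.mha = nn.MultiheadAttention(d_utt, num_heads, batch_first=True)
        self.gru = nn.GRU(d_utt, d_utt, batch_first=True)

    def forward(self, utts):                          # utts: (num_utts, d_utt)
        pos = torch.arange(utts.size(0), device=utts.device)
        x = (utts + self.pos_emb(pos)).unsqueeze(0)   # add position info before attention
        ctx, _ = self.mha(x, x, x)
        out, _ = self.gru(ctx)
        return (out + ctx).squeeze(0)                 # residual connection over utterances


class EmotionClassifier(nn.Module):
    def __init__(self, d_in: int = 1024, d_model: int = 128, num_emotions: int = 7):
        super().__init__()                            # num_emotions: set to the dataset's label count
        self.utt_enc = UtteranceEncoder(d_in, d_model)
        self.conv_enc = ConversationEncoder(2 * d_model)
        self.out = nn.Linear(2 * d_model, num_emotions)

    def forward(self, conversation):                  # list of (seq_len_i, d_in) tensors
        utts = torch.stack([self.utt_enc(u) for u in conversation])
        return self.out(self.conv_enc(utts))          # (num_utts, num_emotions) logits


# Toy usage: a 3-utterance conversation with precomputed BERT-Large token features.
if __name__ == "__main__":
    model = EmotionClassifier()
    conv = [torch.randn(n, 1024) for n in (5, 8, 3)]
    print(model(conv).shape)  # torch.Size([3, 7])
```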

Notes

  1. https://code.google.com/archive/p/word2vec/.

  2. https://github.com/hanxiao/bert-as-service (see the feature-extraction sketch after this list).

  3. http://doraemon.iis.sinica.edu.tw/emotionlines.

  4. http://doraemon.iis.sinica.edu.tw/emotionlines.

  5. https://pypi.org/project/emoji/.

  6. https://nlp.stanford.edu/projects/glove/.
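
Footnote 2 above links to bert-as-service, a tool for serving pre-trained BERT features. As a swapped-in illustration only (it uses the Hugging Face transformers library rather than the tool the authors link to), the sketch below shows how the per-token, context-dependent BERT-Large representations mentioned in the abstract can be extracted for a single utterance; the checkpoint name and the example sentence are assumptions.

```python
# Illustration only: token-level BERT-Large features for one utterance with the
# Hugging Face `transformers` library (a substitute for the bert-as-service tool
# linked in footnote 2; the checkpoint name below is an assumption).
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-large-uncased")
model = BertModel.from_pretrained("bert-large-uncased")
model.eval()

utterance = "Why did you leave so early?"
inputs = tokenizer(utterance, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# One 1024-dimensional context-dependent vector per WordPiece token,
# which would feed an utterance-level encoder.
token_features = outputs.last_hidden_state.squeeze(0)
print(token_features.shape)  # e.g. torch.Size([9, 1024])
```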

Acknowledgements

This work was partially supported by the National Key Research and Development Program of China (No. 2018YFC0830603) and the Natural Science Foundation of China (No. 61632011).

Author information

Corresponding author

Correspondence to Jian Wang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Ma, H., Wang, J., Qian, L. et al. HAN-ReGRU: hierarchical attention network with residual gated recurrent unit for emotion recognition in conversation. Neural Comput & Applic 33, 2685–2703 (2021). https://doi.org/10.1007/s00521-020-05063-7
