Knowledge-Based Systems

Volume 189, 15 February 2020, 105084
Interactive double states emotion cell model for textual dialogue emotion prediction

https://doi.org/10.1016/j.knosys.2019.105084

Abstract

Daily dialogues are full of emotions that control the trends of dialogues and influence the attitudes of interlocutors toward each other; understanding human emotions in dialogue is therefore of great significance for emotional comfort, human–computer interaction and intelligent question answering. This paper defines a new task, emotion prediction in textual dialogue. Unlike the text emotion recognition task, which derives the current emotional state of an interlocutor from an utterance, emotion prediction aims to predict the future emotional state of an interlocutor before that interlocutor says anything. Moreover, this paper summarizes and explains three notable characteristics of emotional propagation in textual dialogue: context dependence, persistence and contagiousness. Taking these characteristics into account, a fully data-driven interactive double states emotion cell model (IDS-ECM) is proposed. The model has two layers. The first layer automatically extracts the emotional information of the dialogue history and describes the contextual dependence of textual dialogue emotion. The second layer models how the interlocutors' emotional states change during the dialogue and captures the persistence and contagiousness of emotions. Experimental results on two manually annotated datasets show that the proposed model is superior to the baselines on the macro-averaged F1 metric and that it can simulate the emotional changes over the course of a dialogue so as to predict emotions with high accuracy. The experimental results also reveal differences in how different emotion categories propagate in dialogue, which offers guidance for future research.

Introduction

Emotion plays an important role in dialogue by controlling the trends of dialogues, guiding the decision-making of interlocutors, and influencing their attitudes towards each other. It was shown in [1] that the average number of turns in an emotional dialogue (8.7) is higher than that in a non-emotional dialogue (7.0), suggesting that emotions contribute to the continuity of dialogue. Therefore, understanding human emotions in dialogue is of great significance. Dialogue emotion analysis can guide human–human interactions [2], such as daily chats and communications between users and customer service, or between doctors and patients. For example, in communications between psychologists and patients with depression, understanding the patients' emotions is the key to comforting and treating them. Dialogue emotion analysis is also an essential factor for a human–computer interaction system to pass the Turing test [2], [3], since emotions can personify robots and enhance the user experience in human–machine dialogue systems, intelligent question answering systems, personal assistant robots, etc.

Most existing research [4], [5], [6], [7], [8], [9], [10] on dialogue emotion analysis recognizes emotions through tonal features and facial expressions in video, audio and multi-modal datasets. In many practical applications, however, we express emotions mainly through text, such as in social media, instant messaging, online customer service and e-mail, where audio and video dialogue data are usually difficult to obtain. In addition, the Turing test [3] is designed to communicate only via text, where sensory expression (e.g., voice intonation and facial expressions) plays no role [2]. Thus, in the absence of audio and visual features, it is important to fully mine the effective information in textual dialogues for emotion analysis.

Recently, great progress has been made in text sentiment and emotion analysis [11], [12], [13], [14], [15], [16]. However, it is not appropriate to apply text emotion analysis to textual dialogue emotion analysis, because the emotion analysis of textual dialogue has its own distinctive characteristics compared with general text emotion analysis:

  • (1) Context dependence. The emotion analysis of textual dialogue relies on context, whereas general text emotion analysis mainly focuses on a single sentence. As the example in Table 1 shows, for marked utterances such as ‘I am going to invite other guys. See you that day.’, the emotion of the sentence in isolation is not obvious, but it may carry emotion in a specific context.

  • (2) Persistence. Emotional states in textual dialogue are continuous and may be consistently driven by the interlocutor’s own mood, whereas the emotional state in general text emotion analysis is independent. We call this emotional persistence. As the example in Table 2 shows, A and B each maintain their own emotions during the dialogue without being influenced by each other. In the example in Table 3, although A’s emotions are not shown in the middle of the dialogue, they are retained, as verified by the subsequent emotions. These examples show that the emotional state of an interlocutor is persistent.

  • (3) Contagiousness. Textual dialogue emotion analysis assesses the emotions of two or more interlocutors in a dialogue, whereas general text emotion analysis usually involves only one interlocutor. In textual dialogue, the emotional state of an interlocutor is interactive, influential, and contagious to others. We call this emotional contagiousness. As the example in Table 4 shows, A is not emotional at the beginning of the conversation but is subsequently infected by the ‘happiness’ of B.

In addition, the traditional textual dialogue emotion recognition task derives the current mth emotional state Em of an interlocutor from the utterance Um by modelling the probability P(Em|Um). For some practical applications, it is also important to predict the trend of the interlocutor’s emotional changes. Therefore, in this paper, we define a new emotion analysis task in textual dialogue: dialogue emotion prediction. Different from dialogue emotion recognition, the dialogue emotion prediction task models the probability P(Em+1|Um), predicting the future emotional state Em+1 of an interlocutor at time m+1 from the existing dialogue history Um, without knowing the interlocutor’s future utterance Um+1. The example in Fig. 1 shows the difference between emotion recognition and emotion prediction. Dialogue emotion prediction may help to track the emotional trends of users or patients in real time, so as to better guide customer service agents and doctors to make optimal decisions in advance.
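The distinction between the two task formulations can be made concrete in a few lines. The following sketch is purely illustrative (the helper functions, example utterances and labels are hypothetical, not from the paper's datasets): recognition labels the utterance just observed, while prediction targets the label of an utterance that has not yet been produced.

```python
# Recognition models P(E_m | U_1..m): the input ends at utterance m,
# and the target is the emotion of that same utterance.
def recognition_target(utterances, emotions, m):
    return utterances[:m + 1], emotions[m]

# Prediction models P(E_{m+1} | U_1..m): the same input, but the target
# is the NEXT emotional state; U_{m+1} itself is unavailable as input.
def prediction_target(utterances, emotions, m):
    return utterances[:m + 1], emotions[m + 1]

history = ["A: I have some news for you.", "B: Really? Tell me!"]
labels = ["neutral", "happiness"]          # per-utterance emotion labels

x_rec, y_rec = recognition_target(history, labels, 0)
x_pred, y_pred = prediction_target(history, labels, 0)

assert x_rec == x_pred                     # identical observed history
assert y_rec == "neutral"                  # recognition: current state
assert y_pred == "happiness"               # prediction: future state
```

The point of the sketch is that both tasks consume exactly the same dialogue history; only the supervision target shifts one step into the future.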

By taking into account the characteristics of textual dialogue emotion analysis, this paper proposes a fully data-driven interactive double states emotion cell model (IDS-ECM) for emotion prediction. The IDS-ECM is composed of two layers. The first layer, called the emotion feature extraction layer, depicts the contextual dependence of the textual dialogue emotion and is used to automatically extract the emotional information from historical dialogues. The second layer, called the emotion propagation layer, retains the historical emotional states and enhances the persistence of the emotion. The IDS-ECM embodies the characteristics of textual dialogue emotion analysis and is more suitable for emotion prediction tasks.
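The two-layer idea can be caricatured numerically. In the toy sketch below, everything is an assumption for illustration: scalar "utterance features" stand in for the learned representations, a recency-weighted average stands in for the first (emotion feature extraction) layer, and a fixed linear blend stands in for the second (emotion propagation) layer, which mixes a speaker's own previous state (persistence) with the partner's state (contagiousness). The paper's actual layers are learned, not hand-set like this.

```python
# Layer 1 sketch (context dependence): summarize the history of scalar
# utterance features, weighting recent utterances more heavily
# (weight 2**i grows with position, i.e., a recency bias).
def extract_feature(history):
    weights = [2 ** i for i in range(len(history))]
    return sum(w * f for w, f in zip(weights, history)) / sum(weights)

# Layer 2 sketch (persistence + contagiousness): the new state blends the
# speaker's own previous state, the partner's state, and the new context.
def propagate(state_self, state_other, feature,
              persist=0.5, contagion=0.3, drive=0.2):
    return persist * state_self + contagion * state_other + drive * feature

state_a, state_b = 0.0, 1.0      # B starts out positive, A neutral
history = []
for u in [0.8, 0.9, 1.0]:        # increasingly positive utterance features
    history.append(u)
    feat = extract_feature(history)
    state_a = propagate(state_a, state_b, feat)
    state_b = propagate(state_b, state_a, feat)

assert state_a > 0.5             # A has been "infected" by B's positivity
```

Even this crude blend reproduces the contagion example of Table 4: A's state drifts toward B's positive state as the dialogue proceeds, because each update retains part of A's own history while admitting influence from B.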

The main contributions of this paper can be summarized as follows:

  • Three characteristics of textual dialogue emotion are summarized: context dependence, persistence and contagiousness;

  • A new dialogue emotion analysis task is proposed: dialogue emotion prediction;

  • A fully data-driven interactive double states emotion cell model (IDS-ECM) is proposed;

  • Experiments on two datasets show that IDS-ECM can correctly predict the emotions in dialogues and reveal the communication difference between different emotion categories in dialogue.

The rest of the paper is organized as follows. Section 2 presents related work. Sections 3 and 4 introduce and evaluate the proposed IDS-ECM model, respectively. Section 5 concludes the paper and outlines future work.

Related work

Emotion is a mental state that arises spontaneously rather than through a conscious effort and that is often accompanied by physiological changes [19]. Psychologists classify emotions into different categories according to different theories, as shown in Table 5. For example, Ekman et al. [17] divide emotions into six basic emotions based on universal facial expressions, whereas Plutchik et al. [18] divide emotions into eight basic emotions according to their relationship with

IDS-ECM: interactive double states emotion cell model

In this section, we introduce the IDS-ECM model for textual dialogue emotion prediction.

Experiments

In this section, we evaluate our proposed model on two datasets.
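The evaluation uses the macro-averaged F1 metric mentioned in the abstract: F1 is computed per emotion class and then averaged with equal weight, so infrequent emotion categories count as much as frequent ones. A minimal sketch of the computation (the labels below are illustrative, not drawn from the paper's datasets):

```python
# Macro-averaged F1: per-class precision/recall/F1, then an unweighted
# mean over classes, so each emotion category contributes equally.
def macro_f1(y_true, y_pred):
    classes = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

y_true = ["happy", "happy", "sad", "neutral"]
y_pred = ["happy", "sad",   "sad", "neutral"]
score = macro_f1(y_true, y_pred)   # 7/9, roughly 0.778
```

Macro averaging is a natural choice here because emotion categories in dialogue corpora are typically imbalanced, and a micro-averaged or accuracy-based metric would be dominated by the majority (often neutral) class.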

Conclusions and future work

The research in this paper is a new attempt at the emotion analysis of textual dialogue. From a practical point of view, a new dialogue emotion analysis task was proposed: dialogue emotion prediction. We compared textual dialogue emotion analysis with general textual emotion analysis, and summarized the characteristics of the former as context dependence, contagiousness and persistence. Then, we designed the IDS-ECM according to the defined task and

Acknowledgements

The authors would like to thank all anonymous reviewers and Dr. Yanhui Zhai for their valuable comments and suggestions, which have significantly improved the quality and presentation of this paper. The work described in this paper is supported by the National Natural Science Foundation of China (NSFC nos. 61632011, 61573231, 61672331, 61432011, 61603229).

References (40)

  • R.W. Picard, Affective computing - MIT Media Laboratory Perceptual Computing Section technical report no. 321, Cambridge, ...
  • A.M. Turing, I.—Computing machinery and intelligence, Mind (1950)
  • L. Tian, J. Moore, C. Lai, Recognizing emotions in spoken dialogue with hierarchically fused acoustic and lexical ...
  • D. Bertero et al., Real-time speech emotion and sentiment recognition for interactive dialogue systems
  • D. Hazarika et al., Conversational memory network for emotion recognition in dyadic dialogue videos
  • W. Shi et al., Sentiment adaptive end-to-end dialog systems
  • C.M. Lee et al., Toward detecting emotions in spoken dialogs, IEEE Trans. Speech Audio Process. (2005)
  • B. Felbo et al., Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm
  • C. Strapparava et al., SemEval-2007 task 14: Affective text
  • S. Mohammad et al., SemEval-2018 task 1: Affect in tweets
    No author associated with this paper has disclosed any potential or pertinent conflicts which may be perceived to have impending conflict with this work. For full disclosure statements refer to https://doi.org/10.1016/j.knosys.2019.105084.