Elsevier

Information Fusion

Volume 64, December 2020, Pages 50-70
A survey on empathetic dialogue systems

https://doi.org/10.1016/j.inffus.2020.06.011

Highlights

  • Identified 3 key features of empathetic dialog systems.

  • Discussed emotion-awareness, personality-awareness, and knowledge-accessibility as sub-topics.

  • Technical summaries of recent progress in conversational AI research.

  • Provided tables to organize studies into chronological order.

Abstract

Dialogue systems have achieved growing success in many areas thanks to the rapid advances of machine learning techniques. In the quest for generating more human-like conversations, one of the major challenges is to learn to generate responses in a more empathetic manner. In this review article, we focus on the literature of empathetic dialogue systems, whose goal is to enhance the perception and expression of emotional states, personal preference, and knowledge. Accordingly, we identify three key features that underpin such systems: emotion-awareness, personality-awareness, and knowledge-accessibility. The main goal of this review is to serve as a comprehensive guide to research and development on empathetic dialogue systems and to suggest future directions in this domain.

Introduction

The primary goal of building a dialogue system is to address users’ questions and concerns by emulating the way humans communicate with each other. As human language is too complicated to be treated as a single target, dialogue systems have to model different aspects of human communication separately. Recent years have witnessed the emergence of empathy models in the context of dialogue systems and, hence, increasing attention from the natural language processing (NLP) community.

Empathy is the capability of projecting the feelings and ideas of the other party to someone’s knowledge [1]. It plays an important part in human communication, as it has the potential to strengthen emotional bonds. As noted by a previous study [2], incorporating empathy into the design of a dialogue system is also vital for improving user experience in human-computer interaction. More importantly, being empathetic is a necessary step for the dialogue agent to be perceived as a social character by users [3]. Building an empathetic dialogue system is thus premised on the idea that it will result in improved user engagement and, consequently, more effective communication. Research on dialogue systems has elaborated on this concept mainly from the perspective of features. For example, Loojie et al. [4] stated that an empathetic dialogue system should be complimentary, attentive, and compassionate. In this survey, we are particularly concerned with this dimension of dialogue systems from the perspective of functions, namely, which functions enable the empathetic behavior of a dialogue system. To our knowledge, this has not been discussed in depth in the previous literature.

Early attempts to build dialogue systems date back to the 1960s [5]. Since then, dialogue systems have been designed either to perform specific tasks such as flight booking [6], healthcare [7], or political debate [8], hence termed “task-specific dialogue systems”, or to chitchat as a form of entertainment [9], hence called “chatbots”. A task-specific dialogue system [10], [11] often consists of multiple modules, including language understanding, dialogue state tracking, dialogue policy, and dialogue generation. On the other hand, recent progress in deep learning [12] also facilitates the use of end-to-end solutions for dialogue systems, which can be more easily trained to simulate the behavior of human communication given access to a large amount of training data. As we will discuss in later sections, the process of generating responses conditioned on the existing context of a dialogue can be naturally modeled as a translation process, for which off-the-shelf end-to-end solutions such as the sequence-to-sequence (Seq2Seq) model [13] have already proven effective.
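The modular design described above can be illustrated with a minimal Python sketch. It is a toy, rule-based stand-in and not a method from the surveyed literature: the slot names, regular expressions, and function names (understand, track_state, choose_action, generate, respond) are all hypothetical and chosen only to show how the four modules hand data to one another in a flight-booking setting.

```python
import re
from dataclasses import dataclass, field


@dataclass
class DialogueState:
    """Slots accumulated over the turns of a conversation."""
    slots: dict = field(default_factory=dict)


def understand(utterance: str) -> dict:
    """Language understanding: toy rules that pick up capitalized city names."""
    found = {}
    origin = re.search(r"\bfrom ([A-Z]\w+)", utterance)
    if origin:
        found["origin"] = origin.group(1)
    destination = re.search(r"\bto ([A-Z]\w+)", utterance)
    if destination:
        found["destination"] = destination.group(1)
    return found


def track_state(state: DialogueState, new_slots: dict) -> DialogueState:
    """Dialogue state tracking: merge newly extracted slots into the state."""
    state.slots.update(new_slots)
    return state


def choose_action(state: DialogueState) -> tuple:
    """Dialogue policy: request the first missing slot, otherwise confirm."""
    for slot in ("origin", "destination"):
        if slot not in state.slots:
            return ("request", slot)
    return ("confirm", None)


def generate(action: tuple, state: DialogueState) -> str:
    """Dialogue generation: turn the chosen action into a template response."""
    act, slot = action
    if act == "request":
        return f"What is your {slot}?"
    return (f"Booking a flight from {state.slots['origin']} "
            f"to {state.slots['destination']}.")


def respond(state: DialogueState, utterance: str) -> str:
    """Run one user turn through the four modules in sequence."""
    track_state(state, understand(utterance))
    return generate(choose_action(state), state)
```

An end-to-end model would replace all four hand-written functions with a single learned mapping from dialogue context to response, which is the contrast the paragraph draws.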

The rapid growth of dialogue systems and their applications has prompted many comprehensive surveys in the past decade. Chen et al. [14] mainly organize their survey by elaborating on each functional component of a dialogue system. Gao et al. [15] presented the most recent review, with good coverage of related topics and a focus on neural network-based approaches for building dialogue systems. Unlike [14] and [15], we position our perspective on dialogue systems with empathetic features. Related work [16] viewed empathy as equivalent to emotion. We argue that empathy is not all about emotions. Indeed, a non-empathetic dialogue system may disappoint and bore the user because its responses are robotic yet incoherent, which consequently leads to a loss of affection.

Introducing emotion into the generation of dialogue can only partially address the problem. As illustrated by Fig. 1, a more comprehensive empathetic framework also has to access general knowledge as well as personalized knowledge. Personalization, in such a case, could increase the coherence and consistency of a dialogue system. With knowledge of user-specific information, the dialogue system could tailor responses towards the user’s preferences and address questions relevant to the user’s untold background, and a virtuous cycle forms when the user tends to provide more information and clues about themselves. Moreover, external knowledge, be it task-specific or commonsense, usually complements the context of a conversation with additional background. Many facts that are obvious to human beings may be very opaque to a machine. For example, a vanilla dialogue system will take “I come to my friend’s house. Jimmy is my friend” literally: it will not conclude that “my friend’s house” means “Jimmy’s house” unless we construct a relationship between them. This is where the knowledge part comes into play: it helps dialogue systems become smarter, sharper, and more interesting. Although incorporating knowledge into dialogue systems seems prevalent, reasoning over, retrieving from, and representing a large-scale knowledge base remain challenging. All three components (i.e., emotion, personalization, and knowledge) work together to ensure a smooth and natural flow of the conversation.
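The “Jimmy” example can be made concrete with a small Python sketch. It assumes, purely for illustration, that facts from the conversation are stored as (subject, relation, object) triples; the relation name friend_of, the entity label speaker, and the helper resolve_possessive are hypothetical and do not come from any system covered in this survey.

```python
# Facts extracted from the dialogue, stored as (subject, relation, object)
# triples. "Jimmy is my friend" becomes the single triple below.
facts = {("Jimmy", "friend_of", "speaker")}


def resolve_possessive(phrase: str, known_facts: set) -> str:
    """Rewrite "my friend's X" as "<name>'s X" when exactly one friend is known."""
    prefix = "my friend's "
    if not phrase.startswith(prefix):
        return phrase
    friends = sorted(
        subject
        for (subject, relation, obj) in known_facts
        if relation == "friend_of" and obj == "speaker"
    )
    if len(friends) != 1:
        # No friend is known, or the reference is ambiguous: leave it as is.
        return phrase
    return f"{friends[0]}'s {phrase[len(prefix):]}"
```

With the stored triple, resolve_possessive("my friend's house", facts) rewrites the phrase using the friend's name; with an empty fact set, the phrase is returned unchanged, which mirrors the behavior of a system that has no relationship to draw on.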

Considering such complexity of empathetic systems, we take a perspective that goes beyond the merely emotional definition of empathetic dialogue systems by identifying three pillars. Such pillars accordingly represent the three main sub-topics presented in this survey:

  • perceiving and expressing emotion (Section 3 – Affective Dialogue Systems)

  • caring for each individual (Section 4 – Personalized Dialogue Systems), and

  • casting into knowledge (Section 5 – Knowledgeable Dialogue Systems).

Going beyond previous surveys [14], we also cover the most recent advances in the area of empathetic dialogue systems. In particular, we emphasize end-to-end models over traditional pipeline models, as we believe the former represent the current trend in this field. To the best of our knowledge, we are the first to survey the empathetic features of a dialogue system. Overall, we primarily surveyed 35 papers selected from those published in prestigious venues in the past 10 years.

Section snippets

Propaedeutic background

A dialogue system is not a system built on top of one model. Instead, it is built on integrating multiple techniques due to the complexity of language and tasks. In this section, we present a technical introduction to recent techniques that serve as the backbone of an empathetic dialogue system.

Affective dialogue system

Emotion plays an important role in cognition and social behavior [31]. Existing studies suggest that emotion is a reaction and a social and cultural interaction that is continuously shaped by the relationships between humans and their surrounding environment [32]. Yet, the definition and categorization of emotions remain fuzzy and long-debated among psychologists and philosophers [33]. In the scope of this paper, we focus on the representation of emotion in dialogue systems (or human-computer

Personalized dialogue system

The communication between a dialogue system and a human is generally desired to adapt to variation in personal preferences in order to increase communication effectiveness [97], [98], based on an appropriate perception of the speaker’s personality. On the other hand, personality affects the way of communication in various manners, including both linguistic style [99] and acoustic traits [100]. As it feels more natural to interact with a ‘thing’ that has its own personality, implanting

Knowledge-based dialogue system

Generating a conversation is a process of searching for and communicating knowledge that might come from multiple sources, including the current dialogue, personal background, or even external knowledge sources such as a knowledge graph [110]. The comprehension of dialogue thus requires access to background knowledge, a requirement that has created a gap between responses generated by human beings and those generated by data-driven dialogue agents [13], [18], [48]. Fig. 18 shows an example in which the

Future directions

Many research challenges remain in the context of empathetic dialogue systems. For example, little effort has been devoted to combining the three key components (i.e., personalization, knowledge, and emotion) to build a more comprehensive empathetic system. With advances in each subtopic, it becomes possible to further extend this research area on different fronts:

  • 1. Multi-goal Management: As pointed out by Pollack et al. [139], communication might be overloaded with multiple objectives. This becomes

Conclusion

Although emotion, personality and knowledge have been considered key components by existing research on dialogue systems, little work has been done towards investigating the correlation between them in a broader context in order to enhance human-computer interaction. In this survey, we provided a unified view of these different research efforts under the topic of empathetic dialogue systems and discussed recent advancements and trends in this context. As one of the key features in

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

This research is supported by the Agency for Science, Technology and Research (A*STAR) under its AME Programmatic Funding Scheme (Project #A18A2b0046).

References (150)

  • Y. Zemlyanskiy et al.

    Aiming to know you better perhaps makes me a more engaging dialogue partner

    Proceedings of the 22nd Conference on Computational Natural Language Learning

    (2018)
  • R.S. Nickerson et al.

    Empathy and knowledge projection

    Soc. Neurosc. Empathy

    (2009)
  • K. Liu et al.

    Embedded empathy in continuous, interactive health assessment

    CHI Workshop on HCI Challenges in Health Assessment

    (2005)
  • M.F. McTear et al.

    The Conversational Interface

    (2016)
  • J. Weizenbaum

    Computer Power and Human Reason: From Judgment to Calculation

    (1976)
  • J.-Y. Magadur et al.

    A french oral dialogue system for flight reservations over the telephone

    Third European Conference on Speech Communication and Technology (EUROSPEECH)

    (1993)
  • F. Morbini et al.

    A mixed-initiative conversational dialogue system for healthcare

    SIGDIAL Conference

    (2012)
  • A. Khatua et al.

    Let’s chat about brexit! a politically-sensitive dialog system based on twitter data

    ICDM Workshops

    (2017)
  • H. Zhou et al.

    Emotional chatting machine: emotional conversation generation with internal and external memory

    AAAI Conference on Artificial Intelligence

    (2018)
  • G.G. Lee et al.

    Natural Language Dialog Systems and Intelligent Assistants

    (2015)
  • H. Xu et al.

    End-to-end latent-variable task-oriented dialogue system with exact log-likelihood optimization

    World Wide Web

    (2020)
  • S. Minaee et al.

    Deep learning based text classification: a comprehensive review

    arXiv Preprint arXiv:2004.03705

    (2020)
  • I. Sutskever et al.

    Sequence to sequence learning with neural networks

    Advances in neural information processing systems

    (2014)
  • H. Chen et al.

    A survey on dialogue systems: recent advances and new frontiers

    ACM SIGKDD Explorat. Newsletter

    (2017)
  • J. Gao et al.

    Neural approaches to conversational AI

    Found. Trend. Inf. Retriev.

    (2019)
  • P. Fung et al.

    Empathetic dialog systems

    Language Resources and Evaluation Conference (LREC)

    (2018)
  • T. Mikolov et al.

    Recurrent neural network based language model.

    INTERSPEECH

    (2010)
  • K. Cho et al.

    Learning phrase representations using rnn encoder–decoder for statistical machine translation

    Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

    (2014)
  • S. Hochreiter et al.

    Long short-term memory

    Neural Comput.

    (1997)
  • D. Bahdanau et al.

    Neural machine translation by jointly learning to align and translate

    arXiv preprint arXiv:1409.0473

    (2014)
  • W. Zaremba et al.

    Learning to execute

    arXiv preprint arXiv:1410.4615

    (2014)
  • J. Weston et al.

    Memory networks

    CoRR

    (2015)
  • D.P. Kingma et al.

    Auto-encoding variational bayes

    International Conference on Learning Representations

    (2013)
  • I. Gulrajani et al.

    Improved training of wasserstein gans

    Advances in neural information processing systems

    (2017)
  • B. Zhang et al.

    Variational neural machine translation

    Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

    (2016)
  • I.J. Goodfellow et al.

    Generative adversarial nets

    Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 2

    (2014)
  • J. Li et al.

    Deep reinforcement learning for dialogue generation

    Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

    (2016)
  • T.H. Bui et al.

    A tractable hybrid ddn–pomdp approach to affective dialogue modeling for probabilistic frame-based dialogue systems

    Nat. Lang. Eng.

    (2009)
  • T.H. Bui et al.

    Affective dialogue management using factored pomdps

    Interactive Collaborative Information Systems

    (2010)
  • V. Rieser et al.

    Reinforcement learning for adaptive dialogue systems: A data-driven methodology for dialogue management and natural language generation

    (2011)
  • S. Marsella et al.

    Computationally modeling human emotion

    Commun. ACM

    (2014)
  • C. Marinetti, P. Moore, P. Lucas, B. Parkinson, Emotions in Social Interactions: Unfolding Emotional Experience, pp....
  • K.R. Scherer

    What are emotions? and how can they be measured?

    Soc. Sci. Inf.

    (2005)
  • N.H. Frijda

    Emotion, cognitive structure, and action tendency

    Cognit. Emot.

    (1987)
  • J.R. Busemeyer et al.

    Integrating emotional processes into decision-making models

    Integrat. Model. Cognit. Syst.

    (2007)
  • P. Dybala et al.

    Activating humans with humor–a dialogue system that users want to interact with

    IEICE Trans. Inf. Syst.

    (2009)
  • C.N. Moridis et al.

    Affective learning: empathetic agents with emotional facial and tone of voice expressions

    IEEE Trans. Affect. Comput.

    (2012)
  • R.W. Picard

    Affective computing

    (2000)
  • J. Pittermann et al.

    Handling Emotions in Human-Computer Dialogues

    (2009)
  • S. Pauletto et al.

    Exploring expressivity and emotion with artificial voice and speech technologies

    Logoped. Phoniatr. Vocol.

    (2013)