A survey on empathetic dialogue systems
Introduction
The primary goal of building a dialogue system is to address users’ questions and concerns by emulating the way humans communicate with each other. As human language is too complicated to be treated as a single target, dialogue systems have to model different aspects of human communication separately. Recent years have witnessed the emergence of empathy models in the context of dialogue systems and, hence, increasing attention from the natural language processing (NLP) community.
Empathy is the capability of projecting the feelings and ideas of the other party onto one’s own knowledge [1]. It plays an important part in human communication, as it has the potential to strengthen emotional bonds. As noted by a previous study [2], incorporating empathy into the design of a dialogue system is also vital for improving user experience in human-computer interaction. More importantly, being empathetic is a necessary step for the dialogue agent to be perceived as a social character by users [3]. Building an empathetic dialogue system is thus premised on the idea that it will result in improved user engagement and, consequently, more effective communication. Research on dialogue systems has elaborated on the concept of an empathetic dialogue system mainly from the perspective of features. For example, Looije et al. [4] stated that an empathetic dialogue system should be complimentary, attentive, and compassionate. In this survey, we are particularly concerned with the perspective of functions, namely, which functions enable the empathetic behavior of a dialogue system. To our knowledge, this has not been discussed in depth in the previous literature.
Early attempts to build dialogue systems date back to the 1960s [5]. Since then, dialogue systems have been designed either to perform specific tasks such as flight booking [6], healthcare [7], and political debate [8], hence termed “task-specific dialogue systems”, or to chitchat as a form of entertainment [9], hence called “chatbots”. A task-specific dialogue system [10], [11] often consists of multiple modules including language understanding, dialogue state tracking, dialogue policy, and dialogue generation. On the other hand, recent progress in deep learning [12] has also facilitated end-to-end solutions for dialogue systems, which can be more easily trained to simulate human communication given access to a large amount of training data. As we will discuss in later sections, the process of generating responses conditioned on the existing context of a dialogue can be naturally modeled as a translation process, for which off-the-shelf end-to-end solutions such as the sequence-to-sequence (Seq2Seq) model [13] have already proven effective.
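To make the modular design of a task-specific dialogue system concrete, the following is a minimal sketch of the four modules named above (language understanding, dialogue state tracking, dialogue policy, and dialogue generation) chained into one turn of a toy flight-booking dialogue. All names, the city vocabulary, and the response templates are hypothetical and chosen only for illustration; real systems would typically replace each rule-based step with a learned model.

```python
from dataclasses import dataclass, field

# Hypothetical toy vocabulary; a real system would use a learned tagger.
CITIES = ("paris", "london", "singapore")


@dataclass
class DialogueState:
    """Slots accumulated over the conversation."""
    slots: dict = field(default_factory=dict)


def understand(utterance: str) -> dict:
    """Language understanding: extract slot values by keyword matching."""
    tokens = utterance.lower().replace(",", " ").replace(".", " ").split()
    found = {}
    for i, tok in enumerate(tokens[:-1]):
        if tok in ("from", "to") and tokens[i + 1] in CITIES:
            slot = "origin" if tok == "from" else "destination"
            found[slot] = tokens[i + 1]
    return found


def track_state(state: DialogueState, parsed: dict) -> DialogueState:
    """Dialogue state tracking: merge newly parsed slots into the state."""
    state.slots.update(parsed)
    return state


def choose_action(state: DialogueState) -> str:
    """Dialogue policy: request the first missing slot, else confirm."""
    for slot in ("origin", "destination"):
        if slot not in state.slots:
            return f"request_{slot}"
    return "confirm_booking"


def generate(action: str, state: DialogueState) -> str:
    """Dialogue generation: fill a fixed template for the chosen action."""
    templates = {
        "request_origin": "Where are you flying from?",
        "request_destination": "Where would you like to go?",
        "confirm_booking": "Booking a flight from {origin} to {destination}.",
    }
    return templates[action].format(**state.slots)


def respond(state: DialogueState, utterance: str) -> str:
    """Run one user turn through the four modules in order."""
    state = track_state(state, understand(utterance))
    return generate(choose_action(state), state)


if __name__ == "__main__":
    state = DialogueState()
    print(respond(state, "I want to fly to Paris"))
    print(respond(state, "From London"))
```

An end-to-end model would instead learn a single mapping from the dialogue context to the response, without these hand-designed module boundaries.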
The rapid growth of dialogue systems and their applications has prompted many comprehensive surveys in the past decade. Chen et al. [14] mainly organize their survey by elaborating on each functional component of a dialogue system. Gao et al. [15] presented the most recent review with good coverage of related topics, mainly focused on neural network-based approaches for building dialogue systems. Unlike [14] and [15], we focus on dialogue systems with empathetic features. Related work [16] viewed empathy as equivalent to emotion. We argue that empathy is not all about emotions. Indeed, a non-empathetic dialogue system may disappoint and bore the user because its responses are too robotic and incoherent, and consequently lead to a loss of affection.
Introducing emotion into the generation of dialogue can only partially address the problem. As illustrated by Fig. 1, a more comprehensive empathetic framework also has to access general knowledge as well as personalized knowledge. Personalization, in such a case, can increase the coherence and consistency of a dialogue system. With knowledge of user-specific information, the dialogue system can tailor responses to the user’s preferences and address questions relevant to the user’s untold background, and a virtuous cycle forms when the user tends to provide more information and clues about themselves. Moreover, external knowledge, be it task-specific or commonsense, usually complements the context of a conversation with additional background. Many facts that are obvious to human beings may be very opaque to a machine. For example, “I come to my friend’s house. Jimmy is my friend” will be taken literally by a vanilla dialogue system: it will not conclude that “my friend’s house” means “Jimmy’s house” unless we construct a relationship between the two. This is where the knowledge part comes into play: it helps dialogue systems become smarter, sharper, and more interesting. Although incorporating knowledge into dialogue systems seems prevalent, reasoning over, retrieving from, and representing a large-scale knowledge base remain challenging. All three components (i.e., emotion, personalization, and knowledge) work together to ensure a smooth and natural flow of the conversation.
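The “Jimmy’s house” example can be sketched in a few lines. The snippet below is only an illustration under strong assumptions: facts are stated in the fixed form “X is my <relation>”, and the function names `extract_facts` and `ground_mentions` are hypothetical. It shows how an explicitly stored relation lets a system rewrite “my friend’s house” as “Jimmy’s house”; it is not how any surveyed system represents knowledge.

```python
import re

# Assumed surface pattern for stating a fact: "<Name> is my <relation>".
FACT_PATTERN = re.compile(r"\b([A-Z][a-z]+) is my (\w+)")
# A mention of the form "my <relation>", optionally followed by a possessive.
MENTION_PATTERN = re.compile(r"\bmy (\w+)(['’]s)?")


def extract_facts(utterances):
    """Build a relation -> entity map from the dialogue context."""
    facts = {}
    for text in utterances:
        for name, relation in FACT_PATTERN.findall(text):
            facts[relation] = name
    return facts


def ground_mentions(text, facts):
    """Replace 'my <relation>' with the known entity, keeping the possessive."""
    def substitute(match):
        relation, possessive = match.group(1), match.group(2) or ""
        if relation in facts:
            return facts[relation] + possessive
        return match.group(0)  # unknown relation: leave the text unchanged
    return MENTION_PATTERN.sub(substitute, text)


if __name__ == "__main__":
    context = ["I come to my friend's house.", "Jimmy is my friend."]
    facts = extract_facts(context)
    print(ground_mentions(context[0], facts))
```

Without the stored relation, the mention is left untouched, which mirrors the literal reading of a vanilla dialogue system described above.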
Considering such complexity of empathetic systems, we take a perspective that goes beyond the merely emotional definition of empathetic dialogue systems by identifying three pillars. Such pillars accordingly represent the three main sub-topics presented in this survey:
- perceiving and expressing emotion (Section 3 – Affective Dialogue Systems),
- caring for each individual (Section 4 – Personalized Dialogue Systems), and
- casting into knowledge (Section 5 – Knowledgeable Dialogue Systems).
Compared with previous surveys [14], we also cover the most recent advances in the area of empathetic dialogue systems. In particular, we emphasize end-to-end models over traditional pipeline models, as we believe the former represent the current trend in this field. To the best of our knowledge, we are the first to survey the empathetic features of a dialogue system. Overall, we primarily surveyed 35 papers selected from those published in prestigious venues in the past 10 years.
Propaedeutic background
A dialogue system is not built on top of a single model. Instead, it integrates multiple techniques to cope with the complexity of language and tasks. In this section, we present a technical introduction to recent techniques that serve as the backbone of an empathetic dialogue system.
Affective dialogue system
Emotion plays an important role in cognition and social behavior [31]. Existing studies suggest that emotion is a reaction and a social and cultural interaction that is continuously shaped by the relationships between humans and their surrounding environment [32]. Yet, the definition and categorization of emotions remain fuzzy and long-debated among psychologists and philosophers [33]. In the scope of this paper, we focus on the representation of emotion in dialogue systems (or human-computer
Personalized dialogue system
The communication between a dialogue system and a human is generally desired to adapt to the variance in personal preferences, based on an appropriate perception of the speaker’s personality, in order to increase communication effectiveness [97], [98]. On the other hand, personality affects the way of communication in various manners, including both linguistic style [99] and acoustic traits [100]. As it feels more natural to interact with a ‘thing’ that has its own personality, implanting
Knowledge-based dialogue system
Generating a conversation is a process of searching for and communicating knowledge that might come from multiple sources, including the current dialogue, personal background, or even external knowledge sources such as a knowledge graph [110]. The comprehension of dialogue thus requires access to background knowledge, and the lack of such access has created a gap between responses generated by human beings and those generated by data-driven dialogue agents [13], [18], [48]. Fig. 18 shows an example in which the
Future directions
Many research challenges remain in the context of empathetic dialogue systems. For example, little effort has been devoted to combining the three key components (i.e., personalization, knowledge, and emotion) to build a more comprehensive empathetic system. With advances in each subtopic, it becomes possible to further extend this research area on different fronts:
1. Multi-goal Management: As pointed out by Pollack et al. [139], communication might be overloaded with multiple objectives. This becomes
Conclusion
Although emotion, personality and knowledge have been considered key components by existing research on dialogue systems, little work has been done towards investigating the correlation between them in a broader context in order to enhance human-computer interaction. In this survey, we provided a unified view of these different research efforts under the topic of empathetic dialogue systems and discussed recent advancements and trends in this context. As one of the key features in
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgements
This research is supported by the Agency for Science, Technology and Research (A*STAR) under its AME Programmatic Funding Scheme (Project #A18A2b0046).
References
- et al., Persuasive robotic assistant for health self-management of older adults: design and evaluation of social behaviors, Int. J. Hum. Comput. Stud. (2010)
- et al., How emotion is made and measured, Int. J. Hum.-Comput. Stud. (2007)
- et al., Affective interaction: how emotional agents affect users, Int. J. Hum. Comput. Stud. (2009)
- et al., Computers that recognise and respond to user emotion: theoretical and practical implications, Interact. Comput. (2002)
- et al., Employing personality-rich virtual persons—new tools required, Comput. Graph. (2007)
- et al., Human–machine dialogue modelling with the fusion of word- and sentence-level emotions, Knowl. Based Syst. (2020)
- et al., A computational approach to politeness with application to social factors, Proceedings of ACL (2013)
- et al., Interactive double states emotion cell model for textual dialogue emotion prediction, Knowl. Based Syst. (2020)
- et al., Modeling and evaluating empathy in embodied companion agents, Int. J. Hum. Comput. Stud. (2007)
- et al., Toward controlled generation of text, Proceedings of the 34th International Conference on Machine Learning - Volume 70 (2017)
- Aiming to know you better perhaps makes me a more engaging dialogue partner, Proceedings of the 22nd Conference on Computational Natural Language Learning
- Empathy and knowledge projection, Soc. Neurosc. Empathy
- Embedded empathy in continuous, interactive health assessment, CHI Workshop on HCI Challenges in Health Assessment
- The Conversational Interface
- Computer Power and Human Reason: From Judgment to Calculation
- A French oral dialogue system for flight reservations over the telephone, Third European Conference on Speech Communication and Technology (EUROSPEECH)
- A mixed-initiative conversational dialogue system for healthcare, SIGDIAL Conference
- Let’s chat about Brexit! A politically-sensitive dialog system based on Twitter data, ICDM Workshops
- Emotional chatting machine: emotional conversation generation with internal and external memory, AAAI Conference on Artificial Intelligence
- Natural Language Dialog Systems and Intelligent Assistants
- End-to-end latent-variable task-oriented dialogue system with exact log-likelihood optimization, World Wide Web
- Deep learning based text classification: a comprehensive review, arXiv preprint arXiv:2004.03705
- Sequence to sequence learning with neural networks, Advances in Neural Information Processing Systems
- A survey on dialogue systems: recent advances and new frontiers, ACM SIGKDD Explorat. Newsletter
- Neural approaches to conversational AI, Found. Trend. Inf. Retriev.
- Empathetic dialog systems, Language Resources and Evaluation Conference (LREC)
- Recurrent neural network based language model, INTERSPEECH
- Learning phrase representations using RNN encoder–decoder for statistical machine translation, Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)
- Long short-term memory, Neural Comput.
- Neural machine translation by jointly learning to align and translate, arXiv preprint arXiv:1409.0473
- Learning to execute, arXiv preprint arXiv:1410.4615
- Memory networks, CoRR
- Auto-encoding variational Bayes, International Conference on Learning Representation
- Improved training of Wasserstein GANs, Advances in Neural Information Processing Systems
- Variational neural machine translation, Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing
- Generative adversarial nets, Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 2
- Deep reinforcement learning for dialogue generation, Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing
- A tractable hybrid DDN–POMDP approach to affective dialogue modeling for probabilistic frame-based dialogue systems, Nat. Lang. Eng.
- Affective dialogue management using factored POMDPs, Interactive Collaborative Information Systems
- Reinforcement learning for adaptive dialogue systems: A data-driven methodology for dialogue management and natural language generation
- Computationally modeling human emotion, Commun. ACM
- What are emotions? And how can they be measured?, Soc. Sci. Inf.
- Emotion, cognitive structure, and action tendency, Cognit. Emot.
- Integrating emotional processes into decision-making models, Integrat. Model. Cognit. Syst.
- Activating humans with humor–a dialogue system that users want to interact with, IEICE Trans. Inf. Syst.
- Affective learning: empathetic agents with emotional facial and tone of voice expressions, IEEE Trans. Affect. Comput.
- Affective computing
- Handling Emotions in Human-Computer Dialogues
- Exploring expressivity and emotion with artificial voice and speech technologies, Logoped. Phoniatr. Vocol.