A persona aware persuasive dialogue policy for dynamic and co-operative goal setting
Introduction
The development of virtual agents (VAs), also known as dialogue agents or dialogue systems (Abdul-Kader and Woods, 2015, Dahiya, 2017), has been one of the foremost applications of artificial intelligence. Their relevance and feasibility have increased over time with advances in machine learning and deep learning techniques, and their objectives have broadened as the world and its requirements evolve. In recent years, virtual agents have become a user’s preferred choice rather than merely an option. Depending on the nature of their objectives, dialogue agents are broadly classified into two paradigms: task-oriented/goal-oriented dialogue systems (Lipton et al., 2018, Wei et al., 2018) and open-ended dialogue systems (Adiwardana et al., 2020, Sankar and Ravi, 2019). In a task-oriented dialogue system, the agent is intended to assist users in completing their tasks. In an open-ended dialogue system, the agent aims to satisfy the human need for communication through an engaging and interesting conversation.
In recent years, the popularity of virtual agents, particularly task-oriented dialogue agents, has increased immensely due to their effectiveness and simplicity in various domains such as industry, e-commerce, health, and academia (Nuruzzaman and Hussain, 2018, Ranoliya et al., 2017, Wei et al., 2018). For a well-defined task that requires interaction between end-users and an agent, a well-developed intelligent virtual agent can be a more economical and scalable choice than employing human resources (Cui et al., 2017).
This drive was vital to the development of some of the most well-known virtual assistants, such as Apple Siri, Amazon Echo, and Google Assistant. These VAs are proficient in handling simple and straightforward tasks such as playing songs and booking appointments. However, real-world tasks and their complexity have also grown over time, which demands an advanced virtual agent that can assist users in completing their tasks with gratification. One of the major challenges is dealing with end users’ dynamic goal-setting behavior: in day-to-day life, users usually decide their final goals depending on their primary requirements, overall utility, and sometimes the selling agent’s serving capability. The virtual agent always attempts to improve its transformation ratio, which is defined as follows:

$$\text{Transformation ratio} = \frac{C}{I}$$

where C is the number of dialogues that ended successfully and I is the number of dialogues that are initiated by the VA. In the e-commerce domain, it is popularly known as the sales/visit ratio. In this work, we aim to develop a dialogue system that can elevate the agent’s performance in dynamic goal setting and goal unavailability scenarios as well.
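As a minimal sketch of the transformation ratio described above, the following Python function computes the share of initiated dialogues that ended successfully. The function name, argument names, and the example counts are illustrative assumptions, not taken from the paper.

```python
def transformation_ratio(successful_dialogues: int, initiated_dialogues: int) -> float:
    """Return C / I: the fraction of initiated dialogues that ended successfully."""
    if initiated_dialogues <= 0:
        raise ValueError("at least one dialogue must be initiated")
    if not 0 <= successful_dialogues <= initiated_dialogues:
        raise ValueError("successful dialogues must lie between 0 and the number initiated")
    return successful_dialogues / initiated_dialogues


# Illustrative counts only (not results from the paper).
print(transformation_ratio(30, 120))  # 0.25
```

In the e-commerce reading, the first argument would be the number of sales and the second the number of visits.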
The core responsibility of a virtual agent is to select the most appropriate action for the current dialogue state. Reinforcement Learning (RL) (Sutton & Barto, 2018) has proven to be one of the most efficient techniques for solving decision-making problems (Lee, Seo, & Jung, 2012). The RL agent receives a reward or penalty for each action according to its relevance and appropriateness to the current dialogue state. During training, the agent learns to maximize its objective, i.e., the cumulative reward over an episode. Dialogue management can therefore be framed as a sequential decision problem in which the dialogue environment is represented as a Partially Observable Markov Decision Process (POMDP) (Young et al., 2010) comprising a dialogue state space, an agent action space, and a reward model.
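The objective mentioned above, the cumulative reward over an episode, can be sketched as follows. The discount factor `gamma` and the per-turn reward values are assumptions made for illustration; setting `gamma=1.0` gives the plain sum of rewards.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float = 1.0) -> float:
    """Cumulative (optionally discounted) reward of one dialogue episode.

    rewards[t] is the reward received after the agent's action at turn t.
    """
    total = 0.0
    # Accumulate from the last turn backwards: G_t = r_t + gamma * G_{t+1}.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Hypothetical episode: a small per-turn penalty, then a success reward.
episode_rewards = [-1.0, -1.0, 10.0]
print(discounted_return(episode_rewards))             # 8.0
print(discounted_return(episode_rewards, gamma=0.5))  # 1.0
```

A per-turn penalty combined with a terminal success reward is a common way to encourage short, successful dialogues; whether the paper uses exactly this shape is not stated in this excerpt.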
In task-oriented dialogue systems, a user’s task typically consists of three main subtasks: informing task constraints (slot-value pairs), querying information (requesting slots), and booking/completing the task (Li, Chen, Li, Gao, & Celikyilmaz, 2017). Existing virtual agents (Ilievski, 2018, Li et al., 2017, Liu and Lane, 2017, Saha et al., 2018) conclude a conversation with failure if they cannot find the user’s choice in their knowledge base. However, in most cases, such as restaurant booking, movie ticket booking, and e-commerce/online shopping, users do not have a rigid goal/choice, i.e., they may be willing to reach a common goal collaboratively in case of goal unavailability. They may upgrade, downgrade, or update their goal components dynamically if a similar goal and its aspects are presented to them. The utmost priority of both the user and the VA is successful task completion. In goal unavailability scenarios, a significant number of users may agree on a close goal if they get to know the attractive features of the suggested goal. The efficacy of a task-oriented virtual agent can surely be improved if end-users find the experience similar to one with a real agent. However, existing VAs falter completely in goal unavailability scenarios. A VA should present a similar goal rather than ending the dialogue unsuccessfully. The user and the agent should collaboratively and dynamically drive the interaction towards a user-satisfying goal that can be served by the VA. A use case is shown in Fig. 1.
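The contrast drawn above, failing on a missing goal versus proposing a close one, can be sketched in a few lines. This is only an illustration under stated assumptions: the knowledge base, the slot names (`brand`, `color`, `ram`), the action labels, and the overlap-count similarity are all hypothetical and are not the paper's method.

```python
from typing import Dict, List, Optional, Tuple

Item = Dict[str, str]

# Hypothetical knowledge base for a phone-selling agent.
KNOWLEDGE_BASE: List[Item] = [
    {"brand": "alpha", "color": "black", "ram": "4GB"},
    {"brand": "alpha", "color": "blue", "ram": "6GB"},
    {"brand": "beta", "color": "black", "ram": "6GB"},
]


def exact_match(constraints: Item, kb: List[Item]) -> Optional[Item]:
    """Return the first item satisfying every informed slot-value pair, if any."""
    for item in kb:
        if all(item.get(slot) == value for slot, value in constraints.items()):
            return item
    return None


def closest_match(constraints: Item, kb: List[Item]) -> Item:
    """Return the item sharing the most slot values with the user's constraints."""
    return max(kb, key=lambda item: sum(item.get(s) == v for s, v in constraints.items()))


def respond(constraints: Item, kb: List[Item]) -> Tuple[str, Item]:
    """Offer the requested goal if available; otherwise propose a similar one."""
    item = exact_match(constraints, kb)
    if item is not None:
        return "offer", item
    # Goal unavailability: propose a similar goal instead of ending in failure.
    return "propose_similar", closest_match(constraints, kb)


user_goal = {"brand": "alpha", "color": "red", "ram": "6GB"}
print(respond(user_goal, KNOWLEDGE_BASE))
# ('propose_similar', {'brand': 'alpha', 'color': 'blue', 'ram': '6GB'})
```

A learned policy would additionally decide how to persuade the user towards the proposed item; the sketch covers only the retrieval step.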
Persuasion success is highly subjective, i.e., one user may be convinced by the feature an agent uses as its persuasion target, whereas for someone else it might not even be relevant. Hence, the proposed agent utilizes personal attributes to convince users of a similar goal. Motivated by this inability of existing VAs and by the behavior of real-world human agents, we aim to develop a co-operative and persuasive virtual agent that serves users even in goal unavailability situations by finding a similar task goal and persuading them based on their personal traits. Our proposed method outperforms several baselines and state-of-the-art methods on all evaluation metrics. We observed that the persona aware persuasive dialogue agent outperforms a generalized persuasive dialogue agent by a large margin, which indicates the efficacy of personalized persuasion compared to a fixed persuasive strategy. Furthermore, we found that the task-oriented reward is the most essential reward for training a reinforcement learning based agent: agents trained without the task-based reward do not even converge. To the best of our knowledge, this work is the first attempt towards building a task-oriented VA that serves users even in goal unavailability situations by providing a dynamic and co-operative goal-setting framework. The proposed methodology is applicable to task-oriented dialogue systems where end-users prefer to decide their goal dynamically or have a flexible task goal. In this work, it has been trained and evaluated on a phone buying and selling use case; it can be extended to other domains and tasks with minimal changes, particularly to the intent, slot, and reward modules. The key contributions of this work are as follows:
- We aim to develop an intelligent and persona aware virtual agent that can deal with goal unavailability scenarios through a collaborative and persuasive approach.
- A unified and a multi-agent RL-based dialogue policy have been presented for dynamic and collaborative goal setting. In the multi-agent policy, Agent1 is responsible for slot filling and dynamic goal setting, whereas Agent2 assists the user in case of goal unavailability by proposing a similar goal.
- A unique POMDP formulation with a diversified user simulator has been proposed. The proposed pseudo environment simulates both dynamic and collaborative goal setting.
- A novel reward model that combines persona-based and sentiment-based rewards with a task-oriented reward has been utilized for training. The sentiment and persona-based rewards motivate the agent towards user-adaptive behavior, which plays a crucial role in user gratification.
- To incorporate all these aspects, we created and annotated a dialogue corpus named Deviation adapted Persuasive Virtual Agent (DevPVA), because no existing corpus features dynamic goal setting and goal unavailability scenarios. The obtained results and their post-evaluation in a real environment illustrate that the proposed VA significantly improves the transformation ratio (#Successful dialogue completion/#Dialogue initiation, or sales/visitors). The other metrics, such as average dialogue length and human score, illustrate the correctness and gratification aspects of the VA.
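The reward model listed among the contributions combines task-oriented, sentiment-based, and persona-based rewards. Its exact form is not given in this excerpt, so the sketch below assumes a simple weighted sum; the weights, argument names, and example values are hypothetical.

```python
def combined_reward(
    task_reward: float,
    sentiment_reward: float,
    persona_reward: float,
    w_sentiment: float = 1.0,
    w_persona: float = 1.0,
) -> float:
    """Assumed weighted sum of the three reward signals for one agent action."""
    return task_reward + w_sentiment * sentiment_reward + w_persona * persona_reward


# Hypothetical turn: per-turn task penalty, mildly positive user sentiment,
# and a persuasion attempt that matched the user's persona.
print(combined_reward(task_reward=-1.0, sentiment_reward=0.5, persona_reward=2.0))  # 1.5
```

Setting `w_sentiment` and `w_persona` to zero recovers a purely task-oriented reward, which makes the contribution of each auxiliary signal easy to ablate in this sketch.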
Section 2 outlines recent work related to the proposed problem, followed by the motivation for this work. Section 3 describes the dataset used in our experiments. The proposed method and its architecture are explained and illustrated in Section 4. Section 5 presents the obtained results with a thorough analysis. Section 6 summarizes the work and highlights its limitations and future work.
Background
Until the early 2000s, most of the developed dialogue agents, such as ELIZA (Weizenbaum, 1966) and ALICE (Shawar & Atwell, 2002), were rule-based systems that primarily utilized word/pattern matching techniques along with a set of pre-defined rules for response selection. The major problem with a rule-based system is the huge number of rules it requires and the conflicts that arise among them. It becomes hard to generalize a rule-based dialogue system even for a small problem. There are mainly two sub
Dataset
First, we examined existing relevant data corpora such as MultiWOZ (Budzianowski et al., 2018), ATIS (Hemphill, Godfrey, & Doddington, 1990), bAbI (Bordes, Boureau, & Weston, 2016), the Cornell movie corpus (Danescu-Niculescu-Mizil & Lee, 2011), the Ubuntu dialogue corpus (Lowe, Pow, Serban, & Pineau, 2015), and Deal or No Deal (Lewis, Yarats, Dauphin, Parikh, & Batra, 2017). However, no suitable data corpus was found for the dynamic and goal unavailability scenarios, which could be adapted as it is for the
Methodology
The central acting component of a virtual agent is the dialogue manager, which comprises the dialogue state and the dialogue policy. We propose a multi-agent persona aware dialogue system for dynamic and co-operative goal setting, which elevates both user satisfaction and the agent’s effectiveness, particularly the transformation ratio/task success rate. We incorporate a Dynamic and Co-operative Goal Driven Module (DyCoGDM) with an RL-based dialogue manager. DyCoGDM strengthens the dialogue manager
Result and analysis
The foremost concern for any task-oriented VA is to complete user tasks with gratification, so manual evaluation (user satisfaction) is required along with automatic evaluation. We evaluated the proposed VA with both automatic metrics and manual analysis. The following three widely used automatic metrics (Deriu et al., 2020, Peng et al., 2017, Weisz et al., 2018) have been used to assess the quality of the VA and to compare its performance with other baselines.
1. Learning
Conclusion and future work
In the real world, users may not have a rigid goal; however, existing dialogue agents terminate the dialogue unsuccessfully whenever they encounter a goal unavailability scenario. This paper presents two different persona aware dialogue policies, namely a Unified Persona aware Persuasive Dialogue Policy (UPPDP) and a Multi-agent Persona aware Persuasive Dialogue Policy (MaPPDP), that elevate the capability of a virtual agent to deal with dynamic and cooperative goal setting. We proposed a novel reward model
CRediT authorship contribution statement
Abhisek Tiwari: Conceptualization, Data curation, Methodology, Validation, Formal analysis, Investigation, Visualization, Writing. Tulika Saha: Conceptualization, Supervision, Investigation. Sriparna Saha: Conceptualization, Supervision, Project administration. Shubhashis Sengupta: Conceptualization, Supervision, Project administration. Anutosh Maitra: Conceptualization, Supervision, Project administration. Roshni Ramnani: Conceptualization, Supervision, Project administration. Pushpak
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgment
The research reported in this paper is an outcome of the project “Autonomous Goal-Oriented and Knowledge-Driven Neural Conversational Agents” (Project No. IITP/2020/458), sponsored by Accenture LLP.
References (66)
- et al. (2002). Reward, motivation, and reinforcement learning. Neuron.
- et al. (2015). Reinforcement-learning based dialogue system for human–robot interactions with socially-inspired rewards. Computer Speech and Language.
- et al. (2010). The hidden information state model: A practical framework for POMDP-based spoken dialogue management. Computer Speech and Language.
- et al. (2015). Survey on chatbot design techniques in speech conversation systems. International Journal of Advanced Computer Science and Applications.
- et al. (2020). Towards a human-like open-domain chatbot.
- Kaggle gsm arean (2017).
- et al. (2016). Learning end-to-end goal-oriented dialog.
- Budzianowski, P., Wen, T.-H., Tseng, B.-H., Casanueva, I. n., Ultes, S., & Ramadan, O., et al. (2018). MultiWOZ-a...
- et al. Implementing modular dialogue systems: A case of study.
- Chen, M., Liu, R., Shen, L., Yuan, S., Zhou, J., & Wu, Y., et al. (2020). The JDDC corpus: A large-scale multi-turn...
- Bert for joint intent classification and slot filling.
- Multi-agent deep reinforcement learning for large-scale traffic signal control. IEEE Transactions on Intelligent Transportation Systems.
- Empirical evaluation of gated recurrent neural networks on sequence modeling.
- Simpleds: A simple deep reinforcement learning dialogue system.
- A tool of conversation: Chatbot. International Journal of Computer Sciences and Engineering.
- A virtual selling agent which is persuasive and adaptive.
- Survey on evaluation methods for dialogue systems. Artificial Intelligence Review.
- Hierarchical reinforcement learning with the MAXQ value function decomposition. Journal of Artificial Intelligence Research.
- The ATIS spoken language systems pilot corpus.
- Building advanced dialogue managers for goal-oriented dialogue systems. ArXiv.
- Evaluating persuasion strategies and deep reinforcement learning methods for negotiation dialogue agents.
- Development of an e-commerce sales chatbot.
- Kappa testi. Journal of Mood Disorders.
- Neural basis of reinforcement learning and decision making. Annual Review of Neuroscience.
- Learning dialogue strategies within the markov decision process framework.
- A user simulator for task-completion dialogues.