A persona aware persuasive dialogue policy for dynamic and co-operative goal setting

https://doi.org/10.1016/j.eswa.2021.116303

Highlights

  • Existing VAs abort the conversation during goal unavailability situations.

  • Proposal of novel DM architecture and joint reward (task, sentiment & persona) model.

  • Design of both unified and multi-agent persona aware persuasive policies.

  • Proposed VA elevates both user satisfaction & agent utility.

  • Easy adaptation to other task-oriented dialogue settings.

Abstract

Contextualization:

In recent years, the popularity of virtual agents, particularly task-oriented dialogue agents, has increased immensely due to their effectiveness and simplicity in various domains such as industry, e-commerce, and health.

Problem:

In the real world, users do not always have a predefined and immutable goal, i.e., they may upgrade/downgrade/update their task goal dynamically depending upon their utility and the serving capability of the assisting agent. However, the existing Virtual Agents (VAs) in the dialogue literature give up and end in dialogue failure whenever they encounter dynamic goal setting or goal unavailability scenarios.

Contributions and methodology:

Motivated by these limitations of existing VAs, we propose intelligent dialogue agents (a Unified Dialogue Agent and a Multi-agent Dialogue System) that can deal with dynamic goal setting and goal unavailability situations to elevate both user satisfaction and the agent’s utility, particularly task success rate. The proposed architecture incorporates a goal-guiding module, namely the Dynamic and Co-Operative Goal Driven Module (DyCoGDM), which traces goal status and resolves goal discrepancies through dynamic goal setting (Goal Formulator) and personalized persuasion (Goal Persuader) mechanisms. We also created and annotated a dialogue corpus, since no existing corpus features dynamic goal setting and goal unavailability scenarios.

Findings and implications:

Our proposed method outperforms several baseline and state-of-the-art methods on all evaluation metrics. The proposed VA deals effectively with dynamic goals and goal unavailability scenarios. The study found that the persona-aware persuasive dialogue agent outperforms the generalized persuasive dialogue agent by a large margin. Furthermore, we also observed that the task-oriented reward is the most essential reward for training a reinforcement learning based agent; agents trained without a task-based reward do not even converge.

Introduction

The development of virtual agents (VAs), also known as dialogue agents, dialogue policies, or dialogue systems (Abdul-Kader and Woods, 2015, Dahiya, 2017), has been one of the foremost applications of artificial intelligence. Their relevance and feasibility have increased over time due to advances in machine learning and deep learning techniques, and their objectives have grown with the evolving world and its requirements. In recent years, virtual agents have become a user preference rather than merely an option. Depending upon the nature of their objectives, dialogue agents are broadly classified into two paradigms: Task-Oriented/Goal-Oriented Dialogue Systems (Lipton et al., 2018, Wei et al., 2018) and Open-ended Dialogue Systems (Adiwardana et al., 2020, Sankar and Ravi, 2019). In a task-oriented dialogue system, the agent is intended to assist users in completing their tasks, whereas in an open-ended dialogue system, the agent aims to satisfy humans’ communication needs through an engaging and interesting conversation.

In recent years, the popularity of virtual agents, particularly task-oriented dialogue agents, has increased immensely due to their effectiveness and simplicity in various domains such as industry, e-commerce, health, and academia (Nuruzzaman and Hussain, 2018, Ranoliya et al., 2017, Wei et al., 2018). A well-developed intelligent virtual agent can be a more economical and scalable choice than employing human resources for a well-defined task that requires interaction between end-users and the agent for task completion (Cui et al., 2017).

This drive was vital to the development of some of the most well-known virtual assistants, such as Apple Siri, Amazon Echo, and Google Assistant. However, these VAs are proficient only in simple and straightforward tasks, such as playing songs or booking appointments, whereas real-world tasks and their complexities have grown over time, demanding advanced virtual agents that can assist users in completing their tasks with gratification. One of the major challenges is dealing with end users’ dynamic goal-setting behavior: in day-to-day life, users typically decide their final goals depending upon their primary requirements, their overall utility, and sometimes the selling agent’s serving capability. The virtual agent always attempts to improve the transformation ratio, defined as Transformation ratio = C / I, where C is the number of dialogues that ended successfully and I is the number of dialogues initiated by the VA. In the e-commerce domain, it is popularly known as the sales/visit ratio. In this work, we aim to develop a dialogue system that elevates the agent’s performance in dynamic goal setting and goal unavailability scenarios.
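
For concreteness, the following minimal Python sketch computes this ratio from a list of dialogue outcomes; the log format (a boolean success flag per dialogue) is an illustrative assumption, not part of the paper.

```python
# Minimal sketch: transformation (sales/visit) ratio = C / I, where C is the
# number of successfully completed dialogues and I the number of initiated ones.
# The log format below (one dict per dialogue with a 'success' flag) is assumed.
def transformation_ratio(dialogue_logs):
    initiated = len(dialogue_logs)
    completed = sum(1 for d in dialogue_logs if d["success"])
    return completed / initiated if initiated else 0.0

logs = [{"success": True}, {"success": False}, {"success": True}]
print(round(transformation_ratio(logs), 2))  # 2 successes / 3 initiations -> 0.67
```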

The core responsibility of a virtual agent is to select the most appropriate action for the current dialogue state. Reinforcement Learning (RL) (Sutton & Barto, 2018) has proven to be one of the most efficient techniques for solving such decision-making problems (Lee, Seo, & Jung, 2012). The RL agent receives a reward or penalty for each action depending on its relevance and appropriateness to the current dialogue state, and it learns to maximize the cumulative reward over an episode during training. Dialogue management can thus be framed as a sequential decision problem in which the dialogue environment is represented as a Partially Observable Markov Decision Process (POMDP) (Young et al., 2010) comprising a dialogue state space, an agent action space, and a reward model.
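
As an illustration of this RL framing, the sketch below runs a tabular Q-learning update over a toy dialogue episode. The states, actions, and rewards are placeholders; the paper's actual policy architecture and state representation are not reproduced here.

```python
# Toy sketch of the RL framing: the agent picks a dialogue act for the current
# (belief) state, receives a per-turn reward, and learns to maximize the
# cumulative reward. States, actions, and rewards are illustrative placeholders.
import random
from collections import defaultdict

ACTIONS = ["request_slot", "inform", "propose_similar_goal", "book"]
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2
Q = defaultdict(float)  # Q[(state, action)] -> estimated return

def choose_action(state):
    # epsilon-greedy selection over the dialogue action set
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def q_update(state, action, reward, next_state, done):
    # one-step Q-learning backup on the observed transition
    target = reward if done else reward + GAMMA * max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (target - Q[(state, action)])

# toy 3-turn episode: small per-turn penalty, large terminal task reward
state = "greeting"
for next_state, reward, done in [("slots_filled", -1, False),
                                 ("goal_unavailable", -1, False),
                                 ("task_success", 20, True)]:
    action = choose_action(state)
    q_update(state, action, reward, next_state, done)
    state = next_state
```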

In task-oriented dialogue systems, a user’s task typically consists of three main subtasks: informing task constraints (slot-value pairs), querying information (requesting slots), and booking/completing the task (Li, Chen, Li, Gao, & Celikyilmaz, 2017). Existing virtual agents (Ilievski, 2018, Li et al., 2017, Liu and Lane, 2017, Saha et al., 2018) conclude a conversation with failure if they cannot find the user’s choice in their knowledge base. However, in most scenarios such as restaurant booking, movie ticket booking, and e-commerce/online shopping, users do not have a rigid goal/choice, i.e., they may prefer to reach a common goal collaboratively in case of goal unavailability. They may upgrade/downgrade/update their goal components dynamically if a similar goal, with its aspects, is presented to them. The utmost priority of both the user and the VA is successful task completion. In goal unavailability scenarios, a significant number of users may agree to a close goal if they learn about the attractive features of the suggested goal. The efficacy of a task-oriented virtual agent can surely be improved if end-users find the experience similar to one with a real (human) agent. However, existing VAs completely falter in goal unavailability scenarios. A VA should present a similar goal rather than ending the dialogue unsuccessfully; the user and the agent should collaboratively and dynamically drive the interaction towards a user-satisfying goal that the VA can serve. A use case is shown in Fig. 1.
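
The sketch below illustrates the behavior argued for here: when the exact user goal is not in the knowledge base, retrieve the closest available item and switch to persuasion instead of aborting. The slot names, toy knowledge base, and overlap-based similarity are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of goal-unavailability handling: serve the exact goal if it
# exists in the knowledge base, otherwise propose the most similar item.
# Slot names and the overlap-based similarity measure are assumptions.
def select_goal(user_goal, knowledge_base):
    def overlap(item):
        return sum(item.get(slot) == value for slot, value in user_goal.items())
    best = max(knowledge_base, key=overlap)
    if overlap(best) == len(user_goal):
        return "serve", best        # exact goal available
    return "persuade", best         # goal unavailable: offer a similar one

kb = [{"brand": "Y", "ram": "6GB", "price": "250"},
      {"brand": "X", "ram": "8GB", "price": "350"}]
goal = {"brand": "X", "ram": "8GB", "price": "300"}
print(select_goal(goal, kb))  # -> ('persuade', {'brand': 'X', 'ram': '8GB', 'price': '350'})
```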

Persuasion success is highly subjective, i.e., one user may be convinced by a feature the agent highlights in the persuasion target, whereas for another user it might not even be relevant. Hence, the proposed agent utilizes personal attributes to convince users of a similar goal. Motivated by this inability of existing VAs and by the behavior of real-world task agents, we aim to develop a co-operative and persuasive virtual agent that serves users even in goal unavailability situations by finding a similar task goal and persuading them based on their personal traits. Our proposed method outperforms several baseline and state-of-the-art methods on all evaluation metrics. We observed that the persona-aware persuasive dialogue agent outperforms the generalized persuasive dialogue agent by a large margin, which indicates the efficacy of personalized persuasion compared to a fixed persuasive strategy. Furthermore, we also found that the task-oriented reward is the most essential reward for training a reinforcement learning based agent; agents trained without a task-based reward do not even converge. To the best of our knowledge, this work is the first attempt towards building a task-oriented VA that serves users even in goal unavailability situations by providing a dynamic and co-operative goal-setting framework. The proposed methodology is applicable to task-oriented dialogue systems in which end-users prefer to decide their goal dynamically or have a flexible task goal. In this work, it has been trained and evaluated on a phone buying and selling use case, and it can be extended to other domains and tasks with minimal changes, particularly to the intent, slot, and reward modules. The key contributions of this work are as follows:

  • We aim to develop an intelligent and persona-aware virtual agent that can deal with goal unavailability scenarios through a collaborative and persuasive approach.

  • A unified and a multi-agent RL-based dialogue policy (two agents: Agent1 is responsible for slot filling and dynamic goal setting, whereas Agent2 assists a user in case of goal unavailability by proposing a similar goal) have been presented for dynamic and collaborative goal setting.

  • A unique POMDP formulation with a diversified user simulator has been proposed. The proposed pseudo environment simulates both dynamic and collaborative goal setting.

  • A novel reward model that combines persona-based and sentiment-based rewards with a task-oriented reward has been utilized for training. The sentiment- and persona-based rewards motivate the agent towards user-adaptable behavior, which plays a crucial role in user gratification (a minimal sketch of such a joint reward follows this list).

  • To incorporate all these aspects, we created and annotated a dialogue corpus named Deviation adapted Persuasive Virtual Agent (DevPVA), since no existing corpus features dynamic goal setting and goal unavailability scenarios. The obtained results and their post-evaluation in a real environment illustrate that the proposed VA uplifts the transformation ratio (#successful dialogue completions / #dialogue initiations, i.e., sales/visits) significantly. Other metrics such as average dialogue length and human score illustrate the correctness and gratification aspects of the VA.
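
As referenced in the contribution list above, a minimal sketch of such a joint reward is given below. The weighting scheme and the individual reward terms are illustrative assumptions; the paper's exact reward shaping is not reproduced here.

```python
# Minimal sketch of a joint reward combining task-oriented, sentiment-based and
# persona-based signals. Weights and reward terms are illustrative assumptions.
def joint_reward(task_reward, user_sentiment, persona_match,
                 w_sentiment=0.5, w_persona=0.5):
    """
    task_reward:    per-turn penalty or terminal success/failure reward
    user_sentiment: simulated user's sentiment for the last turn, in [-1, 1]
    persona_match:  1 if the persuasive act matches the user's persona, else 0
    """
    return task_reward + w_sentiment * user_sentiment + w_persona * persona_match

# a persona-matched persuasive turn with fully positive user sentiment
# offsets the per-turn penalty
print(joint_reward(task_reward=-1, user_sentiment=1.0, persona_match=1))  # -> 0.0
```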

Section 2 outlines recent work related to the proposed problem, followed by the motivation for this work. Section 3 describes the dataset used in our experiments. The proposed method and its architecture are explained and illustrated in Section 4. Section 5 presents the obtained results with a thorough analysis. Section 6 summarizes the work and highlights its limitations and future work.

Section snippets

Background

Until the early 2000s, most developed dialogue agents, such as ELIZA (Weizenbaum, 1966) and ALICE (Shawar & Atwell, 2002), were rule-based systems that primarily utilized word/pattern-matching techniques along with a set of pre-defined rules for response selection. The major problem with a rule-based system is the need for a huge number of rules and the conflicts that arise among them; it becomes hard to generalize a rule-based dialogue system even for a small problem. There are mainly two sub

Dataset

First, we probed existing relevant data corpora such as MultiWOZ (Budzianowski et al., 2018), ATIS (Hemphill, Godfrey, & Doddington, 1990), bAbI (Bordes, Boureau, & Weston, 2016), the Cornell movie corpus (Danescu-Niculescu-Mizil & Lee, 2011), the Ubuntu dialogue corpus (Lowe, Pow, Serban, & Pineau, 2015), and Deal or No Deal (Lewis, Yarats, Dauphin, Parikh, & Batra, 2017). However, no suitable data corpus was found for the dynamic and goal unavailability scenarios, which could be adapted as it is for the

Methodology

The central acting component of a virtual agent is the dialogue manager, which comprises the dialogue state and the dialogue policy. We propose a multi-agent persona-aware dialogue system for dynamic and co-operative goal setting, which elevates both user satisfaction and the agent’s effectiveness, particularly the transformation ratio/task success rate. We incorporate a Dynamic and Co-operative Goal Driven Module (DyCoGDM) with an RL-based dialogue manager. DyCoGDM strengthens the dialogue manager
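
Based on the module names introduced here and in the abstract (Goal Formulator, Goal Persuader), the sketch below shows one plausible way DyCoGDM could route a dialogue turn; the function names and dialogue-state fields are placeholders, not the paper's implementation.

```python
# Plausible routing sketch for DyCoGDM: the Goal Formulator maintains the
# dynamic goal; if the goal cannot be served, the Goal Persuader selects a
# similar goal and a persona-aware persuasive strategy. All names are
# placeholders inferred from the module names, not the paper's code.
def dycogdm_step(dialogue_state, knowledge_base, goal_formulator, goal_persuader):
    goal = goal_formulator(dialogue_state)               # dynamic goal setting
    if any(matches(item, goal) for item in knowledge_base):
        return {"act": "serve", "goal": goal}
    alt_goal, strategy = goal_persuader(goal, knowledge_base,
                                        dialogue_state.get("persona"))
    return {"act": "persuade", "goal": alt_goal, "strategy": strategy}

def matches(item, goal):
    return all(item.get(slot) == value for slot, value in goal.items())
```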

Result and analysis

The foremost concern for any task-oriented VA is to complete user tasks with gratification, i.e., manual evaluation (user satisfaction) is required along with automatic evaluation. We evaluated the proposed VA with both automatic methods and manual analysis. The following three widely used automatic metrics (Deriu et al., 2020, Peng et al., 2017, Weisz et al., 2018) have been used to assess the quality of the VA and to compare its performance with other baselines.

  • 1. Learning

Conclusion and future work

In the real world, users may not have a rigid goal; however, existing dialogue agents terminate the dialogue unsuccessfully if they encounter a goal unavailability scenario. This paper presents two different persona-aware dialogue policies, namely a Unified Persona aware Persuasive Dialogue Policy (UPPDP) and a Multi-agent Persona aware Persuasive Dialogue Policy (MaPPDP), that elevate the capability of a virtual agent to deal with dynamic and cooperative goal setting. We proposed a novel reward model

CRediT authorship contribution statement

Abhisek Tiwari: Conceptualization, Data curation, Methodology, Validation, Formal analysis, Investigation, Visualization, Writing. Tulika Saha: Conceptualization, Supervision, Investigation. Sriparna Saha: Conceptualization, Supervision, Project administration. Shubhashis Sengupta: Conceptualization, Supervision, Project administration. Anutosh Maitra: Conceptualization, Supervision, Project administration. Roshni Ramnani: Conceptualization, Supervision, Project administration. Pushpak

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

The research reported in this paper is an outcome of the project “Autonomous Goal-Oriented and Knowledge-Driven Neural Conversational Agents” (Project No. IITP/2020/458), sponsored by Accenture LLP.

References (66)

  • Chen, Q., et al. (2019). BERT for joint intent classification and slot filling.
  • Chu, T., et al. (2019). Multi-agent deep reinforcement learning for large-scale traffic signal control. IEEE Transactions on Intelligent Transportation Systems.
  • Chung, J., et al. Empirical evaluation of gated recurrent neural networks on sequence modeling.
  • Cuayáhuitl, H. SimpleDS: A simple deep reinforcement learning dialogue system.
  • Cui, L., Huang, S., Wei, F., Tan, C., Duan, C., & Zhou, M. (2017). SuperAgent: A customer service chatbot for...
  • Dahiya, M. (2017). A tool of conversation: Chatbot. International Journal of Computer Sciences and Engineering.
  • Danescu-Niculescu-Mizil, C., & Lee, L. (2011). Chameleons in imagined conversations: A new approach to understanding...
  • Delecroix, F., et al. A virtual selling agent which is persuasive and adaptive.
  • Deriu, J., et al. (2020). Survey on evaluation methods for dialogue systems. Artificial Intelligence Review.
  • Dietterich, T. G. (2000). Hierarchical reinforcement learning with the MAXQ value function decomposition. Journal of Artificial Intelligence Research.
  • Hemphill, C. T., et al. The ATIS spoken language systems pilot corpus.
  • Hiraoka, T., Neubig, G., Sakti, S., Toda, T., & Nakamura, S. (2014). Reinforcement learning of cooperative persuasive...
  • Ilievski, V. (2018). Building advanced dialogue managers for goal-oriented dialogue systems. ArXiv.
  • Kalchbrenner, N., Grefenstette, E., & Blunsom, P. (2014). A convolutional neural network for modelling sentences. In...
  • Keizer, S., et al. (2017). Evaluating persuasion strategies and deep reinforcement learning methods for negotiation dialogue agents.
  • Khan, M. M. Development of an e-commerce sales chatbot.
  • Kılıç, S. (2015). Kappa testi. Journal of Mood Disorders.
  • Lee, D., et al. (2012). Neural basis of reinforcement learning and decision making. Annual Review of Neuroscience.
  • Levin, E., et al. Learning dialogue strategies within the Markov decision process framework.
  • Lewis, M., Yarats, D., Dauphin, Y., Parikh, D., & Batra, D. (2017). Deal or No Deal? End-to-End Learning of Negotiation...
  • Li, X., Chen, Y.-N., Li, L., Gao, J., & Celikyilmaz, A. (2017). End-to-end task-completion neural dialogue systems. In...
  • Li, X., et al. (2016). A user simulator for task-completion dialogues.
  • Liao, L., Ma, Y., He, X., Hong, R., & Chua, T.-s. (2018). Knowledge-aware multimodal dialogue systems. In Proceedings...