15.1 Introduction

There is ongoing research on applying reinforcement learning to persuasion and negotiation dialogues, which differ from traditional task-based dialogues (Georgila and Traum 2011; Georgila 2013; Paruchuri et al. 2009; Heeman 2009). In task-based dialogue, the system is required to conduct the dialogue to achieve the user's goal, eliciting information from the user in order to provide an appropriate service. A reward corresponding to the achievement of the user's goal is given to the system. In contrast, in persuasive dialogue, the system convinces the user to take some action that achieves a system goal, for example buying a particular product or agreeing to a particular plan (Georgila 2013). In previous work, we proposed the paradigm of cooperative persuasive dialogue (Hiraoka et al. 2014b, 2013), where a reward corresponding to the achievement of both the user's and the system's goals is given to the system. This paradigm is useful in situations where the user and the system have different, but not mutually exclusive, goals. An example is a sales situation where the user wants to find a product that matches their taste and the system wants to successfully sell a product, ideally one with a higher profit margin.

In previous reports, we applied reinforcement learning to cooperative persuasive dialogue and evaluated the learned policy in a wizard-of-Oz setting (Hiraoka et al. 2014b). We modeled cooperative dialogue as a partially observable Markov decision process (POMDP), and system policies were learned with reinforcement learning. We introduced framing (Irwin et al. 2013), the description of alternatives with emotionally charged words, as a system action. In this previous work, we evaluated the learned policy by substituting a human wizard for the natural language understanding (NLU) and natural language generation (NLG) modules. In this evaluation framework, the result is highly dependent on the ability of the human wizard, and the effect of NLU and NLG is discounted, potentially overstating the effectiveness of the system.

In this paper, we construct and evaluate the first fully automated text-based cooperative persuasive dialogue system. First, we review our previous research (Hiraoka et al. 2014a,b) on learning cooperative persuasive policies, and then explain new modifications to the dialogue modeling, the newly implemented NLU and NLG modules, and the evaluation. Experimental results indicate that the learned policy with framing is effective even in a fully automatic system. The reward of the learned policy with framing is much higher than that of the baselines (a policy without framing and a random policy) and almost the same as that of a policy controlled by a human. This tendency is almost the same as the result of our previous research using the wizard-of-Oz framework (Hiraoka et al. 2014b).

15.2 Cooperative Persuasive Dialogue Corpus

In this section, we give a brief overview of cooperative persuasive dialogue and the human dialogue corpus that we use to construct the dialogue models and dialogue system described in later sections. We describe our collected persuasive dialogue corpus (Sect. 15.2.1), define and quantify the goals of the cooperative persuader and persuadee (Sect. 15.2.2), and describe the annotation of the persuader's dialogue acts from the point of view of framing (Sect. 15.2.3).

15.2.1 Persuasive Dialogue Corpus

The cooperative persuasive dialogue corpus (Hiraoka et al. 2014a) consists of dialogues between a salesperson (persuader) and customer (persuadee) as a typical example of persuasive dialogue. The salesperson attempts to convince the customer to purchase a particular product (decision) from a number of alternatives (decision candidates). More concretely, the corpus assumes a situation where the customer is in an appliance store looking for a camera, and the customer must decide which camera to purchase from five alternatives.

Prior to recording, the salesperson is given descriptions of the five cameras and instructed to try to convince the customer to purchase a specific camera (the persuasive target). In this corpus, the persuasive target is camera A, and it is invariant over all subjects. The customer is also instructed to select one preferred camera from the catalog of the cameras and to choose one aspect of the camera that is particularly important in making their decision (the determinant). During recording, the customer and the salesperson converse and refer to the information in the camera catalog as support for their dialogue. The customer can close the dialogue whenever they want, and may choose to buy a camera, not buy a camera, or reserve their decision for a later date. The total number of dialogues is 34, and the total time is about 340 min.

15.2.2 Annotation of Persuader and Persuadee Goals

We define a cooperative persuader as a persuader who achieves both the persuader's and persuadee's goals, and cooperative persuasive dialogue as a dialogue in which both the persuader's and persuadee's goals have been achieved. To measure the salesperson's success as a cooperative persuader, we annotate each dialogue with scores corresponding to the achievement of the two participants' goals. As the persuader's goal, we use persuasive success, measured by whether the persuadee's final decision (purchased camera) is the persuasive target or not. As the persuadee's goal, we use the persuadee's subjective satisfaction, measured by a questionnaire filled out by the persuadee at the end of the dialogue (1: not satisfied; 3: neutral; 5: satisfied). Note that we assume a situation that is not a zero-sum game, and thus the persuader's and persuadee's goals are not mutually exclusive.

15.2.3 Annotation of Dialogue Acts

15.2.3.1 Framing

Framing is the use of emotionally charged words to describe particular alternatives and is known to be an effective way of increasing persuasive power. The corpus contains tags for all instances of negative/positive framing (Irwin et al. 2013; Mazzotta and de Rosis 2006), where negative framing uses negative words and positive framing uses positive words.

The framing tags are defined as a tuple \(\langle a,p,r\rangle\) where a represents the target alternative, p takes value neg if the framing is negative and pos if the framing is positive, and r is a binary variable indicating whether or not the framing contains a reference to the determinant that the persuadee indicated was most important (for example, the performance or price of a camera). The user’s preferred determinant is annotated based on the results of the pre-dialogue questionnaire.

Table 15.1 shows an example of positive framing (p = pos) about the performance of camera A (a = A). In this example, the customer answered that their preferred determinant is the price of the camera, and this utterance does not contain any description of price. Thus, r = no is annotated.

Table 15.1 An example of positive framing

15.2.3.2 General Purpose Functions (GPF)

The corpus also contains tags for traditional dialogue acts. As a tag set to represent traditional dialogue acts, we use the general-purpose functions (GPF) defined by the ISO international standard for dialogue act annotation (ISO24617-2 2010). All annotated GPF tags are defined to be one of the tags in this set.

15.3 Cooperative Persuasive Dialogue Modeling

The cooperative persuasive dialogue model proposed in our previous research (Hiraoka et al. 2014b) consists of a user-side dialogue model (Sect. 15.3.1) and a system-side model (Sect. 15.3.2).

15.3.1 User Simulator

The user simulator estimates two aspects of the conversation:

  1. The user's dialogue acts

  2. Whether the preferred determinant has been conveyed to the user (conveyed preferred determinant; CPD)

The user’s dialogue acts are represented by using GPFs (e.g., question, answer, and inform). In our research, the user simulator chooses one GPF or None representing no response at each turn. CPD represents that the user has been convinced that the determinant in the persuader’s framing satisfies the user’s preference. For example, in Table 15.1, “performance” is contained in the salesperson’s positive framing for camera A. If the persuadee is convinced that the decision candidate satisfies his/her preference based on this framing, we say that CPD has occurred (r=yes). In our research, the user simulator models CPD for each of the five cameras. This information is required to calculate reward described in Sect. 15.3.2. Specifically, GPF and CPD are used for calculating naturalness and persuasive success, which are elements of the reward function.

The user’s GPF G user t+1 and CPD \(C_{\mathrm{alt}}^{t+1}\) at turn t + 1 are calculated by the following probabilities:

$$\displaystyle\begin{array}{rcl} P(G_{\mathrm{user}}^{t+1}\vert G_{\mathrm{ user}}^{t},F_{\mathrm{ sys}}^{t},G_{\mathrm{ sys}}^{t},U_{\mathrm{ eval}}),& &{}\end{array}$$
(15.1)
$$\displaystyle\begin{array}{rcl} P(C_{\mathrm{alt}}^{t+1}\vert C_{\mathrm{ alt}}^{t},F_{\mathrm{ sys}}^{t},G_{\mathrm{ sys}}^{t},U_{\mathrm{ eval}}).& &{}\end{array}$$
(15.2)

\(G_{\mathrm{sys}}^{t}\) represents the system GPF at turn t and \(F_{\mathrm{sys}}^{t}\) represents the system framing at t. These variables correspond to system actions and are explained in Sect. 15.3.2. \(G_{\mathrm{user}}^{t}\) represents the user's GPF at t, \(C_{\mathrm{alt}}^{t}\) represents the CPD at t, and \(U_{\mathrm{eval}}\) represents the user's original evaluation of the alternatives. In our research, this is the camera selected by the user as preferred at the beginning of the dialogue. We use the persuasive dialogue corpus described in Sect. 15.2.1 for training the user simulator, considering the customer in the corpus as the user and the salesperson as the system. We use logistic regression to learn Eqs. (15.1) and (15.2).
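For concreteness, the following is a minimal sketch of how a model such as Eq. (15.1) could be fit with off-the-shelf logistic regression. The feature names, example values, and encoding are illustrative assumptions, not the exact representation used to train the simulator.

```python
# Sketch: fitting the user-GPF model of Eq. (15.1) with logistic regression.
# Feature names and values below are illustrative assumptions.
import numpy as np
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

# One training example per turn t; the label is the user's GPF at turn t+1.
contexts = [
    {"G_user_t": "propositionalQ", "F_sys_t": "pos_A", "G_sys_t": "inform", "U_eval": "camera_B"},
    {"G_user_t": "none",           "F_sys_t": "none",  "G_sys_t": "answer", "U_eval": "camera_B"},
    # ... remaining turns extracted from the corpus
]
labels = ["inform", "propositionalQ"]

vec = DictVectorizer()
X = vec.fit_transform(contexts)               # one-hot encode the categorical context
model = LogisticRegression(max_iter=1000).fit(X, labels)

# During simulation, the next user GPF is sampled from the predicted distribution.
probs = model.predict_proba(vec.transform([contexts[0]]))[0]
next_gpf = np.random.choice(model.classes_, p=probs)
```

The CPD model of Eq. (15.2) can be fit in the same way, with the CPD value at turn t added to the context and a binary label.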

15.3.2 Dialogue Modeling: Learning Cooperative Persuasion Policies

To train the dialogue system with reinforcement learning, the reward, system actions, and belief state are required in addition to the user simulator (Williams and Young 2007).

The reward is calculated from three factors: user satisfaction, system persuasive success, and naturalness. As described in Sect. 15.1, a cooperative persuasive dialogue system must perform the dialogue so as to achieve both the system and user goals. Thus, the reward at each turn t is calculated with the following equation:

$$r_{t} = (\mathrm{Sat}_{\mathrm{user}}^{t} + \mathrm{PS}_{\mathrm{sys}}^{t} + N^{t})/3.$$
(15.3)

\(\mathrm{Sat}_{\mathrm{user}}^{t}\) represents a five-level score of the user's subjective satisfaction (1: not satisfied; 3: neutral; 5: satisfied) at turn t, scaled into the range between 0 and 1. \(\mathrm{PS}_{\mathrm{sys}}^{t}\) represents persuasive success (1: success; 0: failure) at turn t. \(N^{t}\) represents the bigram likelihood of the dialogue between the system and user at turn t. Sat and PS are calculated with a predictive model constructed from the corpus described in Sect. 15.2.1 (Hiraoka et al. 2014a).

The system action \(\langle G_{\mathrm{sys}},F_{\mathrm{sys}}\rangle\) is a GPF/framing \(\langle a,p\rangle\) pair representing the dialogue act of the salesperson. We construct a unigram model of the salesperson's dialogue acts \(P(G_{\mathrm{sales}},F_{\mathrm{sales}})\) from the original corpus, then exclude pairs for which the likelihood is below 0.005. As a result, we use the remaining 13 pairs as system actions.
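The pruning of rare GPF/framing pairs can be sketched as a simple frequency count. The 0.005 threshold is from the text; the data format below is our assumption.

```python
# Sketch: retaining GPF/framing pairs whose unigram probability among the
# salesperson's dialogue acts is at least 0.005 (threshold from the text).
from collections import Counter

# Hypothetical observations of (GPF, framing) pairs for the salesperson;
# framing is None when the utterance contains no framing.
sales_acts = [
    ("inform", ("A", "pos")),
    ("answer", None),
    ("inform", ("A", "pos")),
    # ... all salesperson turns in the corpus
]

counts = Counter(sales_acts)
total = sum(counts.values())
actions = [pair for pair, c in counts.items() if c / total >= 0.005]
print(len(actions), "system actions retained")   # 13 in the paper's corpus
```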

The belief state is represented by the features used for reward calculation (Table 15.2) and the reward calculated at the previous turn. Note that of the 8 features used for reward calculation, only \(C_{\mathrm{alt}}\) cannot be directly observed from the system action or NLU results, and thus the system estimates it through the dialogue by using the following probability:

$$\sum_{\widehat{C_{\mathrm{alt}}^{t}}} P(\widehat{C_{\mathrm{alt}}^{t+1}}\vert \widehat{C_{\mathrm{alt}}^{t}},F_{\mathrm{sys}}^{t},G_{\mathrm{sys}}^{t},U_{\mathrm{eval}})\,P(\widehat{C_{\mathrm{alt}}^{t}}),$$
(15.4)

where \(\widehat{C_{\mathrm{alt}}^{t+1}}\) represents the estimated CPD at t + 1, \(\widehat{C_{\mathrm{alt}}^{t}}\) represents the estimated CPD at t, and the other variables are the same as those in Eq. (15.2).
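The update in Eq. (15.4) is a standard forward update of the belief over the latent CPD variable. A minimal sketch follows, assuming a transition-probability function (named p_trans here, our choice) learned as in Eq. (15.2).

```python
# Sketch: the belief update of Eq. (15.4) for one alternative's CPD.
# p_trans(c_next, c_prev, f_sys, g_sys, u_eval) stands in for the transition
# probability P(C^{t+1} | C^t, F_sys, G_sys, U_eval) learned in Eq. (15.2).
def update_cpd_belief(belief, f_sys, g_sys, u_eval, p_trans):
    """belief: dict {True: prob, False: prob} over the current CPD value."""
    new_belief = {
        c_next: sum(p_trans(c_next, c_prev, f_sys, g_sys, u_eval) * belief[c_prev]
                    for c_prev in (True, False))
        for c_next in (True, False)
    }
    z = sum(new_belief.values())            # renormalize against numerical drift
    return {c: p / z for c, p in new_belief.items()}
```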

Table 15.2 Features for calculating reward. These features are also used as the system belief state

15.4 Modifications of the Cooperative Persuasive Dialogue Model

In this paper, we further propose two modifications to the cooperative dialogue models described in Sect. 15.3: (1) considering NLU recognition errors in the belief state, and (2) normalization of reward factors.

15.4.1 Considering NLU Recognition Errors

The cooperative dialogue model in Sect. 15.3 does not consider recognition errors of the NLU module. In previous research (Hiraoka et al. 2014b), we evaluated the policies in a wizard-of-Oz setting where a human was substituted for the NLU module, so the state estimation methods used in ordinary POMDP-based dialogue systems (Williams and Young 2007) were not needed. However, in this paper, we use a fully automatic NLU module, which may cause recognition errors, and thus some method for recovering from them is needed.

In this work, we modify the dialogue model to consider NLU recognition errors, incorporating estimation of the true user dialogue act (i.e., GPF) into the dialogue model. The estimation is performed according to the following equation:

$$P(G_{\mathrm{user}}^{t+1}\vert H_{G_{\mathrm{user}}}) = \frac{\sum_{G_{\mathrm{user}}^{t}} P(H_{G_{\mathrm{user}}^{t+1}}\vert G_{\mathrm{user}}^{t+1})\,P(G_{\mathrm{user}}^{t+1}\vert G_{\mathrm{user}}^{t})\,P(G_{\mathrm{user}}^{t})}{\sum_{G_{\mathrm{user}}^{t+1}}\sum_{G_{\mathrm{user}}^{t}} P(H_{G_{\mathrm{user}}^{t+1}}\vert G_{\mathrm{user}}^{t+1})\,P(G_{\mathrm{user}}^{t+1}\vert G_{\mathrm{user}}^{t})\,P(G_{\mathrm{user}}^{t})}.$$
(15.5)

\(H_{G_{\mathrm{user}}^{t+1}}\) represents the NLU result (described in Sect. 15.5.1) at t + 1, and the other variables are the same as those in Eqs. (15.1) and (15.2). \(P(H_{G_{\mathrm{user}}^{t+1}}\vert G_{\mathrm{user}}^{t+1})\) represents a confusion matrix between the actual GPF and the recognition result. To construct the confusion matrix, we perform an evaluation of the NLU in Sect. 15.6.1 and use the resulting confusion matrix in the estimation of Eq. (15.5). \(P(G_{\mathrm{user}}^{t+1}\vert G_{\mathrm{user}}^{t})\) is calculated using maximum likelihood estimation over the persuasive dialogue corpus described in Sect. 15.2.1.
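A compact sketch of Eq. (15.5) in matrix form is given below; the array layouts and the function name are our assumptions.

```python
import numpy as np

# Sketch of Eq. (15.5): recovering a distribution over the true user GPF from
# the NLU hypothesis. Array layouts and names are our assumptions.
def true_gpf_posterior(h_idx, confusion, transition, prior):
    """
    h_idx:      index of the GPF output by the NLU at turn t+1
    confusion:  P(H | G) with rows indexed by the true GPF G, columns by H
    transition: P(G^{t+1} | G^{t}) with rows indexed by G^{t}
    prior:      P(G^{t}) as a vector over the eight GPF classes
    """
    # Numerator: sum_{G^t} P(H | G^{t+1}) P(G^{t+1} | G^t) P(G^t), per G^{t+1}.
    numer = confusion[:, h_idx] * (prior @ transition)
    return numer / numer.sum()              # denominator of Eq. (15.5)
```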

15.4.2 Normalization of the Reward Factors

The reward function in Sect. 15.3.2 considers three factors: persuasive success, user satisfaction, and naturalness. At the current stage of our research, we have no evidence that any one of these factors is more important than the others for cooperative persuasive dialogue, and thus we would like to treat them as equally important. However, in Eq. (15.3) the scales (i.e., the standard deviations) of the factors differ, so factors with a larger scale are effectively treated as more important and the others as less important. For example, in our previous research (Hiraoka et al. 2014b), the scale of naturalness N was smaller than that of the other factors, and as a result it was largely ignored during learning.

In this work, we fix this problem by equalizing the importance of the reward factors through z-score normalization. More concretely, the reward function of Eq. (15.3) is replaced with the following reward function:

$$r_{t}' = \frac{\mathrm{Sat}_{\mathrm{user}}^{t} - \overline{\mathrm{Sat}_{\mathrm{user}}}}{\mathrm{Stddev}(\mathrm{Sat}_{\mathrm{user}})} + \frac{\mathrm{PS}_{\mathrm{sys}}^{t} - \overline{\mathrm{PS}_{\mathrm{sys}}}}{\mathrm{Stddev}(\mathrm{PS}_{\mathrm{sys}})} + \frac{N^{t} - \overline{N}}{\mathrm{Stddev}(N)},$$
(15.6)

where variables with a bar represent the means of the corresponding variables without a bar, and the Stddev function represents the standard deviation of its argument. These statistics are calculated from simulated dialogues with the dialogue model of the previous section, in which actions are chosen randomly. We sampled the reward factors over 60,000 turns of simulated dialogue (about 6000 dialogues) to calculate the statistics of each variable.
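A minimal sketch of the normalized reward of Eq. (15.6) follows. The means and standard deviations shown are placeholders standing in for the values estimated from the simulated turns.

```python
# Sketch of the z-score normalized reward of Eq. (15.6). The means and
# standard deviations below are placeholders; in the paper they are estimated
# from about 60,000 simulated turns under a random policy.
STATS = {
    "sat": (0.6, 0.2),    # (mean, stddev) of user satisfaction
    "ps":  (0.1, 0.3),    # persuasive success
    "nat": (0.02, 0.01),  # bigram naturalness
}

def normalized_reward(sat_t, ps_t, nat_t):
    z = lambda x, k: (x - STATS[k][0]) / STATS[k][1]
    return z(sat_t, "sat") + z(ps_t, "ps") + z(nat_t, "nat")
```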

15.5 Text-Based Cooperative Persuasive Dialogue System

The main contribution of this paper is the construction of a fully automated text-based cooperative persuasive dialogue system. The structure of the system is shown in Fig. 15.1. In this section, we describe the construction of NLU (Sect. 15.5.1) and NLG (Sect. 15.5.2) modules that act as an interface between the policy module and the human user and are necessary for fully automatic dialogue.

Fig. 15.1 Structure of our dialogue system. Rectangles represent information, and cylinders represent a system module

15.5.1 Natural Language Understanding

The NLU module detects the GPF of the user's text input \(u_{\mathrm{user}}\) using a statistical classifier. In this paper, we use bagging with decision trees as the weak classifier (Breiman 1996). We require the NLU to (1) be simple and (2) output the estimated classes with probabilities, and bagging with decision trees satisfies these requirements. The NLU uses many features (i.e., word frequencies), and decision trees can select a small number of effective features, yielding a simple classifier. In addition, with bagging, a confidence probability, determined by the voting rate of the decision trees, can be attached to the classification result. We utilize Mark (2009) for constructing the bagging classifier.

As input to the classifier, we use features calculated from \(u_{\mathrm{user}}\) and the history of system outputs (\(u_{\mathrm{sys}}\), \(\langle G_{\mathrm{sys}},F_{\mathrm{sys}}\rangle\)). Features are mainly categorized into four types:

Uni: Unigram word frequency in the user's input

Bi: Bigram word frequency in the user's input

DAcl: The previous action of the system (i.e., GPF/framing pairs \(\langle G_{\mathrm{sys}},F_{\mathrm{sys}}\rangle\))

Unicl: Unigram word frequency in the previous system utterance

As we use Japanese as our target language, we perform morphological analysis using Mecab (Kudo et al. 2004) and use information about the normal form of the word and part of speech to identify the word.

As the NLU result \(H_{G_{\mathrm{user}}}\), the module outputs a membership probability for each of the eight GPF types. We use 694 customer utterances in the camera sales corpus (Sect. 15.2) as training data. The distribution of the eight GPF labels in this training data is shown in Table 15.3.

Table 15.3 Distribution of the GPF labels in the training data
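To illustrate this NLU design, the following sketch builds a bagged decision-tree classifier over word n-gram features with scikit-learn. The original system was built with a different toolkit, and the example utterances and labels are hypothetical; the sketch only shows the classifier structure and its probabilistic output.

```python
# Sketch: a bagged decision-tree GPF classifier over word n-gram features.
# The original NLU was built with a different toolkit; this scikit-learn
# version only illustrates the structure and the probabilistic output.
from sklearn.ensemble import BaggingClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline

# Hypothetical, already-tokenized training utterances and their GPF labels.
utterances = [
    "kono kamera no nedan wa ikura desu ka",
    "gashitsu ga ichiban daiji desu",
    "sore wa doukana",
]
gpf_labels = ["PropositionalQ", "Inform", "Inform"]
# ... in the paper, 694 customer utterances from the camera sales corpus

nlu = make_pipeline(
    CountVectorizer(ngram_range=(1, 2)),   # Uni + Bi word-frequency features
    BaggingClassifier(n_estimators=50),    # default base estimator is a decision tree
)
nlu.fit(utterances, gpf_labels)

# The voting rate of the trees serves as the membership probability H_{G_user}.
probs = nlu.predict_proba(["kamera A no seinou wa dou desu ka"])[0]
```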

15.5.2 Natural Language Generation

The NLG module outputs a system response \(u_{\mathrm{sys}}\) based on the user's input \(u_{\mathrm{user}}\), the system's previous utterance \(u_{\mathrm{sys}}'\), and the system action \(\langle G_{\mathrm{sys}},F_{\mathrm{sys}}\rangle\). Although the dialogue assumed in this paper focuses on a restricted situation, it is still not trivial to create system responses for various inputs. In order to avoid the large amount of engineering required for template-based NLG and to allow rapid prototyping, we decided to use the framework of example-based dialogue management (Lee et al. 2009).

We construct an example database \(D =\{ d_{1},d_{2},\ldots,d_{M}\}\) with M utterances by modifying the human persuasive dialogue corpus of Sect. 15.2. In the example database, the ith datum \(d_{i} =\langle s,u,g,f,p\rangle\) consists of the speaker s, utterance u, GPF g, framing flag f, and previous datum p. In modifying the human persuasive dialogue corpus, we manually make the following corrections:

  • Deletion of redundant words and sentences (e.g., fillers and restatements)

  • Insertion of omitted words (e.g., subjects or objects) and sentences

Our example database consists of 2022 utterances (695 system utterances and 1327 user example utterances). An example of the database is shown in Table 15.4.

Table 15.4 Part of the example database. The words surrounded by < > were inserted during correction
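For reference, one record of the example database can be represented as follows; the class name and field types are our assumptions, chosen to mirror the tuple \(\langle s,u,g,f,p\rangle\) above.

```python
# Sketch: one record of the example database, d_i = <s, u, g, f, p>.
# Field names mirror the tuple in the text; the types are our assumptions.
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Example:
    speaker: str                        # s: "Sys" (salesperson) or "User" (customer)
    utterance: str                      # u: corrected utterance text
    gpf: str                            # g: general-purpose function tag
    framing: Optional[Tuple[str, str]]  # f: framing tag <a, p>, or None
    prev: Optional["Example"] = None    # p: the datum immediately preceding this one
```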

The NLG module determines the system response \(u_{\mathrm{sys}}\) based on \(u_{\mathrm{user}}\), \(u_{\mathrm{sys}}'\), and \(\langle G_{\mathrm{sys}},F_{\mathrm{sys}}\rangle\). More concretely, our NLG module performs the following procedure:

  1. We define the response candidate set R according to whether there is user input (\(u_{\mathrm{user}}\neq \phi\)) or not (\(u_{\mathrm{user}} = \phi\)). If \(u_{\mathrm{user}}\neq \phi\), we define R as the set of utterances r whose previous utterance is a user utterance (r.p.s = User). Conversely, if \(u_{\mathrm{user}} = \phi\), we define R as the set of utterances r for which r.p.s = Sys.

  2. Response candidates R are scored based on the following similarity score (a code sketch of this retrieval follows the procedure):

    $$\cos(r.p.u,u_{\mathrm{input}}) = \frac{\mathrm{words}(r.p.u) \cdot \mathrm{words}(u_{\mathrm{input}})}{\mid \mathrm{words}(r.p.u)\mid \cdot \mid \mathrm{words}(u_{\mathrm{input}})\mid},$$
    (15.7)
    $$u_{\mathrm{input}} = \begin{cases} u_{\mathrm{sys}}' & (u_{\mathrm{user}} = \phi), \\ u_{\mathrm{user}} & (u_{\mathrm{user}} \neq \phi).\end{cases}$$

    The cosine similarity cos between the previous utterance of the response candidate, r.p.u (\(r \in R\)), and the input sentence \(u_{\mathrm{input}}\) is used for the scoring. \(u_{\mathrm{input}}\) is set to \(u_{\mathrm{sys}}'\) or \(u_{\mathrm{user}}\) depending on whether \(u_{\mathrm{user}}\) is empty. The words function returns the frequency vector of the content words (i.e., nouns, verbs, and adjectives) weighted according to tf-idf.

  3. The utterance r.u of the highest-scoring candidate \(r^{\ast}\) is selected as the output of the NLG module \(u_{\mathrm{sys}}\):

    $$r^{\ast} = \mathop{\mathrm{arg\,max}}\limits_{r\in R}\cos(r.p.u,u_{\mathrm{input}}),$$
    (15.8)
    $$u_{\mathrm{sys}} = r^{\ast}.u.$$
    (15.9)
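Putting steps 1-3 together, a minimal sketch of the retrieval-based generation is shown below. It reuses the hypothetical Example record sketched above; content_words() stands in for the tf-idf-weighted content-word extraction, and the candidates are assumed to have already been restricted as in step 1 (and to match the chosen system action \(\langle G_{\mathrm{sys}},F_{\mathrm{sys}}\rangle\)).

```python
import math

# Sketch of the retrieval in Eqs. (15.7)-(15.9), reusing the hypothetical
# Example record above. content_words() stands in for the Mecab-based,
# tf-idf-weighted content-word vectors.
def cosine(a, b):
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def generate(u_user, u_sys_prev, candidates, content_words):
    u_input = u_user if u_user else u_sys_prev        # the two cases of u_input
    target = content_words(u_input)
    best = max(candidates,                            # Eq. (15.8): arg max over R
               key=lambda r: cosine(content_words(r.prev.utterance), target))
    return best.utterance                             # Eq. (15.9): u_sys = r*.u
```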

15.6 Experimental Results

In this section, we perform two forms of experimental evaluation. First, as a preliminary experiment, we evaluate the performance of the NLU module proposed in Sect. 15.5.1. Then, we evaluate the fully automatic persuasive dialogue system.

15.6.1 Evaluation for NLU Using Different Feature Sets

First, we evaluate the performance of the NLU module using the different feature sets proposed in Sect. 15.5.1. We prepare four feature-set patterns (Uni, Uni+DAcl, Uni+DAcl+Unicl, and Uni+DAcl+Bi) and evaluate the recognition accuracy of GPF labels in the customer's utterances. The evaluation is performed with 15-fold cross-validation over the 694 customer utterances described in Sect. 15.5.1.

From the experimental results (Fig. 15.2), we can see that the NLU with Uni+DAcl+Bi achieves the highest accuracy, and thus we use Uni+DAcl+Bi for the NLU of the dialogue system in the next section. Focusing on the details of the misclassified GPFs, Table 15.5 shows the confusion matrix for the classification results of the NLU module with Uni+DAcl+Bi. From this matrix, we can see that Answer is misclassified as Inform and that SetQ and Question are misclassified as PropositionalQ. This result indicates that the module has difficulty distinguishing dialogue acts that are in a hypernym/hyponym or sibling relationship.

Fig. 15.2 Accuracy of the NLU module. The vertical axis represents accuracy and the horizontal axis represents the NLU feature set. The chance rate corresponds to an NLU module that always outputs Inform

Table 15.5 The confusion matrix

15.6.2 Complete System Evaluation

In this section, we describe the results of the first user study evaluating fully automated cooperative persuasive dialogue systems. For evaluation, we prepare the following four policies.

Random: A baseline where the action is randomly selected from all possible actions.

NoFraming: A baseline where the action is output by a policy learned using only GPFs. For constructing the action set, we remove actions whose framing is not None from the actions described in Sect. 15.3.2. The policy is greedy and selects the action with the highest score.

Framing: The proposed method, where the action is output by the policy learned with all actions described in Sect. 15.3.2, including framing. This policy is also greedy.

Human: An oracle where the action is selected by a human. In this research, the first author (who has no formal sales experience, but about one year of experience analyzing camera sales dialogues) selects the action.

For learning the policies (i.e., NoFraming and Framing), we use Neural Fitted Q Iteration (NFQ) (Riedmiller 2005), applied with the Pybrain library (Schaul et al. 2010). The learning conditions follow the default Pybrain settings. We treat 3000 dialogues as one epoch and update the parameters of the neural network at each epoch. Learning finishes when the number of epochs reaches 20 (60,000 dialogues), and the policy with the highest average reward is used for evaluation.
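For intuition, the following is a schematic of the NFQ training loop, not the Pybrain code that was actually used. The transition format, discount factor, and network size are placeholders, and simulate_dialogues() stands in for rollouts against the user simulator of Sect. 15.3.1.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Schematic of Neural Fitted Q Iteration, not the Pybrain implementation used
# in the paper. simulate_dialogues() stands in for rollouts against the user
# simulator and returns (belief, action, reward, next_belief, done) tuples.
GAMMA = 0.95          # discount factor (placeholder, not stated in the text)
N_ACTIONS = 13

def nfq(simulate_dialogues, n_epochs=20, dialogues_per_epoch=3000):
    q_net, transitions = None, []
    for _ in range(n_epochs):
        transitions += simulate_dialogues(dialogues_per_epoch, q_net)
        inputs, targets = [], []
        for s, a, r, s2, done in transitions:
            future = 0.0
            if q_net is not None and not done:
                future = max(q_net.predict([np.append(s2, b)])[0]
                             for b in range(N_ACTIONS))
            inputs.append(np.append(s, a))              # Q takes (belief, action)
            targets.append(r + GAMMA * future)
        q_net = MLPRegressor(hidden_layer_sizes=(20,), max_iter=500)
        q_net.fit(np.array(inputs), np.array(targets))  # refit on all transitions
    return q_net
```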

We evaluate policies on the basis of average reward and correct response rate of dialogues with real users. The definition of the reward is described in Sect. 15.3.2, and the correct response rate is the ratio of correct system responses to all system responses. In the experiment, the dialogue system plays the salesperson, and the user plays the customer. At the end of the dialogue, to calculate the reward, the user answers the following questionnaire:

Satisfaction: :

The user’s subjective satisfaction defined as a 5 level score of customer satisfaction (1: not satisfied; 3: neutral; 5: satisfied).

Final decision: :

The camera that the user finally wants to buy.

In addition, to calculate the correct response rate, we have the user annotate whether each system response is correct or not. Thirteen users each perform one dialogue with the system under each policy (a total of four dialogues per user).

Experimental results for the reward are shown in Fig. 15.3. From these results, we can see that the reward of Framing is higher than that of NoFraming and Random and almost equal to that of Human. This indicates that learning a policy with framing is effective even in a fully automatic text-based cooperative dialogue system. It is interesting to note that the tendency of these scores is almost the same as in the wizard-of-Oz experiment (Hiraoka et al. 2014b). The exception is that the naturalness of Framing in this experiment is higher than in the wizard-of-Oz experiment. We hypothesize that this difference is due to the modification of the reward factors: in Sect. 15.4.2, we normalized the reward factors so that they are weighted equally when learning the policy. As a result, naturalness is treated as an important factor during learning, which increases the naturalness score of Framing. It should be noted, however, that most of the subjects are different from those in the wizard-of-Oz experiment of our previous work (Hiraoka et al. 2014b), and this might also affect the experimental result.

Fig. 15.3 Evaluation results for real users. Error bars represent 95 % confidence intervals. Rew represents the reward, Sat represents the user satisfaction, PS represents persuasive success, and Nat represents naturalness

Experimental results for the correct response rate (Fig. 15.4) indicate that our cooperative persuasive dialogue system responds to the user's input reasonably correctly. The scores of all policies are higher than 70 %, and the score of Framing is about 77 %. Even the Random policy achieves a score of about 70 %. One reason for this is that the NLG method used by our system (Sect. 15.5.2) is example-based and thus returns natural responses that are judged as incorrect only when they do not match the context.

Fig. 15.4 Correct response rate of the system utterances

15.7 Conclusion

In this paper, we presented a method for constructing a fully automatic cooperative persuasive dialogue system. In particular, we focused on modifications to the policy learning and on the construction of the NLU and NLG modules. We evaluated the constructed dialogue system with real users. Experimental results indicated that the learned policy with framing is effective in a fully automatic text-based cooperative dialogue system and that the tendency of each reward factor is almost the same as in our previous research (Hiraoka et al. 2014b).

In the future, we plan to evaluate the system policies in more realistic situations that move beyond role-playing to real sales situations over broader domains. We also plan to consider nonverbal information for estimating persuasive success and user satisfaction.