15.1 Introduction

There is ongoing research on applying reinforcement learning to persuasion and negotiation dialogues, which differ from traditional task-based dialogues (Georgila and Traum 2011; Georgila 2013; Paruchuri et al. 2009; Heeman 2009). In task-based dialogue, the system is required to conduct the dialogue to achieve the user's goal, eliciting information from the user in order to provide an appropriate service. A reward corresponding to the achievement of the user's goal is given to the system. In contrast, in persuasive dialogue, the system convinces the user to take some action that achieves a system goal, for example buying a particular product or agreeing to a particular plan (Georgila 2013). In previous work, we proposed the paradigm of cooperative persuasive dialogue (Hiraoka et al. 2014b, 2013), where a reward corresponding to the achievement of both the user's and the system's goals is given to the system. This paradigm is useful in situations where the user and the system have different, but not mutually exclusive, goals. An example is a sales situation where the user wants to find a product that matches their taste and the system wants to successfully sell a product, ideally one with a higher profit margin.

In previous reports, we applied reinforcement learning to cooperative persuasive dialogue and evaluated the learned policy in a wizard-of-Oz setting (Hiraoka et al. 2014b). We modeled cooperative dialogue as a partially observable Markov decision process (POMDP), and system policies were learned with reinforcement learning. We introduced framing (Irwin et al. 2013), the description of alternatives with emotionally charged words, as a system action. In this previous work, we evaluated the learned policy by substituting a human wizard for the natural language understanding (NLU) and natural language generation (NLG) modules. In this evaluation framework, the result is highly dependent on the ability of the human wizard, and the effect of NLU and NLG is discounted, potentially overstating the effectiveness of the system.

In this paper, we construct and evaluate the first fully automated text-based cooperative persuasive dialogue system. First, we review our previous research (Hiraoka et al. 2014a,b) on learning cooperative persuasive policies, and then explain new modifications to the dialogue modeling, the newly implemented NLU and NLG modules, and the evaluation. Experimental results indicate that the learned policy with framing is effective even in a fully automatic system. The reward of the learned policy with framing is much higher than that of the baselines (a policy without framing and a random policy) and almost the same as that of a policy controlled by a human. This tendency is almost the same as the result of our previous research using the wizard-of-Oz framework (Hiraoka et al. 2014b).

15.2 Cooperative Persuasive Dialogue Corpus

In this section, we give a brief overview of cooperative persuasive dialogue and the human dialogue corpus that we use to construct the dialogue models and dialogue system described in later sections. We describe our collected persuasive dialogue corpus (Sect. 15.2.1), define and quantify the goals of the cooperative persuader and persuadee (Sect. 15.2.2), and describe the annotation of the persuader's dialogue acts from the point of view of framing (Sect. 15.2.3).

15.2.1 Persuasive Dialogue Corpus

The cooperative persuasive dialogue corpus (Hiraoka et al. 2014a) consists of dialogues between a salesperson (persuader) and customer (persuadee) as a typical example of persuasive dialogue. The salesperson attempts to convince the customer to purchase a particular product (decision) from a number of alternatives (decision candidates). More concretely, the corpus assumes a situation where the customer is in an appliance store looking for a camera, and the customer must decide which camera to purchase from five alternatives.

Prior to recording, the salesperson is given descriptions of the five cameras and instructed to try to convince the customer to purchase a specific camera (the persuasive target). In this corpus, the persuasive target is camera A, and it is invariant over all subjects. The customer is also instructed to select one preferred camera from the catalog of the cameras and to choose one aspect of the camera that is particularly important in making their decision (the determinant). During recording, the customer and the salesperson converse and refer to the information in the camera catalog as support for their dialogue. The customer can close the dialogue whenever they want, and may choose to buy a camera, not buy a camera, or reserve their decision for a later date. The total number of dialogues is 34, and the total time is about 340 min.

15.2.2 Annotation of Persuader and Persuadee Goals

We define a cooperative persuader as a persuader who achieves both the persuader's and persuadee's goals, and cooperative persuasive dialogue as a dialogue in which both the persuader's and persuadee's goals have been achieved. To measure the salesperson's success as a cooperative persuader, we annotate each dialogue with scores corresponding to the achievement of the two participants' goals. As the persuader's goal, we use persuasive success, measured by whether the persuadee's final decision (purchased camera) is the persuasive target or not. As the persuadee's goal, we use the persuadee's subjective satisfaction, measured by a questionnaire filled out by the persuadee at the end of the dialogue (1: not satisfied; 3: neutral; 5: satisfied). Note that we assume a situation that is not a zero-sum game, and thus the persuader's and persuadee's goals are not mutually exclusive.

15.2.3 Annotation of Dialogue Acts

15.2.3.1 Framing

Framing is the use of emotionally charged words to describe particular alternatives and is known to be an effective way of increasing persuasive power. The corpus contains tags for all instances of negative/positive framing (Irwin et al. 2013; Mazzotta and de Rosis 2006), where negative framing uses negative words and positive framing uses positive words.

The framing tags are defined as a tuple \(\langle a,p,r\rangle\) where a represents the target alternative, p takes value neg if the framing is negative and pos if the framing is positive, and r is a binary variable indicating whether or not the framing contains a reference to the determinant that the persuadee indicated was most important (for example, the performance or price of a camera). The user’s preferred determinant is annotated based on the results of the pre-dialogue questionnaire.

Table 15.1 shows an example of positive framing (p = pos) about the performance of camera A (a = A). In this example, the customer answered that their preferred determinant is the price of the camera, and this utterance does not contain any description of price. Thus, r = no is annotated.

Table 15.1 An example of positive framing

15.2.3.2 General Purpose Functions (GPF)

The corpus also contains tags for traditional dialogue acts. As a tag set to represent traditional dialogue acts, we use the general-purpose functions (GPF) defined by the ISO international standard for dialogue act annotation (ISO24617-2 2010). All annotated GPF tags are defined to be one of the tags in this set.

15.3 Cooperative Persuasive Dialogue Modeling

The cooperative persuasive dialogue model proposed in our previous research (Hiraoka et al. 2014b) consists of a user-side dialogue model (Sect. 15.3.1) and a system-side model (Sect. 15.3.2).

15.3.1 User Simulator

The user simulator estimates two aspects of the conversation:

  1. The user's dialogue acts

  2. Whether the preferred determinant has been conveyed to the user (conveyed preferred determinant; CPD)

The user’s dialogue acts are represented by using GPFs (e.g., question, answer, and inform). In our research, the user simulator chooses one GPF or None representing no response at each turn. CPD represents that the user has been convinced that the determinant in the persuader’s framing satisfies the user’s preference. For example, in Table 15.1, “performance” is contained in the salesperson’s positive framing for camera A. If the persuadee is convinced that the decision candidate satisfies his/her preference based on this framing, we say that CPD has occurred (r=yes). In our research, the user simulator models CPD for each of the five cameras. This information is required to calculate reward described in Sect. 15.3.2. Specifically, GPF and CPD are used for calculating naturalness and persuasive success, which are elements of the reward function.

The user’s GPF G user t+1 and CPD \(C_{\mathrm{alt}}^{t+1}\) at turn t + 1 are calculated by the following probabilities:

$$\displaystyle\begin{array}{rcl} P(G_{\mathrm{user}}^{t+1}\vert G_{\mathrm{ user}}^{t},F_{\mathrm{ sys}}^{t},G_{\mathrm{ sys}}^{t},U_{\mathrm{ eval}}),& &{}\end{array}$$
(15.1)
$$\displaystyle\begin{array}{rcl} P(C_{\mathrm{alt}}^{t+1}\vert C_{\mathrm{ alt}}^{t},F_{\mathrm{ sys}}^{t},G_{\mathrm{ sys}}^{t},U_{\mathrm{ eval}}).& &{}\end{array}$$
(15.2)

\(G_{\mathrm{sys}}^{t}\) represents the system GPF at turn t and \(F_{\mathrm{sys}}^{t}\) represents the system framing at t. These variables correspond to system actions and are explained in Sect. 15.3.2. \(G_{\mathrm{user}}^{t}\) represents the user's GPF at t, \(C_{\mathrm{alt}}^{t}\) represents the CPD at t, and \(U_{\mathrm{eval}}\) represents the user's original evaluation of the alternatives. In our research, this is the camera selected by the user as preferred at the beginning of the dialogue. We use the persuasive dialogue corpus described in Sect. 15.2.1 for training the user simulator, considering the customer in the corpus as the user and the salesperson as the system. We use logistic regression to learn Eqs. (15.1) and (15.2).
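For concreteness, the following is a minimal sketch of how a model such as Eq. (15.1) could be fit with off-the-shelf logistic regression. The feature names, example values, and encoding are illustrative assumptions, not the exact representation used to train the simulator.

```python
# Sketch: fitting the user-GPF model of Eq. (15.1) with logistic regression.
# Feature names and values below are illustrative assumptions.
import numpy as np
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

# One training example per turn t; the label is the user's GPF at turn t+1.
contexts = [
    {"G_user_t": "propositionalQ", "F_sys_t": "pos_A", "G_sys_t": "inform", "U_eval": "camera_B"},
    {"G_user_t": "none",           "F_sys_t": "none",  "G_sys_t": "answer", "U_eval": "camera_B"},
    # ... remaining turns extracted from the corpus
]
labels = ["inform", "propositionalQ"]

vec = DictVectorizer()
X = vec.fit_transform(contexts)               # one-hot encode the categorical context
model = LogisticRegression(max_iter=1000).fit(X, labels)

# During simulation, the next user GPF is sampled from the predicted distribution.
probs = model.predict_proba(vec.transform([contexts[0]]))[0]
next_gpf = np.random.choice(model.classes_, p=probs)
```

The CPD model of Eq. (15.2) can be fit in the same way, with the CPD value at turn t added to the context and a binary label.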

15.3.2 Dialogue Modeling: Learning Cooperative Persuasion Policies

To train the dialogue system with reinforcement learning, the reward, system actions, and belief state are required in addition to the user simulator (Williams and Young 2007).

The reward is calculated from three factors: user satisfaction, system persuasive success, and naturalness. As described in Sect. 15.1, a cooperative persuasive dialogue system must perform the dialogue so as to achieve both the system and user goals. Thus, the reward at each turn t is calculated with the following equation:

$$r_{t} = (\mathrm{Sat}_{\mathrm{user}}^{t} + \mathrm{PS}_{\mathrm{sys}}^{t} + N^{t})/3.$$
(15.3)

\(\mathrm{Sat}_{\mathrm{user}}^{t}\) represents a five-level score of the user's subjective satisfaction (1: not satisfied; 3: neutral; 5: satisfied) at turn t, scaled into the range between 0 and 1. \(\mathrm{PS}_{\mathrm{sys}}^{t}\) represents persuasive success (1: success; 0: failure) at turn t. \(N^{t}\) represents the bigram likelihood of the dialogue between the system and user at turn t. Sat and PS are calculated with a predictive model constructed from the corpus described in Sect. 15.2.1 (Hiraoka et al. 2014a).

The system action \(\langle G_{\mathrm{sys}},F_{\mathrm{sys}}\rangle\) is a GPF/framing \(\langle a,p\rangle\) pair representing the dialogue act of the salesperson. We construct a unigram model of the salesperson's dialogue acts \(P(G_{\mathrm{sales}},F_{\mathrm{sales}})\) from the original corpus, then exclude pairs for which the likelihood is below 0.005. As a result, we use the remaining 13 pairs as system actions.
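The pruning of rare GPF/framing pairs can be sketched as a simple frequency count. The 0.005 threshold is from the text; the data format below is our assumption.

```python
# Sketch: retaining GPF/framing pairs whose unigram probability among the
# salesperson's dialogue acts is at least 0.005 (threshold from the text).
from collections import Counter

# Hypothetical observations of (GPF, framing) pairs for the salesperson;
# framing is None when the utterance contains no framing.
sales_acts = [
    ("inform", ("A", "pos")),
    ("answer", None),
    ("inform", ("A", "pos")),
    # ... all salesperson turns in the corpus
]

counts = Counter(sales_acts)
total = sum(counts.values())
actions = [pair for pair, c in counts.items() if c / total >= 0.005]
print(len(actions), "system actions retained")   # 13 in the paper's corpus
```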

The belief state is represented by the features used for reward calculation (Table 15.2) and the reward calculated at the previous turn. Note that of the 8 features used for reward calculation, only \(C_{\mathrm{alt}}\) cannot be directly observed from the system action or NLU results, and thus the system estimates it through the dialogue by using the following probability:

$$\sum_{\widehat{C_{\mathrm{alt}}^{t}}} P(\widehat{C_{\mathrm{alt}}^{t+1}}\vert \widehat{C_{\mathrm{alt}}^{t}},F_{\mathrm{sys}}^{t},G_{\mathrm{sys}}^{t},U_{\mathrm{eval}})\,P(\widehat{C_{\mathrm{alt}}^{t}}),$$
(15.4)

where \(\widehat{C_{\mathrm{alt}}^{t+1}}\) represents the estimated CPD at t + 1, \(\widehat{C_{\mathrm{alt}}^{t}}\) represents the estimated CPD at t, and the other variables are the same as those in Eq. (15.2).
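The update in Eq. (15.4) is a standard forward update of the belief over the latent CPD variable. A minimal sketch follows, assuming a transition-probability function (named p_trans here, our choice) learned as in Eq. (15.2).

```python
# Sketch: the belief update of Eq. (15.4) for one alternative's CPD.
# p_trans(c_next, c_prev, f_sys, g_sys, u_eval) stands in for the transition
# probability P(C^{t+1} | C^t, F_sys, G_sys, U_eval) learned in Eq. (15.2).
def update_cpd_belief(belief, f_sys, g_sys, u_eval, p_trans):
    """belief: dict {True: prob, False: prob} over the current CPD value."""
    new_belief = {
        c_next: sum(p_trans(c_next, c_prev, f_sys, g_sys, u_eval) * belief[c_prev]
                    for c_prev in (True, False))
        for c_next in (True, False)
    }
    z = sum(new_belief.values())            # renormalize against numerical drift
    return {c: p / z for c, p in new_belief.items()}
```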

Table 15.2 Features for calculating reward. These features are also used as the system belief state

15.4 Modifications of the Cooperative Persuasive Dialogue Model

In this paper, we further propose two modifications to the cooperative dialogue models described in Sect. 15.3: (1) considering NLU recognition errors in the belief state, and (2) normalization of reward factors.

15.4.1 Considering NLU Recognition Errors

The cooperative dialogue model in Sect. 15.3 does not consider recognition errors of the NLU module. In previous research (Hiraoka et al. 2014b), we evaluated the policies in a wizard-of-Oz setting where a human was substituted for the NLU module, so the state estimation methods used in ordinary POMDP-based dialogue systems (Williams and Young 2007) were not needed. However, in this paper, we use a fully automatic NLU module, which may cause recognition errors, and thus some method for recovering from them is needed.

In this work, we modify the dialogue model to consider NLU recognition errors, incorporating estimation of the true user dialogue act (i.e., GPF) into the dialogue model. The estimation is performed according to the following equation:

$$P(G_{\mathrm{user}}^{t+1}\vert H_{G_{\mathrm{user}}}) = \frac{\sum_{G_{\mathrm{user}}^{t}} P(H_{G_{\mathrm{user}}^{t+1}}\vert G_{\mathrm{user}}^{t+1})\,P(G_{\mathrm{user}}^{t+1}\vert G_{\mathrm{user}}^{t})\,P(G_{\mathrm{user}}^{t})}{\sum_{G_{\mathrm{user}}^{t+1}}\sum_{G_{\mathrm{user}}^{t}} P(H_{G_{\mathrm{user}}^{t+1}}\vert G_{\mathrm{user}}^{t+1})\,P(G_{\mathrm{user}}^{t+1}\vert G_{\mathrm{user}}^{t})\,P(G_{\mathrm{user}}^{t})}.$$
(15.5)

\(H_{G_{\mathrm{user}}^{t+1}}\) represents the NLU result (described in Sect. 15.5.1) at t + 1, and the other variables are the same as those in Eqs. (15.1) and (15.2). \(P(H_{G_{\mathrm{user}}^{t+1}}\vert G_{\mathrm{user}}^{t+1})\) represents a confusion matrix between the actual GPF and the recognition result. To construct the confusion matrix, we perform an evaluation of the NLU in Sect. 15.6.1 and use the resulting confusion matrix in the estimation of Eq. (15.5). \(P(G_{\mathrm{user}}^{t+1}\vert G_{\mathrm{user}}^{t})\) is calculated using maximum likelihood estimation over the persuasive dialogue corpus described in Sect. 15.2.1.
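A compact sketch of Eq. (15.5) in matrix form is given below; the array layouts and the function name are our assumptions.

```python
import numpy as np

# Sketch of Eq. (15.5): recovering a distribution over the true user GPF from
# the NLU hypothesis. Array layouts and names are our assumptions.
def true_gpf_posterior(h_idx, confusion, transition, prior):
    """
    h_idx:      index of the GPF output by the NLU at turn t+1
    confusion:  P(H | G) with rows indexed by the true GPF G, columns by H
    transition: P(G^{t+1} | G^{t}) with rows indexed by G^{t}
    prior:      P(G^{t}) as a vector over the eight GPF classes
    """
    # Numerator: sum_{G^t} P(H | G^{t+1}) P(G^{t+1} | G^t) P(G^t), per G^{t+1}.
    numer = confusion[:, h_idx] * (prior @ transition)
    return numer / numer.sum()              # denominator of Eq. (15.5)
```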

15.4.2 Normalization of the Reward Factors

The reward function in Sect. 15.3.2 considers three factors: persuasive success, user satisfaction, and naturalness. At the current stage of our research, we have no evidence that any one of these factors is more important than the others for cooperative persuasive dialogue, and thus we would like to treat them as equally important. However, in Eq. (15.3) the scales (i.e., the standard deviations) of the factors differ, so factors with a larger scale are effectively treated as more important and the others as less important. For example, in our previous research (Hiraoka et al. 2014b), the scale of naturalness N was smaller than that of the other factors, and as a result it was largely ignored during learning.

In this work, we fix this problem by equalizing the importance of the reward factors through z-score normalization. More concretely, the reward function of Eq. (15.3) is replaced with the following reward function:

$$r_{t}' = \frac{\mathrm{Sat}_{\mathrm{user}}^{t} - \overline{\mathrm{Sat}_{\mathrm{user}}}}{\mathrm{Stddev}(\mathrm{Sat}_{\mathrm{user}})} + \frac{\mathrm{PS}_{\mathrm{sys}}^{t} - \overline{\mathrm{PS}_{\mathrm{sys}}}}{\mathrm{Stddev}(\mathrm{PS}_{\mathrm{sys}})} + \frac{N^{t} - \overline{N}}{\mathrm{Stddev}(N)},$$
(15.6)

where variables with a bar represent the means of the corresponding variables without a bar, and the Stddev function represents the standard deviation of its argument. These statistics are calculated from simulated dialogues with the dialogue model of the previous section, in which actions are chosen randomly. We sampled the reward factors over 60,000 turns of simulated dialogue (about 6000 dialogues) to calculate the statistics of each variable.
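A minimal sketch of the normalized reward of Eq. (15.6) follows. The means and standard deviations shown are placeholders standing in for the values estimated from the simulated turns.

```python
# Sketch of the z-score normalized reward of Eq. (15.6). The means and
# standard deviations below are placeholders; in the paper they are estimated
# from about 60,000 simulated turns under a random policy.
STATS = {
    "sat": (0.6, 0.2),    # (mean, stddev) of user satisfaction
    "ps":  (0.1, 0.3),    # persuasive success
    "nat": (0.02, 0.01),  # bigram naturalness
}

def normalized_reward(sat_t, ps_t, nat_t):
    z = lambda x, k: (x - STATS[k][0]) / STATS[k][1]
    return z(sat_t, "sat") + z(ps_t, "ps") + z(nat_t, "nat")
```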

15.5 Text-Based Cooperative Persuasive Dialogue System

The main contribution of this paper is the construction of a fully automated text-based cooperative persuasive dialogue system. The structure of the system is shown in Fig. 15.1. In this section, we describe the construction of NLU (Sect. 15.5.1) and NLG (Sect. 15.5.2) modules that act as an interface between the policy module and the human user and are necessary for fully automatic dialogue.

Fig. 15.1 Structure of our dialogue system. Rectangles represent information, and cylinders represent a system module

15.5.1 Natural Language Understanding

The NLU module detects the GPF of the user's text input \(u_{\mathrm{user}}\) using a statistical classifier. In this paper, we use bagging with decision trees as the weak classifier (Breiman 1996). We require the NLU to (1) be simple and (2) output the estimated classes with probabilities, and bagging with decision trees satisfies these requirements. The NLU uses many features (i.e., word frequencies), and decision trees can select a small number of effective features, yielding a simple classifier. In addition, with bagging, a confidence probability, determined by the voting rate of the decision trees, can be attached to the classification result. We utilize Mark (2009) for constructing the bagging classifier.

As input to the classifier, we use features calculated from \(u_{\mathrm{user}}\) and the history of system outputs (\(u_{\mathrm{sys}}\), \(\langle G_{\mathrm{sys}},F_{\mathrm{sys}}\rangle\)). Features are mainly categorized into four types:

Uni: Unigram word frequency in the user's input

Bi: Bigram word frequency in the user's input

DAcl: The previous action of the system (i.e., GPF/framing pairs \(\langle G_{\mathrm{sys}},F_{\mathrm{sys}}\rangle\))

Unicl: Unigram word frequency in the previous system utterance

As we use Japanese as our target language, we perform morphological analysis using Mecab (Kudo et al. 2004) and use information about the normal form of the word and part of speech to identify the word.

As the NLU result \(H_{G_{\mathrm{user}}}\), the module outputs a membership probability for each of the eight GPF types. We use 694 customer utterances in the camera sales corpus (Sect. 15.2) as training data. The distribution of the eight GPF labels in this training data is shown in Table 15.3.

Table 15.3 Distribution of the GPF labels in the training data
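To illustrate this NLU design, the following sketch builds a bagged decision-tree classifier over word n-gram features with scikit-learn. The original system was built with a different toolkit, and the example utterances and labels are hypothetical; the sketch only shows the classifier structure and its probabilistic output.

```python
# Sketch: a bagged decision-tree GPF classifier over word n-gram features.
# The original NLU was built with a different toolkit; this scikit-learn
# version only illustrates the structure and the probabilistic output.
from sklearn.ensemble import BaggingClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline

# Hypothetical, already-tokenized training utterances and their GPF labels.
utterances = [
    "kono kamera no nedan wa ikura desu ka",
    "gashitsu ga ichiban daiji desu",
    "sore wa doukana",
]
gpf_labels = ["PropositionalQ", "Inform", "Inform"]
# ... in the paper, 694 customer utterances from the camera sales corpus

nlu = make_pipeline(
    CountVectorizer(ngram_range=(1, 2)),   # Uni + Bi word-frequency features
    BaggingClassifier(n_estimators=50),    # default base estimator is a decision tree
)
nlu.fit(utterances, gpf_labels)

# The voting rate of the trees serves as the membership probability H_{G_user}.
probs = nlu.predict_proba(["kamera A no seinou wa dou desu ka"])[0]
```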

15.5.2 Natural Language Generation

The NLG module outputs a system response \(u_{\mathrm{sys}}\) based on the user's input \(u_{\mathrm{user}}\), the system's previous utterance \(u_{\mathrm{sys}}'\), and the system action \(\langle G_{\mathrm{sys}},F_{\mathrm{sys}}\rangle\). Although the dialogue assumed in this paper focuses on a restricted situation, it is still not trivial to create system responses for various inputs. In order to avoid the large amount of engineering required for template-based NLG and to allow rapid prototyping, we decided to use the framework of example-based dialogue management (Lee et al. 2009).

We construct an example database \(D =\{ d_{1},d_{2},\ldots,d_{M}\}\) with M utterances by modifying the human persuasive dialogue corpus of Sect. 15.2. In the example database, the ith datum \(d_{i} =\langle s,u,g,f,p\rangle\) consists of the speaker s, utterance u, GPF g, framing flag f, and previous datum p. In modifying the human persuasive dialogue corpus, we manually make the following corrections:

  • Deletion of redundant words and sentences (e.g., fillers and restatements)

  • Insertion of omitted words (e.g., subjects or objects) and sentences

Our example database consists of 2022 utterances (695 system utterances and 1327 user example utterances). An example of the database is shown in Table 15.4.

Table 15.4 Part of the example database. The words surrounded by < > were inserted during correction
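For reference, one record of the example database can be represented as follows; the class name and field types are our assumptions, chosen to mirror the tuple \(\langle s,u,g,f,p\rangle\) above.

```python
# Sketch: one record of the example database, d_i = <s, u, g, f, p>.
# Field names mirror the tuple in the text; the types are our assumptions.
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Example:
    speaker: str                        # s: "Sys" (salesperson) or "User" (customer)
    utterance: str                      # u: corrected utterance text
    gpf: str                            # g: general-purpose function tag
    framing: Optional[Tuple[str, str]]  # f: framing tag <a, p>, or None
    prev: Optional["Example"] = None    # p: the datum immediately preceding this one
```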

The NLG module determines the system response \(u_{\mathrm{sys}}\) based on \(u_{\mathrm{user}}\), \(u_{\mathrm{sys}}'\), and \(\langle G_{\mathrm{sys}},F_{\mathrm{sys}}\rangle\). More concretely, our NLG module performs the following procedure:

  1. We define the response candidate set R according to whether there is user input (\(u_{\mathrm{user}}\neq \phi\)) or not (\(u_{\mathrm{user}} = \phi\)). If \(u_{\mathrm{user}}\neq \phi\), we define R as the set of utterances r whose previous utterance is a user utterance (r.p.s = User). Conversely, if \(u_{\mathrm{user}} = \phi\), we define R as the set of utterances r for which r.p.s = Sys.

  2. Response candidates R are scored based on the following similarity score (a code sketch of this retrieval follows the procedure):

    $$\cos(r.p.u,u_{\mathrm{input}}) = \frac{\mathrm{words}(r.p.u) \cdot \mathrm{words}(u_{\mathrm{input}})}{\mid \mathrm{words}(r.p.u)\mid \cdot \mid \mathrm{words}(u_{\mathrm{input}})\mid},$$
    (15.7)
    $$u_{\mathrm{input}} = \begin{cases} u_{\mathrm{sys}}' & (u_{\mathrm{user}} = \phi), \\ u_{\mathrm{user}} & (u_{\mathrm{user}} \neq \phi).\end{cases}$$

    The cosine similarity cos between the previous utterance of the response candidate, r.p.u (\(r \in R\)), and the input sentence \(u_{\mathrm{input}}\) is used for the scoring. \(u_{\mathrm{input}}\) is set to \(u_{\mathrm{sys}}'\) or \(u_{\mathrm{user}}\) depending on whether \(u_{\mathrm{user}}\) is empty. The words function returns the frequency vector of the content words (i.e., nouns, verbs, and adjectives) weighted according to tf-idf.

  3. The utterance r.u of the highest-scoring candidate \(r^{\ast}\) is selected as the output of the NLG module \(u_{\mathrm{sys}}\):

    $$r^{\ast} = \mathop{\mathrm{arg\,max}}\limits_{r\in R}\cos(r.p.u,u_{\mathrm{input}}),$$
    (15.8)
    $$u_{\mathrm{sys}} = r^{\ast}.u.$$
    (15.9)
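Putting steps 1-3 together, a minimal sketch of the retrieval-based generation is shown below. It reuses the hypothetical Example record sketched above; content_words() stands in for the tf-idf-weighted content-word extraction, and the candidates are assumed to have already been restricted as in step 1 (and to match the chosen system action \(\langle G_{\mathrm{sys}},F_{\mathrm{sys}}\rangle\)).

```python
import math

# Sketch of the retrieval in Eqs. (15.7)-(15.9), reusing the hypothetical
# Example record above. content_words() stands in for the Mecab-based,
# tf-idf-weighted content-word vectors.
def cosine(a, b):
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def generate(u_user, u_sys_prev, candidates, content_words):
    u_input = u_user if u_user else u_sys_prev        # the two cases of u_input
    target = content_words(u_input)
    best = max(candidates,                            # Eq. (15.8): arg max over R
               key=lambda r: cosine(content_words(r.prev.utterance), target))
    return best.utterance                             # Eq. (15.9): u_sys = r*.u
```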

15.6 Experimental Results

In this section, we perform two forms of experimental evaluation. First, as a preliminary experiment, we evaluate the performance of the NLU module proposed in Sect. 15.5.1. Then, we evaluate the fully automatic persuasive dialogue system.

15.6.1 Evaluation for NLU Using Different Feature Sets

First, we evaluate the performance of the NLU module using the different feature sets proposed in Sect. 15.5.1. We prepare four feature-set patterns (Uni, Uni+DAcl, Uni+DAcl+Unicl, and Uni+DAcl+Bi) and evaluate the recognition accuracy of GPF labels in the customer's utterances. The evaluation is performed with 15-fold cross-validation over the 694 customer utterances described in Sect. 15.5.1.

From the experimental results (Fig. 15.2), we can see that the NLU with Uni+DAcl+Bi achieves the highest accuracy, and thus we use Uni+DAcl+Bi for the NLU of the dialogue system in the next section. Focusing on the details of the misclassified GPFs, Table 15.5 shows the confusion matrix for the classification results of the NLU module with Uni+DAcl+Bi. From this matrix, we can see that Answer is misclassified as Inform and that SetQ and Question are misclassified as PropositionalQ. This result indicates that the module has difficulty distinguishing dialogue acts that are in a hypernym/hyponym or sibling relationship.

Fig. 15.2 Accuracy of the NLU module. The vertical axis represents accuracy and the horizontal axis represents the NLU feature set. The chance rate corresponds to an NLU module that always outputs Inform

Table 15.5 The confusion matrix

15.6.2 Complete System Evaluation

In this section, we describe the results of the first user study evaluating fully automated cooperative persuasive dialogue systems. For evaluation, we prepare the following four policies.

Random: A baseline where the action is randomly selected from all possible actions.

NoFraming: A baseline where the action is output by a policy learned using only GPFs. For constructing the action set, we remove actions whose framing is not None from the actions described in Sect. 15.3.2. The policy is greedy and selects the action with the highest score.

Framing: The proposed method, where the action is output by the policy learned with all actions described in Sect. 15.3.2, including framing. This policy is also greedy.

Human: An oracle where the action is selected by a human. In this research, the first author (who has no formal sales experience, but about one year of experience analyzing camera sales dialogues) selects the action.

For learning the policies (i.e., NoFraming and Framing), we use Neural Fitted Q Iteration (NFQ) (Riedmiller 2005), applied with the Pybrain library (Schaul et al. 2010). The learning conditions follow the default Pybrain settings. We treat 3000 dialogues as one epoch and update the parameters of the neural network at each epoch. Learning finishes when the number of epochs reaches 20 (60,000 dialogues), and the policy with the highest average reward is used for evaluation.
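For intuition, the following is a schematic of the NFQ training loop, not the Pybrain code that was actually used. The transition format, discount factor, and network size are placeholders, and simulate_dialogues() stands in for rollouts against the user simulator of Sect. 15.3.1.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Schematic of Neural Fitted Q Iteration, not the Pybrain implementation used
# in the paper. simulate_dialogues() stands in for rollouts against the user
# simulator and returns (belief, action, reward, next_belief, done) tuples.
GAMMA = 0.95          # discount factor (placeholder, not stated in the text)
N_ACTIONS = 13

def nfq(simulate_dialogues, n_epochs=20, dialogues_per_epoch=3000):
    q_net, transitions = None, []
    for _ in range(n_epochs):
        transitions += simulate_dialogues(dialogues_per_epoch, q_net)
        inputs, targets = [], []
        for s, a, r, s2, done in transitions:
            future = 0.0
            if q_net is not None and not done:
                future = max(q_net.predict([np.append(s2, b)])[0]
                             for b in range(N_ACTIONS))
            inputs.append(np.append(s, a))              # Q takes (belief, action)
            targets.append(r + GAMMA * future)
        q_net = MLPRegressor(hidden_layer_sizes=(20,), max_iter=500)
        q_net.fit(np.array(inputs), np.array(targets))  # refit on all transitions
    return q_net
```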

We evaluate policies on the basis of average reward and correct response rate of dialogues with real users. The definition of the reward is described in Sect. 15.3.2, and the correct response rate is the ratio of correct system responses to all system responses. In the experiment, the dialogue system plays the salesperson, and the user plays the customer. At the end of the dialogue, to calculate the reward, the user answers the following questionnaire:

Satisfaction: :

The user’s subjective satisfaction defined as a 5 level score of customer satisfaction (1: not satisfied; 3: neutral; 5: satisfied).

Final decision: :

The camera that the user finally wants to buy.

In addition, to calculate the correct response rate, we have the user annotate whether each system response is correct or not. Thirteen users each perform one dialogue with the system under each policy (a total of four dialogues per user).

Experimental results for the reward are shown in Fig. 15.3. From these results, we can see that the reward of Framing is higher than that of NoFraming and Random and almost equal to that of Human. This indicates that learning a policy with framing is effective even in a fully automatic text-based cooperative dialogue system. It is interesting to note that the tendency of these scores is almost the same as in the wizard-of-Oz experiment (Hiraoka et al. 2014b). The exception is that the naturalness of Framing in this experiment is higher than in the wizard-of-Oz experiment. We hypothesize that this difference is due to the modification of the reward factors: in Sect. 15.4.2, we normalized the reward factors so that they are weighted equally when learning the policy. As a result, naturalness is treated as an important factor during learning, which increases the naturalness score of Framing. It should be noted, however, that most of the subjects are different from those in the wizard-of-Oz experiment of our previous work (Hiraoka et al. 2014b), and this might also affect the experimental result.

Fig. 15.3 Evaluation results for real users. Error bars represent 95 % confidence intervals. Rew represents the reward, Sat represents the user satisfaction, PS represents persuasive success, and Nat represents naturalness

Experimental results for the correct response rate (Fig. 15.4) indicate that our cooperative persuasive dialogue system responds to the user's input reasonably correctly. The scores of all policies are higher than 70 %, and the score of Framing is about 77 %. Even the Random policy achieves a score of about 70 %. One reason for this is that the NLG method used by our system (Sect. 15.5.2) is example-based and thus returns natural responses that are judged as incorrect only when they do not match the context.

Fig. 15.4 Correct response rate of the system utterances

15.7 Conclusion

In this paper, we presented a method for constructing a fully automatic cooperative persuasive dialogue system. In particular, we focused on modifications to the policy learning and on the construction of the NLU and NLG modules. We evaluated the constructed dialogue system with real users. Experimental results indicated that the learned policy with framing is effective in a fully automatic text-based cooperative dialogue system and that the tendency of each reward factor is almost the same as in our previous research (Hiraoka et al. 2014b).

In the future, we plan to evaluate the system policies in more realistic situations that move beyond role-playing to real sales situations over broader domains. We also plan to consider nonverbal information for estimating persuasive success and user satisfaction.