1 Introduction

As smart speakers with voice interaction capability continue to spread around the world (according to IDC statistics, global shipments reached 99.8 million units from Q1 to Q3 of 2018 [1]), voice interaction is more widely used and is becoming an important human-computer interaction method.

Although speech recognition and natural language processing (NLP) technologies have improved greatly, users still encounter errors in the “recognition”, “understanding”, and “fulfillment” stages [2, 3]. By conducting a structured walkthrough of how the three key smart speaker brands in China (Xiaomi, Baidu, Alibaba) respond to errors in task-oriented dialogue, we found that their strategies varied in two respects: “apology or not” and “humor or neutral”. Whether a smart speaker made an apology was inconsistent across error scenarios. Moreover, for a given error scenario, whether humor (e.g., self-deprecation) appeared in the error message was unpredictable.

Coping with errors is a common research topic in HCI. Traditionally, researchers have studied how to deliver error messages effectively to help users recover from errors [4, 5, 6, 7], as well as how active listening, sympathy, and encouragement can relieve the frustration caused by machines [8, 9, 10, 11, 12].

However, previous studies on response strategies were mostly based on traditional interactions, in which responses were presented visually (as text or pictures) rather than acoustically. Moreover, smart speakers differ from traditional machines in that they are anthropomorphic, and users spontaneously exhibit human-human communication behaviors when interacting with them [13, 14]. It is therefore worth exploring users’ demands for, and experience of, the error response strategies adopted by smart speakers. The present study focused on two aspects, “apology or not” and “humor or neutral”, which correspond to two principles of human-human interaction proposed by Suo [15]: the “courtesy criterion” and the “humor criterion”.

In this study, we used the “Wizard of Oz” method. Participants were asked to complete a series of specific tasks on a smart speaker. During the process, they experienced pre-arranged conversation errors: when an error occurred, the smart speaker responded with one of several strategies. The goal of the experiment was to explore participants’ preferences regarding apology and humor. The results can inform speech design in voice interaction.

2 Related Work

2.1 Error and Its Impact in HCI

In human-computer interaction, errors such as program crashes, delays, missing results, and unexplained problems occur due to factors such as technology, design, and environment. Voice interaction is no exception, although its errors take different forms. Previous studies and our own observations show that errors arise in the “recognition”, “understanding”, and “fulfillment” stages, including “inaccurate recognition of foreign languages and dialects”, “recognition failure”, “inability to understand multiple demands”, and “no requested audio resource” [2, 7, 16, 17].

Klein [4] noted that researchers improve user experience by identifying and fixing errors, or by preventing them in advance. However, because of the variety of environments and users, it is impossible to predict and resolve every error; instead, error messages can help users recover from them. Nielsen [5] proposed that a good error message should be polite, precise, and constructive. In some instances, starting with a simple and slightly apologetic statement is advisable [6].

Errors can easily evoke users’ frustration. Frustration not only damages the user experience during interaction, but also jeopardizes long-term willingness to use and trust smart devices. Some researchers have therefore argued that dealing with frustration is crucial. Combining emotion management theory, Klein et al. [4] designed an interface agent that could actively “listen” to users’ computer-caused frustration and “show” empathy. Users’ frustration was alleviated, and they were more willing to keep playing the game than those who received no affective support. Hone et al. [9, 10] further showed that an embodied agent was more effective, and that a female agent worked better than a male one. Other researchers [12, 18, 19] used emotion detection modules, algorithms, or other means to identify users’ negative emotions in real time and provide emotional support, such as empathetic encouragement, to reduce frustration.

In summary, when errors occur, in addition to trying to solve the problem, coping with frustration is also important for improving user experience.

2.2 Apology in HCI

In interpersonal communication, an apology is often used to express regret, mitigate the anger caused by an offense, restore relationships, save one’s dignity, and receive reduced punishment. “Compared to other approaches, such as making excuses, justifying the action, or denying the blame, apologies are perceived as the most trust engendering and sincere manner to resolve interpersonal conflicts and restore harmony, regardless of the severity of the circumstances” [20]. Blum-Kulka and Olshtain [21] proposed five apologizing strategies: (1) an illocutionary force indicating device (IFID; e.g., “I’m sorry”, “I apologize”, or “Excuse me”); (2) an explanation or account; (3) an acknowledgment of responsibility; (4) an offer of repair; and (5) a promise of forbearance. The IFID and the acknowledgment of responsibility are the most universal across contexts and cultures [22].

Studies of the utility of apologies in human-computer interaction have not reached a common conclusion. Early work showed that, compared to plain computer messages, apologetic messages from a computer could be perceived as genuine apologies by users [23]. Based on a study of information retrieval systems, Park et al. [24] revealed that users perceived an apologetic system as more aesthetically appealing and usable than a neutral or non-apologetic one. Tzeng [23] found that although an apology did not improve the overall evaluation of a game, users’ psychological experience during the game was better. Akgun [22] further found that an apology made users feel respected, and that 60% of users thought a machine should apologize when it could not meet their requests. However, Park et al. [20] studied the utility of apologies in voice interaction on television and found that apologies did not significantly increase positive evaluations of the TV. Baylor et al. [25] compared “apology” and “empathy” and found that apologetic messages were perceived as significantly more believable and sincere than empathetic ones. Jaksic et al. [26] conducted a “Wizard of Oz” study to evaluate the effectiveness of social agents in reducing user frustration. Participants were divided into groups experiencing different levels of frustration, and the social agent apologized (by message) proactively when participants showed negative facial expressions. Apologies increased the frustration of highly frustrated users, but reduced the frustration of moderately frustrated users. De Visser et al. [27] found that making an apology could help restore trust in a machine.

Summarizing previous studies, the impact of apology on users’ overall evaluations of machines is not uniform, but the majority of studies report that an apology at least improves the experience of the interaction process.

2.3 Humor in HCI

Humor is an important interpersonal communication tool: it can increase personal attraction, promote relationships, ease embarrassment, and break the ice [28]. However, humor does not always have a positive impact; its utility depends on the context. Cultural diversity, individual differences, and other factors also greatly affect the effectiveness of humor [29].

Although humor generally has a positive impact in interpersonal communication, attempts to apply humor in human-computer interaction have been cautious. Efficiency is one of the most important goals pursued by HCI designers, achieved by minimizing task steps, reducing learning time, and so on, and some researchers believe that humor distracts users’ attention and reduces efficiency during human-computer interaction [30]. Van Dolen et al. [31] found that for an e-commerce website, the impact of humor was influenced by both the process experience and the outcome. In a task of reserving vacation accommodation on the website, if participants had a good experience during the process but the result did not go well, they reported higher satisfaction with and preference for the website with humorous elements; but if both the process and the result were dissatisfying, humorous elements had negative impacts. Tzeng [32] found that for an error message on a website, users preferred a neutral, apologetic expression to a humorous one, and humorous expressions were perceived as unclear and unfriendly.

However, for products with more anthropomorphic and social features, such as virtual agents and robots, studies show that humor can have positive impacts. Morkes et al. [30] endowed a machine with a humorous character by pre-programming task-related jokes, and found that users showed greater affection for it and greater willingness to cooperate. Khooshabeh et al. [33] argued that the pre-programmed conversations in Morkes’s study were not natural and dynamic enough; they further improved the natural dialogue capabilities of virtual agents and likewise found that users preferred humorous agents. Niculescu et al. [34] found that users experienced more pleasure with a humorous service robot, and the finding was confirmed for virtual agents [35, 36]. However, these findings were based on scenarios focused on conversation and cooperation. For a smart speaker, with which interactions are strongly task-oriented, the impact of humor still needs to be studied. Niculescu et al. [37] hypothesized that when an error occurs in a task-oriented dialogue, a humorous expression might alleviate the stress in the dialogue and make the user more tolerant of the error, but the hypothesis remains untested.

3 Method

3.1 Experiment Design

The purpose of the experiment was to explore users’ preferences regarding “apology” and “humor”. A 2 (apology: yes vs. no) × 2 (error message expression: humorous vs. neutral) within-subjects experiment was conducted. Two dependent variables, the satisfaction and the sincerity of the smart speaker’s responses, were each measured on a 7-point Likert scale.
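For clarity, the four response strategies that form the design cells can be enumerated as in the following minimal Python sketch (the condition labels are ours, for illustration only, not the original stimulus wording):

    # Enumerate the four cells of the 2 x 2 within-subjects design.
    # Every participant experienced all four response strategies.
    from itertools import product

    APOLOGY = ("apologetic", "non-apologetic")
    EXPRESSION = ("humorous", "neutral")

    conditions = list(product(APOLOGY, EXPRESSION))
    print(conditions)
    # [('apologetic', 'humorous'), ('apologetic', 'neutral'),
    #  ('non-apologetic', 'humorous'), ('non-apologetic', 'neutral')]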

“I’m sorry” was selected as the apologetic message because it is an effective apology strategy [21] and is also the most commonly used apology in China. The humorous expressions were chosen through a pilot study.

In the experiment, two common error types were simulated: “cannot understand” and “no requested audio resource”. “Cannot understand” covered situations in which the voice was detected but not recognized or understood by the agent. “No requested audio resource” meant that the demand was correctly recognized and understood but could not be fulfilled because the requested music or other audio content was not licensed. These two error types are frequent: in our previous survey (N = 202), users were most dissatisfied with these two scenarios, with 89% and 91% of respondents respectively expressing clear dissatisfaction. The responses are listed below (see Tables 1 and 2).

Table 1. The responses under “Cannot understand” error scenario
Table 2. The responses under “No requested audio resource” error scenario

3.2 Participants

A total of 27 participants (13 male, 14 female) were recruited, aged 20 to 45 years (M = 27, SD = 4.48). Participants’ experience with smart speakers was balanced: 14 of them reported previous experience (i.e., they had used a smart speaker at home in the past three months).

3.3 Tasks

The participants were asked to complete different tasks on a smart speaker. These tasks were drawn from four functions with high usage frequency: “listen to music”, “listen to audio resources” (such as audio lessons), “check the weather”, and “set an alarm”.

“Cannot understand” Scenario.

We generated 3 tasks for each function, 12 tasks in total. The 3 tasks within a function differed in detail; for example, for checking the weather, the 3 tasks concerned the weather in different places. In daily use, errors occur only now and then, so to increase ecological validity we tried to simulate this: participants were asked to complete all 12 tasks, and only 4 of them induced error messages. In these 4 experimental tasks, the smart speaker could not understand what the participant said the first time and responded with an error message; when the participant tried again, the smart speaker succeeded. Here is an example of “check tomorrow’s weather in Shanghai” (the wake-up stage is not included):

  • Participant asked: How is the weather in Shanghai tomorrow?

  • Smart speaker responded: I’m sorry, my IQ is still recharging, please repeat it again.

  • Participant asked: How is the weather in Shanghai tomorrow?

  • Smart speaker responded: It’s raining tomorrow in Shanghai; the temperature is 11–15 °C.

To balance order effects, the researchers ensured that (1) the 4 failure tasks (experimental conditions) occurred at random positions among the 12 tasks, and (2) the 4 failure tasks were distributed equally across the four functions (“check the weather”, “listen to music”, “listen to audio resources”, and “set an alarm”).

“No requested resource” Scenario.

We generated 2 tasks for each of the 2 relevant functions, “listen to music” and “listen to audio resources”, and the sequence of the 4 tasks was randomized for each participant. Unlike the former scenario, each task involved only one conversation round. Here is an example of asking for “Mai Sheng Children’s story”:

  • Participant asked: I want to listen to Mai Sheng Children’s story.

  • Smart speaker responded: I don’t have the copyright for this audio resource yet.

Each task needed to be perceived as equally difficult for the smart speaker, to keep task difficulty from influencing users’ evaluations [26]. To verify this manipulation, participants rated the difficulty of each task (on a 7-point Likert scale) immediately after completing it: “How difficult do you think the task is for the smart speaker?” Repeated measures ANOVA showed no significant difference in difficulty across tasks for either error scenario (F(3, 63) = 0.058, p > 0.05; F(3, 63) = 0.705, p > 0.05), confirming that the tasks were perceived as equally difficult for the smart speaker.
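The study ran this check in SPSS; purely as an illustration, an equivalent one-way repeated measures ANOVA could be sketched in Python as below. The column names and the randomly generated ratings are our assumptions, not the study’s data:

    # Hypothetical sketch of the task-difficulty manipulation check:
    # a one-way repeated measures ANOVA over the four tasks of one scenario.
    import numpy as np
    import pandas as pd
    from statsmodels.stats.anova import AnovaRM

    rng = np.random.default_rng(0)
    rows = [(s, t, rng.integers(1, 8))      # placeholder 7-point ratings
            for s in range(1, 28)           # 27 participants
            for t in ("task1", "task2", "task3", "task4")]
    df = pd.DataFrame(rows, columns=["subject", "task", "difficulty"])

    # A non-significant effect of "task" (p > .05) would indicate the tasks
    # were perceived as equally difficult, as reported above.
    print(AnovaRM(df, depvar="difficulty", subject="subject",
                  within=["task"]).fit())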

3.4 Procedure

The experiment adopted the “Wizard of Oz” methodology: a computer was used to manipulate the smart speaker’s responses, simulating the human-machine voice interaction process.

The experiment consisted of three parts. First, the participants completed the tasks under the “cannot understand” error scenario and evaluated the response after each task. They then completed the tasks under the “no requested audio resource” error scenario, again evaluating the response after each task. Finally, because participants might have paid too much attention to finishing the tasks and overlooked the differences between responses, we asked them to evaluate the responses once more. In this step, all responses were printed together on paper so participants could compare them directly.

Before the experiment, the researchers explained the experimental procedure and how to interact with the smart speaker by voice. Participants then interacted with the smart speaker to familiarize themselves with voice interaction. After that, the formal experiment began.

After each task, the participants rated the difficulty of the task they had just completed on the smart speaker. In addition, for the 4 failure tasks (experimental conditions), the participants also (1) spontaneously recalled the first response (error message) of the smart speaker and (2) rated the perceived satisfaction and sincerity of the response.

After all the tasks, every error message from the experiment was presented together in print. The participants rated satisfaction and sincerity again for each error message, this time able to directly compare the 4 response strategies under each error type. They were then interviewed about the reasons for their ratings.

In addition, participants rated the two humorous expressions used in the experiment for their level of humor and semantic accuracy (both on 7-point Likert scales).

3.5 Data Analysis

We used SPSS 23.0 to analyze the data. First, repeated measures ANOVA was used to check the difficulty manipulation. Then, we conducted descriptive analyses of all dependent variables. To evaluate the impact of “apology” and “humor”, repeated measures ANOVAs were used. We also qualitatively analyzed participants’ recall of the error messages and the reasons they gave for their ratings.
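As a hedged illustration of the main analysis (the authors used SPSS; this sketch uses Python’s statsmodels, with simulated ratings standing in for the real data):

    # Hypothetical sketch of the 2 x 2 repeated measures ANOVA testing the
    # main effects of "apology" and "humor" and their interaction.
    import numpy as np
    import pandas as pd
    from statsmodels.stats.anova import AnovaRM

    rng = np.random.default_rng(1)
    rows = [(s, a, h, rng.integers(1, 8))      # placeholder 7-point ratings
            for s in range(1, 28)              # 27 participants, within-subjects
            for a in ("yes", "no")             # apology factor
            for h in ("humorous", "neutral")]  # expression factor
    data = pd.DataFrame(rows,
                        columns=["subject", "apology", "humor", "satisfaction"])

    # Reports F(1, 26) tests for each main effect and the interaction,
    # matching the degrees of freedom in the Results section.
    print(AnovaRM(data, depvar="satisfaction", subject="subject",
                  within=["apology", "humor"]).fit())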

4 Results

As explained in Sect. 3.4, “satisfaction” and “sincerity” were measured twice: first after each task during the experiment (natural perception), and again after all tasks were finished (direct comparison perception). The results are reported separately.

4.1 “Cannot understand” Error Scenario

Results of Natural Perception

The mean satisfaction and sincerity scores of the 4 responses are shown below (see Table 3). The “apologetic & neutral” response was perceived as the most satisfying and sincere, while the “non-apologetic & humorous” response was perceived as the least satisfying and sincere.

Table 3. The Mean (SE) of responses under the “cannot understand” error scenario (natural perception)

Satisfaction.

The results of repeated measures ANOVA showed no significant interaction between “apology” and “humor” on perceived satisfaction (F(1, 26) = 0.016, p > 0.05), and no main effects of “apology” (F(1, 26) = 1.876, p > 0.05) or “humor” (F(1, 26) = 1.457, p > 0.05). This indicates that when the smart speaker could not understand the user, apologizing and being humorous had no impact on perceived satisfaction.

Sincerity.

The results of repeated measures ANOVA showed a significant interaction between “apology” and “humor” on perceived sincerity (F(1, 26) = 4.319, p < 0.05). Simple effects analysis showed that when the smart speaker apologized, the neutral message was perceived as sincerer than the humorous one (F(1, 26) = 5.032, p < 0.05); when it did not apologize, humor had no significant impact (F(1, 26) = 0.815, p > 0.05). This indicates that when the smart speaker made an apology, a neutral expression significantly improved perceived sincerity.
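A significant interaction is commonly followed by simple-effects tests within each apology level. One way to sketch this follow-up (hypothetical column names, simulated ratings) is with paired t-tests, whose squared t corresponds in form to the F(1, 26) values reported here:

    # Hypothetical sketch of the simple-effects follow-up: compare humorous
    # vs. neutral sincerity ratings separately within each apology level.
    import numpy as np
    import pandas as pd
    from scipy import stats

    rng = np.random.default_rng(2)
    rows = [(s, a, h, rng.integers(1, 8))      # placeholder 7-point ratings
            for s in range(1, 28)
            for a in ("yes", "no")
            for h in ("humorous", "neutral")]
    data = pd.DataFrame(rows,
                        columns=["subject", "apology", "humor", "sincerity"])

    for level in ("yes", "no"):
        subset = data[data["apology"] == level]
        humorous = subset[subset["humor"] == "humorous"].sort_values("subject")
        neutral = subset[subset["humor"] == "neutral"].sort_values("subject")
        t, p = stats.ttest_rel(humorous["sincerity"], neutral["sincerity"])
        # With two levels, F(1, n-1) = t**2, the form of the reported tests.
        print(f"apology={level}: t(26)={t:.3f}, p={p:.3f}")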

Results of Direct Comparison Perception

The mean satisfaction and sincerity scores of the 4 responses are shown below (see Table 4). Again, the “apologetic & neutral” response was perceived as the most satisfying and sincere, while the “non-apologetic & humorous” response was perceived as the least satisfying and sincere.

Table 4. The Mean (SE) of responses under the “cannot understand” error scenario (direct comparison)

Satisfaction.

The results of repeated measures ANOVA showed no significant interaction between “apology” and “humor” on perceived satisfaction (F(1, 26) = 1.874, p > 0.05). There was a main effect of “apology” (F(1, 26) = 15.730, p < 0.01), but no main effect of “humor” (F(1, 26) = 1.930, p > 0.05). This indicates that when the smart speaker could not understand the user, apologizing improved perceived satisfaction, while being humorous had no impact.

Sincerity.

The results of repeated measures ANOVA showed no significant interaction between “apology” and “humor” on perceived sincerity (F(1, 26) = 2.208, p > 0.05). There were main effects of both “apology” (F(1, 26) = 18.809, p < 0.001) and “humor” (F(1, 26) = 5.955, p < 0.05). This indicates that when the smart speaker could not understand the user, apologizing improved perceived sincerity, while humor had a negative impact on it.

4.2 “No requested audio resource” Error Scenario

Results of Natural Perception

The mean satisfaction and sincerity scores of the 4 responses are shown below (see Table 5). Apologetic responses received higher satisfaction and sincerity scores.

Table 5. The Mean (SE) of responses under the “no requested audio resource” error scenario (natural perception)

Satisfaction.

The results of repeated measures ANOVA showed no significant interaction between “apology” and “humor” on perceived satisfaction (F(1, 26) = 0.031, p > 0.05), and no main effects of “apology” (F(1, 26) = 0.308, p > 0.05) or “humor” (F(1, 26) = 0.010, p > 0.05). This indicates that when the smart speaker had no requested audio resource, apologizing and being humorous had no impact on perceived satisfaction.

Sincerity.

The results of repeated measures ANOVA showed no significant interaction between “apology” and “humor” on perceived sincerity (F(1, 26) = 0.325, p > 0.05). There was a main effect of “apology” (F(1, 26) = 4.522, p < 0.05), but no main effect of “humor” (F(1, 26) = 0.032, p > 0.05). This indicates that when the smart speaker had no requested audio resource, apologizing improved perceived sincerity, while humor had no impact.

Results of Direct Comparison Perception

The mean satisfaction and sincerity scores of the 4 responses are shown below (see Table 6). Apologetic responses received higher satisfaction and sincerity scores.

Table 6. The Mean (SE) of responses under the “no requested audio resource” error scenario (direct comparison)

Satisfaction.

The results of repeated measures ANOVA showed no significant interaction between “apology” and “humor” on perceived satisfaction (F(1, 26) = 0.382, p > 0.05). There was a main effect of “apology” (F(1, 26) = 26.202, p < 0.001), but no main effect of “humor” (F(1, 26) = 0.554, p > 0.05). This indicates that when the smart speaker had no requested audio resource, apologizing improved perceived satisfaction, while being humorous had no impact.

Sincerity.

The results of repeated measures ANOVA showed no significant interaction between “apology” and “humor” on perceived sincerity (F(1, 26) = 0.088, p > 0.05). There was a main effect of “apology” (F(1, 26) = 20.469, p < 0.001), but no main effect of “humor” (F(1, 26) = 0.014, p > 0.05). This indicates that when the smart speaker had no requested audio resource, apologizing improved perceived sincerity, while being humorous had no impact.

Overall, when the smart speaker had no requested audio resource, making an apology improved perceived satisfaction and sincerity, while humor had no impact on either.

4.3 Results Summary

In general, the descriptive statistics for natural perception and direct comparison showed the same tendencies, but more of the differences reached significance in the direct comparisons (see Tables 7 and 8). For both error scenarios, making an apology improved perceived satisfaction and sincerity. Humor had no positive impact on perceived satisfaction or sincerity, and even had a negative impact when the smart speaker could not understand users’ requests.

Table 7. Summary of significance differences tests under “Cannot understand” scenario
Table 8. Summary of significance difference tests under “No requested audio resource” scenario

5 Discussion

5.1 Natural Perception vs. Direct Comparison Perception

In the experiment, “satisfaction” and “sincerity” were measured twice: first after each task during the experiment (natural perception), and again after all tasks were finished (direct comparison perception). For the direct comparison, all the responses were printed and presented together to the participants. Users’ evaluations differed greatly between the two kinds of perception (see Tables 7 and 8).

Analysis of the post-experiment interviews supported our pre-experimental speculations. First, in the experimental setting, participants focused on whether the tasks were finished; when an error occurred, they allocated most of their cognitive resources to comprehending what the error message conveyed rather than attending to how it was expressed. Second, the four failure tasks were given in a random sequence, and the participants evaluated one response at a time. During the direct comparison, by contrast, all the responses were presented together, so the participants could recognize the differences and evaluate them more deliberately.

5.2 Attitude Toward Apology

When an error occurred, an apology from the smart speaker made participants feel more satisfied and perceive the response as sincerer. In the post-experiment interviews, 21 participants preferred the smart speaker to apologize, and 4 thought it did not matter either way. Analysis of participants’ comments revealed four main reasons:

  1. An apology showed the smart speaker’s friendliness and politeness, which alleviated the perception of dissatisfaction.

  2. The apology attributed responsibility to the smart speaker. This is consistent with the acknowledged effect of apology [20] and, at the same time, made participants feel less frustrated by confirming that they had done nothing wrong [31].

  3. An apologetic attitude was educational for children. In China, children are one of the important user groups of smart speakers, and many parents said the smart speaker was a playmate for their child. However, they worried that the smart speaker might teach children to be rude, which appears to be a global concern [38]. This is thus an important reason why a smart speaker should apologize.

  4. The apology could serve as a signal of an error. Participants said that “I’m sorry” prompted them to concentrate quickly and comprehend the error message, which could improve interaction efficiency and make the process smoother. To some extent, this finding is in line with Zillmann’s study [39], which found that humorous stimuli increased subjects’ attentiveness and ultimately helped them acquire information.

Only 2 participants expressed reluctance about receiving apologies, on the grounds that an apology made the smart speaker sound verbose, especially when the error message was long. In a task-oriented dialogue, these participants wanted to know quickly why the error happened so they could adjust their dialogue strategy to achieve their ultimate intention.

Generally, making an apology can improve user experience when a smart speaker triggers users’ dissatisfaction, but an apology lengthens the error message and conveys a humble attitude. An ideal apologetic error message should be clear, simple, and modest, but not humble.

5.3 Attitude Toward Humor

Our results indicated that, compared with the neutral response, the humorous response did not receive higher scores in perceived satisfaction or sincerity, and when the smart speaker could not understand, the humorous response reduced its perceived sincerity. About half of the participants (13 out of 27) clearly expressed dislike of the humorous expressions. Analysis of participants’ comments revealed six main reasons:

  1. Individual differences existed in the perception of humor. Although the two humorous responses used in this experiment received high humor scores in the pilot test, 29.6% and 25.9% of participants respectively still perceived no humor in them. This is consistent with other research findings [29].

  2. Humor required more interpretive effort, which could be a barrier to understanding. The use of humor in human-machine interaction has always been controversial, and one of the main reasons is its impact on efficiency. Most humorous expressions used by smart speakers in China come from the internet, and not everyone is familiar with them.

  3. Humorous expressions were prone to ambiguity. Although most participants in this study accurately understood the meaning of the responses, they still expressed concerns.

  4. Humor reduced the perceived responsibility of the smart speaker. Participants felt the smart speaker was being perfunctory and evasive when it responded wittily. In the experiment, this was also the reason why the humorous expression was perceived as less sincere under the “cannot understand” error.

  5. Humorous expressions might be hard for children to understand. As mentioned before, children are an important user group in China, and some parents worried that humor in error messages could be a barrier to children’s understanding of smart devices because of their immature cognitive abilities.

  6. Repeated humor tended to become boring. Although each humorous expression was repeated only once in this experiment, some participants reported that it had already lost its freshness the second time it appeared.

13 participants commented positively on the humorous expressions, mainly because humor made the smart speaker more interesting, lovable, emotional, and lively.

We believe that when a smart speaker triggers user dissatisfaction, responding in a humorous way should be considered carefully. As discussed above, humor has certain disadvantages in terms of attitude, efficiency, and enjoyment. Regarding attitude, inappropriate humor can easily create an impression of irresponsibility and flippancy, especially when the speaker cannot fulfill a simple request. Regarding efficiency, humorous responses may slow voice interaction because of misinterpretation, cultural barriers, limited comprehension ability, and so on. Finally, perceptions of humor vary by age, social group, and culture. Therefore, the accuracy, readability, and universality of humor should be taken into account when designing humorous responses.

6 Implication

This study has theoretical value. It verified that when a user is in a negative mood of dissatisfaction, a response with an emotional component is preferred, but its content should not be too casual. Accordingly, a response with an apology was favored, and a humorous message felt no better than a neutral one.

The study also has practical value. On one hand, the results provide a valuable VUI response strategy: the “apologetic & neutral” response was preferred. On the other hand, the qualitative results suggest guidelines for the wording of responses. For example, length and attitude are critical for apology phrases, which should sound modest rather than humble; when designing humorous responses, accurate meaning, ease of understanding, and a widely shared sense of humor are important principles.

7 Limitations and Future Study

The limitations of this study and prospects for future research are as follows. First, because multiple tasks were assigned to each participant, task evaluations might have been affected by the degree of achievement in previous tasks, although each participant was instructed to treat each experimental task as an independent task and dialogue. Task order was randomized to reduce this impact, but mutual interference among tasks cannot be ruled out. Future work could address this with a between-subjects design or by collecting users’ real daily human-machine dialogue data (with their knowledge and consent) for analysis. Second, this study focused on frustration caused by smart speakers in the first round of dialogue; future research should investigate repeated frustration, in which users’ negative moods may deepen and lead to greater dissatisfaction because key needs remain unresolved. Third, this study did not explore the impacts of different task occasions, participants’ backgrounds and characteristics, or the relationship between users and the device; future studies could explore these factors further. Finally, this study focused on smart speakers’ best response strategy to user frustration; future studies could explore other devices, such as smart navigation systems, smart televisions, and mobile voice assistants.

8 Conclusion

The findings of this study are as follows: (1) An apologetic response from the smart speaker significantly increased users’ perception of sincerity, and it prominently improved both satisfaction and sincerity under direct comparison perception. (2) Humor in responses had no positive impact on perceived satisfaction or sincerity, and it noticeably decreased perceived sincerity when the smart speaker could not understand users’ requests. Therefore, when smart speakers trigger users’ dissatisfaction, an apologetic and neutral response may be the best choice.