1 Introduction

Humor plays a social role in human-human interaction. Researchers in HRI assume that building humorous behavior into human-robot interaction can exploit the potential of humor for establishing social relationships [14]. It can help make robots friendlier and increase cooperation with the system [10]. This work is part of the Joker project, which aims to build a generic user interface that provides a multimodal dialog system with social skills including humor and empathy [6]. We assume that using humor in human-robot interaction sets up a positive atmosphere in which participants are willing to contribute. Humor support is defined by Hay [8] as the conversational strategies used in reaction to humorous utterances. This paper explores how participants respond to the robot's humorous interventions through the examination of a corpus. This corpus is a subset of the data collected in the Joker project with 3 different systems (described in [6]). The human responses, elicited through canned jokes and conversational humor (food-related puns, teasing and end rhymes), were annotated. The data were also categorized by the sociolinguistic variable of gender and by personality traits. Section 2 presents work related to humor support in human-human interaction and to humor in HRI. Section 3 presents the corpus collection, performed with a Wizard of Oz data collection system on three scenarios. Section 4 is dedicated to the annotation of the linguistic content of the human speech; the categories of humor responses found in the annotated corpus are listed and defined. Section 5 highlights the importance of contextual information in the choice of a type of response to humor. It also reveals other influences, such as the sociolinguistic variable of age and the Sense of Humor (measured by questionnaire), and discusses the results of the study. Section 6 concludes the paper and presents perspectives.

2 Related Work on Humor Support and Humor in HRI

Morkes et al. [13] designed experiments to examine the role of humor in human-computer interaction. They show that humor has positive effects on the interaction: participants rated the system as more likable and responded in a more sociable manner. Regarding reactions to humor, Hay [8] described many different humor support strategies in natural human-human conversations. Humor support can be expressed through smiles and laughter, by contributing more humor, by echoing the humor, by offering sympathy, by contradicting self-deprecating humor, or by providing no support. To give full humor support, humor has to be recognized, understood and appreciated [14]. Bell [4] described responses to failed humor. The strategies are quite similar to the responses to successful humor: laughter, metalinguistic remarks about the joke, interjection, evaluation of the joke, rhetorical question, sarcasm, non-verbal response, and mode adoption. Attardo [2], in a study of reactions to ironical utterances, suggests that the hearer may also mode adopt. Mode adoption can be elicited by many kinds of humor. Norrick [15] provides examples of spontaneous conversational punning that elicits further punning from other participants.

3 Data Collection of Social Human-Robot Dialog

3.1 Interaction Scenarios

Data were collected using a Wizard of Oz system dedicated to social dialogue with the Nao robot [6], implemented in French. The system is configured by a predefined dialogue tree that specifies the text utterances, gestures and laughter that can be executed by the Nao robot. At each node, the operator chooses the next dialogue node to visit according to the reaction of the human dialogue participant. In this paper, examples from the corpus have been adapted from French to English in an attempt to stay as close as possible to the intended effects in French.

The scenario implements a system-directed social interaction dialogue that adapts the telling of riddles and other humorous contributions to some aspects of the user model. In this scenario, the system displays various humor capabilities (as shown in Table 1). Interactions with the robot follow a common structure. First, the robot greets the participant and presents itself in an introduction phase. Next, the system offers to tell a riddle, depending on the emotional state of the human as detected by the operator. The behavior of the system is adapted to the receptiveness of the human to the contributions of the robot: positive reactions from the participant trigger more jokes. Then the system challenges participants in a game by asking a question about a meal (e.g., “What ingredients do we need to make an onion soup?”). Finally, the system gives a conclusion about the perceived participant reactions (e.g., “I am glad you like humor produced by a robot.”), and closes the interaction.
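The common structure described above can be pictured as a small dialogue tree in which a human operator selects the next node. The following Python sketch is illustrative only: the `DialogueNode` class, the `TREE` layout, the reaction labels and the `walk` helper are hypothetical names and are not taken from the Joker system; apart from the two utterances quoted in the text, the utterances are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class DialogueNode:
    """One node of a predefined dialogue tree (hypothetical structure)."""
    utterance: str
    # Maps the operator's judgment of the participant's reaction
    # to the name of the next node.
    children: dict = field(default_factory=dict)


# Hypothetical tree following the phases named in the text:
# introduction, riddles (repeated while reactions are positive),
# ingredient game, conclusion.
TREE = {
    "introduction": DialogueNode(
        "(placeholder greeting and self-presentation)",
        {"positive": "riddle", "negative": "game"},
    ),
    "riddle": DialogueNode(
        "(placeholder riddle)",
        {"positive": "riddle", "negative": "game"},
    ),
    "game": DialogueNode(
        "What ingredients do we need to make an onion soup?",
        {"positive": "conclusion", "negative": "conclusion"},
    ),
    "conclusion": DialogueNode("I am glad you like humor produced by a robot."),
}


def walk(reactions):
    """Replay a session from the operator's sequence of reaction labels."""
    path = ["introduction"]
    for reaction in reactions:
        node = TREE[path[-1]]
        if not node.children:  # leaf node: the interaction is closed
            break
        path.append(node.children[reaction])
    return path
```

For instance, `walk(["positive", "positive", "negative", "positive"])` visits the riddle node twice before moving to the game and the conclusion, which mirrors the rule that positive reactions trigger more jokes.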

Given the need for robust generation, the humor is of the hackneyed variety. The humorous acts made by the robot are divided into two categories derived from the literature (see [2, 17]). As mentioned by Attardo [2], a speaker can produce two types of humorous acts in dialog: canned jokes, which are narratives containing a punchline, and conversational witticisms, which are non-narrative jab lines blended into the dialog. We separate the humorous acts made by the robot into the canned joke and conversational humor categories, as shown in Table 1.

Table 1. Humorous acts made by the robot during interaction

3.2 Collected Data

This corpus comes from two experiments following the same scenario and the same protocol. The first experiment took place in the cafeteria of the LIMSI-CNRS laboratory with French-speaking participants. The 37 volunteers were 62 % male and 38 % female, and their ages ranged from 21 to 62 (median: 31.5; mean: 35.1) [6]. The second experiment took place at the Broca Hospital with 12 French-speaking participants. The volunteers were 35 % male and 65 % female, and their ages ranged from 64 to 86 (median: 75; mean: 74; standard deviation: 6). In both experiments, participants were seated facing the Nao robot at a distance of around one meter. Audio was recorded at 16 kHz with a high-quality AKG Lavalier microphone. A total of 3 h 57 min 04 s of audio data was collected over both experiments. The first experiment accounts for 3 h 20 min 57 s (average session duration: 5 min 25 s; standard deviation: 1 min), while the second experiment accounts for 36 min 07 s (average session duration: 3 min 29 s).

The Sense of Humor Scale (SHS) questionnaire of McGhee [12] was filled in by participants after the experiment to evaluate the impact of individual differences in humor perception. Six dimensions of humor appreciation are assessed in this questionnaire. Each dimension is rated between 4 and 28, and a global sense of humor score is computed as the sum of the six dimension scores, ranging from 24 to 168. The participants' SHS scores in this experiment range from 72 to 145 (mean: 108.87; standard deviation: 22.38). In addition, participants filled in a self-report questionnaire to evaluate several dimensions of the interaction. This questionnaire consists of closed-ended questions about the system, the interaction and the human participant (a more detailed description is given in [3]).
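The score ranges above follow directly from summing six dimension scores that each lie between 4 and 28. The short Python sketch below makes that arithmetic explicit; the function name `global_shs_score` and the example profile are hypothetical, and the sketch does not reproduce the questionnaire items themselves.

```python
DIMENSIONS = 6            # number of SHS dimensions stated in the text
DIM_MIN, DIM_MAX = 4, 28  # range of each dimension score stated in the text


def global_shs_score(dimension_scores):
    """Sum six dimension scores after checking each lies in the 4-28 range."""
    if len(dimension_scores) != DIMENSIONS:
        raise ValueError("expected six dimension scores")
    for score in dimension_scores:
        if not DIM_MIN <= score <= DIM_MAX:
            raise ValueError(f"dimension score {score} is outside {DIM_MIN}-{DIM_MAX}")
    return sum(dimension_scores)


# Bounds of the global score: 6 * 4 = 24 and 6 * 28 = 168.
lowest = global_shs_score([DIM_MIN] * DIMENSIONS)
highest = global_shs_score([DIM_MAX] * DIMENSIONS)

# Hypothetical participant profile (not taken from the corpus).
example = global_shs_score([20, 18, 15, 22, 17, 16])
```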

4 Annotation and Responses to Humor

4.1 Annotation Process

The audio data were transcribed. Based on this transcription, we extracted the adjacency pairs formed by a humorous act made by the robot and the human response that follows it. All in all, the corpus contains 381 humorous contributions from the robot and 381 human responses. The 381 humorous contributions are divided into 130 humorous acts of the canned joke category and 251 of the conversational humor category.
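One possible way to represent such adjacency pairs for annotation is sketched below in Python. The `AdjacencyPair` class, its field names and the category labels are hypothetical; the two example pairs reuse Examples 1 and 7 of this paper, and assigning the first to canned jokes (a riddle answer) and the second to conversational humor (an end rhyme) is an assumption made for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class AdjacencyPair:
    """A robot humorous act and the human response that follows it."""
    humor_category: str   # "canned_joke" or "conversational_humor"
    robot_act: str
    human_response: str   # empty string when there is no verbal response
    response_type: str    # label from the annotation scheme


def count_by_category(pairs):
    """Count adjacency pairs per category of robot humorous act."""
    return Counter(pair.humor_category for pair in pairs)


# Two pairs adapted from Examples 1 and 7 (category assignment is assumed).
pairs = [
    AdjacencyPair("canned_joke",
                  "No, the answer was: because there is no more pappouth",
                  "this is a good one, yeah, it's funny, I like it",
                  "subjective evaluation"),
    AdjacencyPair("conversational_humor",
                  "yes you are right, but there is more than that",
                  "hmm, water",
                  "other comment"),
]
counts = count_by_category(pairs)

# The corpus totals reported in the text are consistent: 130 + 251 = 381.
corpus_total = 130 + 251
```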

4.2 Annotation Scheme and Verbal Responses to Humor

The human responses were coded according to the type of response. The coding system takes the categories of responses found in previous studies as a starting point (mainly [4, 8]). The annotation scheme for human contributions is divided into the following categories (examples are given in English, followed by the original French sentences in parentheses).

Lack of Verbal Support. A response can be expressed multimodally: paralinguistic affect bursts, facial expressions or gestures can also be a type of humor support. We assign a response to the lack of verbal support category when the human participant did not respond verbally after a humorous act of the robot.

Interjection. This category groups words expressing emotion or exclamation. It is made of minimal responses produced where laughter or an evaluation of the joke would be expected. Interjections do not always clearly signal the hearer’s recognition, comprehension or appreciation of the attempt at humor.

Subjective Evaluation. As noted by Bell [4], most of the responses can be seen as evaluating the joke in some way (except for the other comments category). This category contains subjective evaluations that do not involve metalinguistic comments or sarcasm. As demonstrated in the following examples, the evaluative comments can be directed at the joke, the teller, or both, and can express a positive or a negative evaluation. Example 1 is labeled as a positive evaluation of the joke and Example 2 is labeled as a negative evaluation of the teller (the robot).

Example 1

(Participant ID3).

  • [n] No, the answer was: because there is no more pappouth (non bien la réponse était parce qu’il n’y a plus de pappouth)

  • [h] this is a good one, yeah, it’s funny, I like it (elle est pas mal celle-là ouais elle est rigolote j’aime bien).

Example 2

(Participant ID25).

  • [n] it reminds me of a story of a robot that went into a cafe and splash! (ça me rappelle une histoire c’est celle d’un robot qui est entré dans un café et plouf)

  • [h] yeah frankly you could have done better. (ouais franchement t’aurais pu faire mieux hein là).

Metalinguistic Comment. This type of response comments on the preceding humorous text itself. As mentioned by Hay [8], these responses allow the human participant to demonstrate recognition and understanding of the attempt at humor, as in the following example.

Example 3

(Participant ID26).

  • [n] what vigor! (/expression with ‘peach’ in French/) You found all the ingredients [...]! (quelle pêche tu as trouvé tous les ingrédients poils aux dents)

  • [h] you enjoy expressions involving fruits. (tu aimes bien les expressions avec des fruits).

Mode Adoption. As pointed out by Hay [8], participants can also respond by contributing more humor. In this case, the humorous frame is maintained in the second part of the adjacency pair. According to Attardo [1], mode adoption is a way for the speaker to enter into the possible world created by the joker and play along with it. The human participant can mode adopt in two different ways: (i) he enters the world created by the robot's humorous act and continues to play with this imaginary world, or (ii) he proposes a humorous act himself. Humor can also be supported by echoing the words of the speaker [8]. The participant repeats the words in appreciation, often as if savoring the humor, as in Example 4.

Example 4

(Participant ID33).

  • [n] it reminds me of a story of a robot that went into a cafe and splash!

  • [h] obviously, if it took itself for a sugar cube, it melted. (évidemment que s’il s’est pris pour un sucre il a fondu).

Example 5

(Participant ID22).

  • [n] well the answer was concentrated milk (non bien la réponse était du lait concentré)

  • [h] can I tell you a joke? (et moi je peux t’en raconter une de blague).

Sarcasm. For the purposes of this paper, sarcasm is a cutting or ironic remark intended to express contempt or ridicule. As defined by Haverkate [7], it covers any instance in which participants reply by saying the opposite of what they mean or something different from what they mean.

Example 6

(Participant ID28).

  • [n] you know, to have a small head is not really serious, see mine! (tu sais avoir une petite tête c’est pas vraiment grave vise la mienne)

  • [h] no, it does not look too serious (non ça a pas l’air trop grave).

Other Comment. The other comment category is for instances that did not fit into the previous categories. It mostly groups sentences uttered after a play on words made by the robot during the game of discovering the ingredients of a recipe. Participants did not demonstrate recognition and understanding of the attempt at humor, but remained engaged in the dialog by trying to win the game and discover all the ingredients.

Example 7

(Participant ID8).

  • [n] yes you are right, but there is more than that, /French idiomatic rhyming playful expression/! (oui tu as raison mais il n’y a pas que ça poil au doigt)

  • [h] hmm, water (hum de l’eau).

5 Results

5.1 Distribution of Human Humor Support Verbal Responses

First, we investigate the distribution of humor response types after a humorous act made by the robot. Our assumption is that, as in human-human interaction, participants will use different humor support types in response to the humorous acts made by the robot. Table 2 presents the distribution of human humor support types in response to the two categories of humor made by the robot (see Table 1 for the composition of the humor categories).

The other comments responses were the most common responses to the robot’s attempts at humor, occurring in more than 1/3 of the data (40.94 %). This category occurred mostly after conversational humor and rarely after canned jokes. On the contrary, Table 2 shows that the subjective evaluation responses occurred mostly after canned jokes, as did the metalinguistic comment and interjection response types (69.57 % and 60 %). A Chi-square test confirmed a strong association between the different humorous acts made by the robot and the humor response types used by participants (Chi-square = 208.4526, df = 15, p-value < 2.2e-16). This suggests that the kind of humor made by the robot strongly determines the types of verbal responses of participants.
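For readers unfamiliar with the test, the following Python sketch shows how a Chi-square statistic of independence and its degrees of freedom are obtained from a contingency table of counts. The function name `chi_square_independence` and the 2 x 3 table are hypothetical; the counts are invented for illustration and are not those of Table 2.

```python
def chi_square_independence(table):
    """Return (statistic, degrees_of_freedom) for a contingency table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical counts: rows are two kinds of robot humorous acts,
# columns are three response types (NOT the data of this study).
hypothetical_table = [
    [10, 20, 30],
    [30, 20, 10],
]
statistic, dof = chi_square_independence(hypothetical_table)
```

Here every expected count is 20, so the statistic is 20 with 2 degrees of freedom. In practice a library routine such as `scipy.stats.chi2_contingency` also returns the p-value; the pure-Python version is used here only to keep the computation visible.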

Table 2. Humor support types of human responses to a humorous act made by the robot

5.2 Functions of Humor Responses in Interaction

We observe that the humor response types differ in the contexts in which they appear, depending on the robot's humorous acts. A closer look at this contextual distribution allows us to group the different response types into broader categories. We assume that each of these categories plays a function in the interaction with the humorous robot.

Given the context, the human humor response types can be grouped into three categories: (i) response types appearing mostly after canned jokes, (ii) response types appearing after both canned jokes and conversational humor, and (iii) response types appearing mostly after conversational humor. Looking more closely, we observe that, in the first category, the elicited responses conform to the expectation that participants signal their recognition and understanding of the humorous act. In the second category, participants supported the robot either by developing the joke and contributing more humor or by maintaining a humorous frame by teasing the robot. In the third category, participants did not feel the need to offer explicit support. Finally, the humor response types can be grouped into three functional categories:

  • (i) Recognition of the attempt of humor, which groups the response types evaluation, interjection and metalinguistic comment,

  • (ii) Responding with more humor, which groups mode adoption and sarcasm,

  • (iii) No humor support, which corresponds to a non-recognition of the humorous act of the robot and is made of the other comments responses.
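The grouping listed above amounts to a lookup from annotated response type to functional category. A minimal Python sketch follows; the dictionary name `FUNCTIONAL_CATEGORY` and the helper `functional_category` are hypothetical, and the lack of verbal support type is deliberately left unmapped because the list above does not assign it to any of the three categories.

```python
# Mapping from annotated response type to functional category,
# following the three-item list in the text.
FUNCTIONAL_CATEGORY = {
    "subjective evaluation": "Recognition of the attempt of humor",
    "interjection": "Recognition of the attempt of humor",
    "metalinguistic comment": "Recognition of the attempt of humor",
    "mode adoption": "Responding with more humor",
    "sarcasm": "Responding with more humor",
    "other comment": "No humor support",
}


def functional_category(response_type):
    """Return the functional category, or None for an unmapped response type."""
    return FUNCTIONAL_CATEGORY.get(response_type.strip().lower())
```

For example, `functional_category("Sarcasm")` returns `"Responding with more humor"`, while `functional_category("lack of verbal support")` returns `None`.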

Figure 1 presents the distribution of human response categories according to the two main types of robot humorous acts. It shows that the category Responding with more humor is distributed similarly for canned jokes and conversational humor. Indeed, the percentage of responses in this category is quite similar after conversational humor and after canned jokes performed by the robot (small but significant difference on a Student t-test: t = 1.981, df = 180.45, p-value = 0.04912). On the contrary, the distributions of the Recognition of the attempt of humor and No humor support categories are significantly different (respectively t = 6.1611, df = 179.306, p-value = 4.634e-09 and t = -11.2643, df = 107.034, p-value < 2.2e-16 on a Student t-test).
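The non-integer degrees of freedom reported above are what one obtains from the unequal-variance (Welch) form of the two-sample t-test, so the sketch below assumes that variant; the text itself only says "Student t-test". The function name `welch_t` and the two samples are hypothetical and are not drawn from the corpus.

```python
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t, degrees_of_freedom) for Welch's unequal-variance t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error contributed by each sample.
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / (se2_a + se2_b) ** 0.5
    # Welch-Satterthwaite approximation, generally not an integer.
    dof = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, dof


# Hypothetical samples (NOT the data of this study).
t_value, dof = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
```

With these two small samples the degrees of freedom come out at about 5.88 rather than 8, which illustrates why fractional values such as 180.45 appear. A library call such as `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same test and also returns the p-value.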

Fig. 1. Categories of participants’ responses to humor for conversational humor and canned jokes (Chi-square = 208.4526, df = 15, p-value < 2.2e-16)

5.3 Sociolinguistic Variables

Next, we assume that the differences in the use of humor support categories may be explained by sociolinguistic variables or personality traits. Indeed, research on human-human interaction has identified differences in the use of humor between men and women [5, 9] and according to personality traits [16].

Age. In this corpus, participants’ ages range from 21 to 86 years (average: 40.39, median: 32). We separate participants into 3 groups according to their age. These groups were derived from the graphical distribution of participants’ ages and were confirmed by a k-means clustering approach. The groups are: (1) 21–40 with 24 participants, (2) 40–60 with 10 participants and (3) 60–86 with 12 participants. Figure 2 presents the distribution of humor response categories for each age group, relative to the humorous acts made by the robot (canned jokes and conversational humor). As shown in Fig. 2, the eldest age group (60–86) showed a marked preference for the Recognition of the attempt of humor category, while the 21–40 and 40–60 groups made more use of the No humor support category after conversational humor. This difference in the use of the Recognition of the attempt of humor category is significant (Chi-square = 50.2728, df = 6, p-value = 4.145e-09).
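To illustrate how a k-means approach can confirm age groups read off a plot, here is a minimal one-dimensional sketch in Python. It is not the authors' procedure: the function name `kmeans_1d`, the list of ages and the starting centroids are all hypothetical, and the text does not state which implementation or initialization was used.

```python
def kmeans_1d(values, centroids, iterations=100):
    """Lloyd's algorithm in one dimension, starting from the given centroids."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for value in values:
            nearest = min(range(len(centroids)),
                          key=lambda k: abs(value - centroids[k]))
            clusters[nearest].append(value)
        # Update step: each centroid moves to the mean of its cluster.
        updated = [sum(c) / len(c) if c else centroids[k]
                   for k, c in enumerate(clusters)]
        if updated == centroids:  # converged
            break
        centroids = updated
    return centroids, clusters


# Hypothetical ages (NOT the ages of the participants in this study).
ages = [22, 25, 31, 38, 45, 52, 58, 64, 75, 86]
centroids, clusters = kmeans_1d(ages, centroids=[30, 50, 75])
```

On this toy input the algorithm converges after one update and returns three clusters whose boundaries fall near 40 and 60, in the spirit of the grouping described above.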

Looking more closely at the humor response types, a Chi-square test shows a significant difference in the usage of humor response types across age groups (Chi-square = 72.62, df = 12, p-value = 1.033e-10). The 60–86 group showed a marked preference for evaluative responses, which decreases down to 20.44 % among the 40–60 group and by 5.19 % more among the 21–40 group. The 21–40 group seems to use the lack of verbal response more than the 40–60 and 60–86 groups after canned jokes. On the contrary, this response type seems to be used less by the 21–40 group after conversational humor and more by the other groups. Both Chi-square and Fisher exact tests were performed on these differences because of the very small numbers in many cells. However, the differences were not significant.

Fig. 2. Distribution of humor response categories given participants’ age

Sex. We observe a decrease of 29 % in the proportion of men in the 40–60 group (40 % men and 60 % women) and of 9 % more in the 60–86 group (31 % men and 69 % women). No significant differences were found according to gender (Chi-square = 6.111, df = 6, p-value = 0.4109). This suggests that gender may not be a particularly important variable in the types of humor responses to a humorous robot.

Relation to Personality Traits and Self-report Questionnaires. Participants filled in the Sense of Humor Scale questionnaire [12], which assesses humor habits along different dimensions. We found a correlation with the dimension Using Humor Under Stress of the Sense of Humor Scale questionnaire (t = 1.9848, df = 277, p-value = 0.04815). Participants with a high value on the dimension Using Humor Under Stress use more mode adoption and sarcasm responses. On the contrary, participants with a low value on this dimension use more evaluative responses after canned jokes. The habit of using humor in stressful situations seems to affect the participants’ responses to humor in interaction with the robot.

5.4 Discussion

We have studied the types of human verbal responses to humor with regard to the two main types of humorous acts of our robot (canned jokes and conversational humor). We found that these response types can be grouped into three main functional categories: recognition of the attempt of humor, responding with more humor and no humor support. The lack of verbal response type cannot easily be placed into these categories. Indeed, in this response type, the absence of verbal support can either be accompanied by paralinguistic humor support (e.g., laughter, head nods or smiles), or the silence can be a sign of a complete absence of explicit support. All in all, this category merits further investigation of the paralinguistic responses and a multimodal annotation of the participants' interactions. For example, an examination of laughter in the lack of verbal response of the 60–86 group shows that 25 % of these responses are filled by laughter after canned jokes and 40 % after conversational humor. This group has the same percentage of lack of verbal response after canned jokes and after conversational humor (18 %). This supports the idea that after canned jokes, the major part of the absence of humor support consists of silences, whereas after conversational humor, the absence of support is displayed both by other comment responses and by silence.

Significant differences were found in the distribution of the Recognition of the attempt of humor and No humor support response categories. The No humor support responses were mainly observed after conversational humor, while the Recognition of the attempt of humor responses were mainly observed after canned jokes. This can be explained by the interaction scenario itself: conversational humor is integrated in a part of the dialogue where participants play a game with the robot, trying to identify the ingredients of a recipe. It can also be explained by the failure of the conversational humorous acts: participants did not want to threaten the robot’s face by acknowledging lame humor.

We have investigated the impact of sociolinguistic variables on the usage of humor response categories (age, sex, personality traits and sense of humor). Significant differences were found in the use of the Recognition of the attempt of humor and No humor support categories: the 60–86 group showed a marked preference for the evaluative response type, whereas the 21–40 group made more use of the lack of verbal response type. Zajdman [18] points out that any joking activity presents a potential face-threatening act for both the speaker (because the joke could fall flat) and the hearer (because he might not ‘get the joke’). Our findings suggest a shift, as individuals grow older, toward humor response types which are less face-threatening for both the joker and themselves. These results seem to support the findings of Bell [4] that the eldest have a preference for polite responses to failed humor in human-human interaction.

No significant differences were found in our data concerning the impact of sex on humor support responses. Given the research identifying differences in the use of humor between men and women (e.g., [5, 9, 11]), this dimension is worth further investigation. For example, LaCorte [11] observed a major effect of gender on the use and appreciation of humor styles: men were more likely to appreciate aggressive and self-defeating humor styles. The use of a hackneyed variety of humor may explain the absence of difference between men and women.

Finally, we have investigated the impact of the Sense of Humor and personality questionnaires on the usage of humor support response categories. We found a correlation with the Using Humor Under Stress dimension: participants with a high value on this dimension made more use of the Responding with more humor category. Although mode adoption was found to be relatively rare in human-human interaction [2], our protocol seems to set up a playful interaction in which participants who have a habit of using humor under stress are willing to entertain the humorous frame.

6 Conclusions and Future Work

This paper has explored the types of responses that participants give to humorous acts made by a robot in a social dialog. The study relies on 49 human-robot interactions, from which 381 adjacency pairs of humorous acts made by the robot and the following human responses were extracted. The human responses were coded according to the type of verbal response. As in human-human interactions, the type of humorous act made by the joker robot mainly determines the human responses. Participants rarely signaled that they had understood and recognized the attempt at humor after conversational humorous acts (teasing sentences and plays on words). On the contrary, jokes and riddles are always recognized as an attempt at humor. Three main functional categories of human responses were found: (1) providing no support, (2) recognizing the attempt of humor and (3) contributing with more humor. This study also reveals other influences, such as the sociolinguistic variable of age and the Sense of Humor (measured by questionnaire).

Future work includes further investigation of the non-verbal responses. We will investigate paralinguistic cues (e.g., laughter and affect bursts) and multimodal responses (e.g., smiles, head nods). We hope to find more tangible cues, which could be fruitfully exploited to build and maintain a rich user profile of humorous act preferences during the human-robot interaction.