1 Introduction

The development of embodied conversational agents (ECAs) [7] holds the promise of providing important support for their human partners in fields ranging from education to care-giving (see, e.g., [3, 11]). The creation and maintenance of human-agent rapport [12, 13, 21] will be a correspondingly important factor for agents serving in these critical roles. Humans typically signal increased familiarity by, among other things, increasing the amplitude of gestures [6, 19]. However, the reported research on the effects of increased gesture amplitude on human-agent rapport is thin and unsettled. In the work reported here, we test the claim that increasing an agent’s gesture amplitude will lead humans to report greater rapport with the agent, and we do so with quantification of the agent’s gesture amplitudes. Although our hypotheses in initiating this research were that the larger-amplitude gestures associated with extraversion would produce greater feelings of rapport, our results suggest that this is not true, at least for the gestures and amplitudes that our agent used.

In this paper, then, we briefly review the literature relating gesture amplitude with rapport, describe a 55-subject empirical study of this relationship, report the study’s results, and discuss the implications of these results.

2 Background

An ECA is a form of human-computer interaction that involves an intelligent virtual character that can communicate by using speech, facial expressions, and gestures [7]. ECAs can vary graphically in appearance depending on the virtual environment the ECA inhabits and the role assigned to it. A help-desk ECA may only have its upper body visible, while a museum’s tour-guide ECA may need a full body to convey more lifelike gestures and behavior [22]. ECAs that appear more human-like are easier for humans to interact with and to develop rapport with [9]. Their combination of gesture, speech, and facial expressions factors not only into believability and rapport but also into the perceived personality of the agent [20]. ECAs are designed to be used in conversational settings. They should be able to handle the discourse within a conversation and respond in humanlike ways to input [4]. ECAs that do not behave in the humanlike manner expected of them may ultimately lose the user’s respect and rapport.

To meet the high expectations of users, several features must be considered when designing an ECA, based on its application. Extraversion, agreeableness, and other Big Five personality traits are important to an ECA’s design [8]. Extraversion is characterized by being talkative and outgoing and by enjoying social interactions [18]. Users show higher levels of rapport when interacting with extraverted agents even if the users themselves are not extraverted [5, 6]. This may be because extraverts can be perceived as seeking the company of others and as exhibiting positive emotions in their behavior [16, 17]. Introverts are characterized by the opposite: they like keeping to themselves, making decisions by reflecting on internal conversations, and avoiding social interactions. These personality traits can be expressed not just through an agent’s speech, but also in conjunction with non-verbal behavior [19, 20].

Because humans use larger gestures to signal increased familiarity, it seems plausible that an agent’s use of larger gestures would lead their human conversational partners to perceive greater rapport in their interaction. Neff et al. [19] extensively reviewed the relationship of gesture and extraversion. They summarized the relevant literature, in part, as finding that people express extraversion through gestures that are broad and wide rather than narrow. In their study of human perception of agents’ verbal and nonverbal behaviors, they parameterized the relative amplitudes of introverted and extraverted gestures, for the x, y, and z axes of motion respectively, as x*.5, y*.6, z*.8 (introverted) and x*1.4, y*1.2, z*1.1 (extraverted). Their results indicated that, as expected, subjects perceived the larger-amplitude gestures as extraverted.
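As an illustration of the per-axis parameterization attributed to Neff et al. above, the following minimal sketch scales a hand offset along each axis. The function name, the dictionary layout, and the sample offset are hypothetical. The only values taken from the text are the six scale factors, and the sketch assumes they are applied as simple multipliers to an offset from a rest pose.

```python
from typing import Dict, Tuple

Vec3 = Tuple[float, float, float]

# Per-axis (x, y, z) scale factors as quoted in the text for Neff et al. [19].
AMPLITUDE_SCALES: Dict[str, Vec3] = {
    "introverted": (0.5, 0.6, 0.8),
    "extraverted": (1.4, 1.2, 1.1),
}


def scale_gesture(offset: Vec3, style: str) -> Vec3:
    """Scale a hand offset (relative to a rest pose) independently per axis.

    Assumption: the factors act as plain multipliers on the offset.
    """
    sx, sy, sz = AMPLITUDE_SCALES[style]
    x, y, z = offset
    return (x * sx, y * sy, z * sz)


# Hypothetical hand offset in arbitrary units, not taken from any study.
base_offset = (10.0, 20.0, 5.0)
print(scale_gesture(base_offset, "introverted"))
print(scale_gesture(base_offset, "extraverted"))
```

Under this reading, the same base gesture is widened most along x in the extraverted setting and narrowed most along x in the introverted setting.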

A subsequent study conducted by Clausen-Bruun, Ek, and Haake [9] examined the effect of gesture amplitude on subjects’ uptake of a short narrative but did not provide details on the relative sizes of the lower- and higher-amplitude gestures. The authors created the high-amplitude gestures by manually extending and fine-tuning the low-amplitude gestures on a case-by-case basis. They found that increased amplitude led to significantly improved comprehension. They also asked subjects to provide a scaled answer to the prompt “I like the character” but did not report results relating amplitude to this emotional preference.

Hu et al. [15] found that users notice if an agent is extraverted, based on the amplitude of its gesture. This study used a storytelling scenario between two agents to see if users perceived the personality of each agent. One agent would gesture with large amplitude while the other used gestures that were the same but smaller in size. Users noticed the difference between the two agents and correctly perceived that the larger amplitude agent was the extraverted agent.

Novick and Gris [21] looked explicitly at the relationship between gesture amplitude and humans’ perception of rapport. However, this study had only 20 subjects, and its results were only weakly suggestive of a positive relationship between gesture amplitude and rapport.

The research to date suggests that agents who perform gestures with high amplitude and frequency do appear extraverted to users. Many of these studies, though, lack a scale to define the amplitude of the gestures and omit clear full-body measurements of the agent in relation to its gestures. The gestures themselves are given a range between extraverted and introverted in 3D space, but there is no measurement of the initial point and apex of a gesture. There could be a limit: if a gesture is too large, then it may seem unnatural. The same logic applies to gestures that are too small in amplitude. Also, the types of gestures used by extraverts may differ greatly from those used by introverts. If that is the case, then agents who are designed to be extraverted should use these specific gestures more frequently than introverted ones. Introverts tend to perform gestures that are closer to their bodies [2], but in most studies done with ECAs an introverted gesture is the same gesture used by the extravert, just smaller. We are aware of no study that has looked for a diversity of gestures between these personality traits. So it is possible that users may classify an agent as introverted based not only on the size of its gestures but also on the different types of gesture it uses.

Accordingly, in the present study we wanted to determine whether there is, in fact, a positive relationship between gesture amplitude and rapport. Moreover, in our study we were able to quantify the gesture sizes absolutely. We hypothesized that rapport would be increased in the larger-gesture condition.

3 Methodology

We studied the relationship of quantified gesture to rapport using the ECA application of Gris et al. [14], “Escape from the Castle of the Vampire King.” This is an adventure game, inspired by text-based games such as Zork [1] and Colossal Cave [10], where the user tries to escape from the castle of an evil vampire king. The game had a graphical interface with a full-sized ECA that served as the game’s narrator, and the player controlled the game through speech commands. The game comprised 26 different rooms, each with its own secret passages, exits, items, and clues. The game included 3D scenery, recorded speech, agent movement based on motion capture, quick-reference commands, and an incremental map display. Players’ interactions occurred in 30-minute sessions on two different days, for a total of approximately 60 minutes per participant. Figure 1 shows a user interacting in a typical scene in the game, with the narrator agent, one of the rooms in the castle, and a map of the castle’s rooms explored to that point in the adventure.

Fig. 1. User interacting in a typical scene in the “Escape from the Castle of the Vampire King” adventure. The tripod for a video camera used to record the user’s interaction, partially hidden in an artificial tree, is visible at the lower left of the image.

The game dialogs spoken by the agent were scripted so that, even with limited commands, users could answer with natural language constrained by the specific context. Table 1 presents an excerpt of a player’s interaction, showing the agent’s scripted utterance and the player’s verbal production.

Table 1. An interaction transcript from the first session of the “Vampire King” game.

The experiment used a between-subjects design. In both the experimental and control conditions, subjects played the first half of the adventure in a 30-minute session with the agent using gestures with introverted amplitude. The subjects then returned a day or two later to play the second half of the adventure, again in a 30-minute session. In the second session, subjects in the control condition continued with the agent using gestures with introverted amplitude, and subjects in the experimental condition continued with the agent using gestures with extraverted amplitude. The sessions were completed by 30 subjects in the control condition and by 25 subjects in the experimental condition.

The agent’s gestures were generated via motion capture. Five separate sets of six gestures each were recorded for the high-amplitude and low-amplitude conditions, in an effort to avoid the problems of simple mechanical amplification described by Clausen-Bruun, Ek, and Haake [9]. The gestures were separated into five categories. Three of the categories (A, B, C) were gestures captured from human-human conversation that increased in amplitude. The other two categories (I, E) acted as a control group. The control-group gestures were animated from a previous experiment, with the E animations acting as modified versions of the I gestures but with a larger amplitude. In total there were 30 animations, six from each category. An example of a gesture from each category can be seen in Fig. 2.

Fig. 2. Gesture from each category, from left to right: A, B, C, I, and E. A has the smallest amplitude, and E and C have the largest.

Figure 3 presents sequences of agent poses that illustrate one of the extraverted and one of the introverted gestures. The animation gestures were a variation of hand gestures where the agent appears to be explaining or simply speaking. The first sequence of images represents an extraverted animation gesture where the agent lifts her hands and moves the right hand in circles. The second sequence of images represents the equivalent introverted animation gesture. As can be noted from the images, the introverted animation does not have the hands lifted as high as the extraverted one.

Fig. 3. Image sequences of introverted gesture (left) and extraverted gesture (right). In addition to the hands being higher in the extraverted version than the introverted, the length of time between each image is longer for the introverted animation movements than for the extraverted movements.

We measured the absolute displacements for the six gestures from both versions of the “Vampire King” agent. To do so, we projected the agent in its actual experimental setting and physically measured the x, y, and z displacements of the agent’s right and left hands. (We measured the displacements in the actual physical space of the experiment so that the results would be more accurate than if we measured from, say, a desktop display.) For each hand, we calculated the hand’s displacement vector (in inches) and summed the right and left displacement vectors to produce an overall measure of gesture size. Figure 4 shows one of the authors preparing to measure the x and y displacements for one of the agent’s gestures. Table 2 reports the absolute displacements of representative high- and low-amplitude gestures, Table 3 reports the maximum, minimum, mean, and standard deviation for the gestures, and Fig. 5 compares the mean amplitudes of the stimuli gestures. Overall, the gesture stimuli ranged in amplitude from the A gestures as the smallest, through the B, I, and C gestures, to the E gestures, which were the largest.
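The measurement described above can be sketched as follows. This is a minimal illustration and not the authors’ procedure: it assumes that each hand’s displacement is summarized by its Euclidean length and that “summed” means adding the two lengths. Because the caption of Fig. 5 instead describes taking the maximum of the right-arm and left-arm vectors, the hypothetical `combine` parameter supports both readings. The sample displacements are invented.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def displacement_magnitude(d: Vec3) -> float:
    """Euclidean length of an (x, y, z) displacement, in the input units."""
    return math.sqrt(d[0] ** 2 + d[1] ** 2 + d[2] ** 2)


def gesture_size(right: Vec3, left: Vec3, combine: str = "sum") -> float:
    """Combine right- and left-hand displacement lengths into one size.

    combine="sum" adds the two lengths (one reading of the text);
    combine="max" keeps the larger one (the reading in the Fig. 5 caption).
    """
    r = displacement_magnitude(right)
    l = displacement_magnitude(left)
    if combine == "sum":
        return r + l
    if combine == "max":
        return max(r, l)
    raise ValueError(f"unknown combine mode: {combine!r}")


# Hypothetical displacements in inches, not measurements from the study.
right_hand = (3.0, 4.0, 12.0)
left_hand = (2.0, 3.0, 6.0)
print(gesture_size(right_hand, left_hand, "sum"))
print(gesture_size(right_hand, left_hand, "max"))
```

The two modes can rank gestures differently when one hand moves much more than the other, so the choice of combination rule matters when comparing categories.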

Fig. 4. One of the authors prepares to measure the x and y displacements of one of the agent’s gestures.

Table 2. Absolute displacements of representative high- and low-amplitude gestures
Table 3. Maximum, minimum, mean, and standard deviation for the gestures
Fig. 5. Mean amplitudes (in inches) of the gestures used in the perception study. Each gesture amplitude was calculated as the maximum of the right-arm and left-arm vectors.

At the conclusion of each session, the subjects completed a twelve-item Likert-scale questionnaire that assessed the three components of rapport proposed in the Paralinguistic Rapport Model [21]: sense of emotional connection, sense of mutual understanding, and sense of physical connection. The questions in each component were balanced with respect to positive and negative responses.
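A questionnaire balanced between positively and negatively worded items is typically scored by reverse-coding the negative items before averaging. The sketch below illustrates that step under several assumptions: a 1–6 response scale (the range given in the Table 4 caption), four items per component, and a hypothetical assignment of items to components and to negative wording. The actual item order and wording are not given in the text.

```python
from statistics import mean
from typing import Dict, List, Sequence, Set

SCALE_MIN, SCALE_MAX = 1, 6  # assumed from the 1-6 range in the Table 4 caption

# Hypothetical item layout: indices 0-11, four items per rapport component.
COMPONENTS: Dict[str, List[int]] = {
    "emotional": [0, 1, 2, 3],
    "understanding": [4, 5, 6, 7],
    "physical": [8, 9, 10, 11],
}
# Hypothetical choice of which items are negatively worded (two per component).
NEGATIVE_ITEMS: Set[int] = {2, 3, 6, 7, 10, 11}


def reverse_code(score: int) -> int:
    """Map 1<->6, 2<->5, 3<->4 so that higher always means more rapport."""
    return SCALE_MIN + SCALE_MAX - score


def score_questionnaire(responses: Sequence[int]) -> Dict[str, float]:
    """Return per-component and overall mean rapport for one subject."""
    if len(responses) != 12:
        raise ValueError("expected twelve responses")
    adjusted = [
        reverse_code(r) if i in NEGATIVE_ITEMS else r
        for i, r in enumerate(responses)
    ]
    scores: Dict[str, float] = {
        name: mean(adjusted[i] for i in items)
        for name, items in COMPONENTS.items()
    }
    scores["overall"] = mean(adjusted)
    return scores


# Invented responses for one subject, for illustration only.
print(score_questionnaire([6, 6, 1, 1, 5, 5, 2, 2, 4, 4, 3, 3]))
```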

We expected, comparing results in the subjects’ second interaction sessions, that subjects in the experimental condition would interpret the agent’s larger gestures as indicating increased familiarity on the part of the agent. We expected that subjects in the control condition, where the agent used gestures of similar amplitude in both sessions, would not sense an increase in familiarity on the part of the agent. Accordingly, we hypothesized that:

  1. Subjects in the experimental condition would report higher levels of overall rapport than subjects in the control condition.

  2. Subjects in the experimental condition would report higher levels of rapport for the component of emotional connection than subjects in the control condition.

  3. Subjects in the experimental condition would report higher levels of rapport for the component of mutual understanding than subjects in the control condition.

  4. Subjects in the experimental condition would report higher levels of rapport for the component of physical connection than subjects in the control condition.

4 Results

Despite our expectations of a positive relationship between gesture amplitude and rapport, the results of the experiment did not confirm the hypotheses. As reported in Table 4, the subjects’ mean ratings for rapport ran consistently in the opposite direction than expected.

Table 4. Mean rapport scores (1–6) for subjects in high- and low-amplitude conditions

While t-tests for the differences in rapport for the emotional and physical rapport components were not significant (p = 0.11 and p = 0.95, respectively, two-tailed test), the t-test for the difference in rapport for the understanding component was significant (p = 0.042, two-tailed test), and the t-test for the difference in overall rapport approached significance (p = 0.052, two-tailed test). Most striking, the difference between the high- and low-gesture conditions across the understanding and emotional rapport components was highly significant (p < 0.01, two-tailed test). In sum, not only was our main claim disconfirmed, but the evidence indicates, at least for the conditions of this study, that the relationship runs in the opposite direction: the larger gestures here led to lower perceptions of rapport.
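For readers who wish to run a comparable between-groups comparison, the sketch below computes a two-sample t statistic and its degrees of freedom. The text does not say which t-test variant was used. Welch’s unequal-variance form is shown here as an assumption, chosen only because the two groups differ in size (30 and 25 subjects). The function name and the sample scores are hypothetical. The sketch stops short of a p-value, which would additionally require the cumulative distribution function of the t distribution from a statistics package.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    na, nb = len(a), len(b)
    # Squared standard error of each group mean (sample variance / n).
    se2_a = variance(a) / na
    se2_b = variance(b) / nb
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (na - 1) + se2_b ** 2 / (nb - 1)
    )
    return t, df


# Invented rapport scores on a 1-6 scale, for illustration only.
low_amplitude = [4.5, 4.0, 5.0, 4.2, 4.8]
high_amplitude = [3.9, 4.1, 3.5, 4.0, 3.7]
t_stat, dof = welch_t(low_amplitude, high_amplitude)
print(f"t = {t_stat:.3f}, df = {dof:.2f}")
```

A positive t here means the first group’s mean is higher. A two-tailed test, as reported above, considers the magnitude of t regardless of sign.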

5 Conclusion

We hypothesized that rapport would be increased in the larger-gesture condition. However, our results were exactly the opposite: Rapport fell significantly in the larger-gesture condition. This means that larger gestures are not always better. There are several possible reasons for the study’s results being the opposite of what we expected. These reasons include:

  • The point of the study was to compare the rapport effects of high- and low-amplitude gestures. But it is possible that the high-amplitude gestures we created were simply too big. This factor might be clarified by running perception studies of the gestures from this study, plus additional sets of gestures with other amplitudes. In these studies, subjects would rate the gestures for naturalness, thus providing an empirical basis for characterizing agents’ gestures as appropriately low- and high-amplitude.

  • The agent’s gestures were generated through motion-capture. Some of the gestures, especially the agent’s full-body movement, seemed exaggerated. It may, in fact, be the case that the particular gestures we used for the agent, especially in the high-amplitude condition, were simply odd, awkward, or strange. While it is true, cf. [19], that extraverted gestures differ from introverted gestures in more ways than just amplitude, for the purposes of basic research into the impact of gestures on rapport it might be better to use the same gestures in both high- and low-amplitude conditions, with only amplitude adjustments.

  • The study by Novick and Gris [21] suggested that the difference in gesture amplitude would produce a difference in the subjects’ feelings of physical rapport, while the current study produced essentially no difference at all in the subjects’ feelings of physical rapport. That study indicated that the difference in gesture amplitude would not produce a difference in the subjects’ feelings of understanding and emotional rapport, while the current study produced significant differences, and in the unexpected direction. This suggests the possibility that the current results can be attributed to statistical anomaly.

The study we report here was subject to several limitations, and the study’s results should be interpreted in light of these limitations, which include:

  • While the “Vampire King” application produced high levels of user engagement [21], the user-agent dialog and the sets of gestures were limited and repetitive. The possibly higher salience for the high-amplitude gestures may have made these limitations more apparent or salient for the high-amplitude condition.

  • The agent’s repertoire of six gestures in each condition meant that the results were vulnerable to the presence of even one, or possibly more, unnatural or otherwise uncharacteristic gestures. If the repertoire had been much larger, then the effect of the inclusion of an odd or awkward gesture would have been diminished.

  • Cronbach’s alpha for the entire survey, combining all three rapport components, was 0.76, which indicates reasonable but not strong consistency among the survey elements. The alpha values for each of the three components were under 0.70. While this may be due simply to having only four questions for each of the components, it may also reflect on the meaningfulness of the survey.
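The consistency measure mentioned in the last limitation can be computed with the standard formula for Cronbach’s alpha, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), where k is the number of items. The following minimal sketch applies that formula. The function name and the response matrix are hypothetical, and the responses are assumed to have been reverse-coded already where needed.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items score matrix.

    rows: one sequence of item scores per respondent, all the same length.
    """
    if len(rows) < 2:
        raise ValueError("need at least two respondents")
    k = len(rows[0])
    if k < 2:
        raise ValueError("need at least two items")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Invented responses (five respondents, four items), for illustration only.
responses = [
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [5, 5, 6, 5],
    [3, 3, 4, 3],
    [1, 2, 1, 2],
]
print(round(cronbach_alpha(responses), 3))
```

Because alpha tends to rise with the number of items, a four-item component can fall below a twelve-item survey even when the items are similarly correlated, which is consistent with the caveat above.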