1 Introduction

While virtual agent technology advances towards increasingly sophisticated patterns of social interaction, users remain reluctant to adopt this type of technology for everyday practices and react suspiciously when virtual agents produce unexpected responses [1]. It is therefore an important endeavor to identify the factors that contribute to acceptance barriers and shape the evaluation of virtual agents, so that these requirements can be addressed during the design process. Most research efforts focus on technological properties of virtual agents that are supposed to lead to better evaluations. Among these features are a virtual agent's ability to produce gestures [2, 3] and authentic facial expressions [4, 5] as well as its ability to follow social motives [6]. At the same time, user factors have been considered only a supplement to the technological property in question, with only a few exceptions [7, 8]. Yet, understanding user factors in the evaluation of virtual agents might prove a valuable asset because (1) they provide additional insight into the evaluation of a virtual agent's technological aspects and thereby (2) guide designers' attention towards aspects of virtual agents that might have received little attention so far. As one positive exception in this context, Rosenthal-von der Pütten, Krämer, and Gratch [8] found that the predictive value of personality traits for agent evaluations even exceeded that of the virtual agent's actual behavior. An important challenge in this endeavor is to identify user factors that are both stable over time and closely related to the evaluation of virtual agents. We propose that interpersonal differences in the tendency to attribute animacy to a virtual agent could serve as such a user factor.

2 User Factors in Virtual Agent Evaluation

User factors in virtual agent evaluations can be conceptualized on a continuum according to their temporal stability. Less stable user factors include variables such as prior experience and familiarity with virtual agents or new technologies in general, as well as attitudes towards these technologies. A lack of prior experience with new media technologies has been found to invoke feelings of uncertainty about potential benefits and might lead to unease about adoption [9]. As a result, users might be less willing to engage in interactions with virtual agents (see also the technology acceptance model [10]). Additionally, inexperienced users provide evaluations with greater variance after first use because they have no anchor for comparison. This bias can result from disappointment over high expectations that have not been met [11] or from surprise at a virtual agent's unexpected capability. As a consequence, studies investigating virtual agent technology through such biased evaluations suffer from lower than expected statistical power, compromising judgments about the effectiveness of the investigated technology. Therefore, less stable user factors such as prior experience with the media technology (i.e. virtual agents) or related technologies (e.g. video games) need to be taken into account in the analysis. Yet, although beliefs about virtual agents shape the early interaction process, they can change quickly with more time spent interacting with virtual agents [9]. Thus, the increased variance in virtual agent evaluations should decrease with growing familiarity. After continued interaction with a virtual agent, the influence of less stable user factors should be negligible.

Stable user factors, on the other hand, influence user evaluations during early stages and even after extended periods of interaction. Most studies control for participant sex, which usually brings in-group and out-group favoritism effects to light [12]. However, sex tends to accumulate variance from correlated stable factors [13] when these are not controlled for. Most of those stable factors belong to the realm of personality traits. Agreeableness, conscientiousness, and neuroticism, for example, have been shown to impact usefulness evaluations of information technology [14]. Another study, however, found that the predictive value of rather general traits is limited, while traits more specific to the interaction with virtual agents prove more useful [8]. In this paper, we investigate a potentially contributing factor that is responsible for the perception of a virtual agent as a living entity in the first place: the attribution of animacy.

3 Animacy Attribution Tendency

The impression of animacy can be seen as a perceptual phenomenon during which we ascribe intention to an entity that follows specific movement patterns [15–17]. Early work by Heider and Simmel [18] already showed how observers of rather simplistic movements of geometric shapes infer complex narratives about the motives and actions of the supposed actors. The type of motion carried out by the geometric shapes has repeatedly been shown to have a major impact on the likelihood of animacy attributions [16, 17, 19]. Recently, Santos and colleagues [15] investigated different movement parameters and their impact on animacy judgments. They concluded that interruptions of otherwise generic movements, changes of movement direction towards other entities, and interactions between entities reliably lead to stronger animacy attributions. Thus, even simplistic cues are sufficient to elicit attributions of animacy. When an entity is assumed to be animate, users interpret its capabilities similarly to those of living beings [20]. Virtual agents benefit from these attributions, because even rudimentary indications of animate behavior can lead to complex assumptions about their inner processes, rendering them more realistic.

Perceiving other entities as animate is a fundamental human ability [15] and is thus deeply rooted within our brains, recruiting neural structures with strong ties to social cognition [21]. Any given entity in our perceptual environment either elicits a momentary animacy attribution or it does not; animacy can thus be considered a dichotomous attribution. Human observers, however, might nonetheless differ in their judgments about the animacy of a virtual entity: we know about the artificial nature of a virtual entity and are able to reflect on our automatic reactions towards it. Accordingly, users might either take any deviation from expected behavior as an indicator of the mechanistic nature of a virtual agent or willingly suspend their disbelief by giving less thought to deviant behavior. As a consequence, users might differ in the threshold of perceptual cues they require to accept virtual entities as animate. We therefore argue that interpersonal differences in the threshold for attributing animacy might bridge occurrences in human-agent interaction in which a virtual agent produces unexpected responses (e.g. repetitious or unrelated responses). Consequently, users with a low threshold for animacy attribution should be less sensitive to odd reactions and thus report less negative evaluations.

4 Method

Prior research focused exclusively on the influence of behavioral properties of virtual entities on the likelihood of animacy attributions. Thus, we first needed to develop a measure for interindividual differences in animacy attribution. In a second step, we assessed the influence of various user factors on evaluations of virtual agents. We created an online questionnaire containing the newly developed measure, measures for trait empathy and three big-five personality traits, as well as sociodemographic questions. Additionally, we randomly presented one of four recordings of social interactions with different virtual agents obtained from the internet, for which participants had to evaluate the virtual agent and indicate their subjective impression of animacy. The online questionnaire was filled in by N = 81 students from different disciplines. Because participants had to continually focus their attention on an English-language video during the online questionnaire, we had to exclude 25 participants who reported insufficient understanding of the language (they were non-native speakers), were distracted during their participation, or reported a poor internet connection. As a result, we analyzed N = 56 valid data sets (age M = 23.21, SD = 2.96, 82.1 % female).

4.1 Materials

Animacy Attribution Tendency.

The behavioral cues leading to animacy attributions that are described in the literature include interrupted movement near secondary objects, approach of secondary objects, responses of secondary objects, speed change, direction change, and spatial context [15, 16, 22]. The presence of these characteristics has a strong impact on perceived animacy and should result in firm attributions of animacy, which is of little diagnostic value. Therefore, very subtle versions of these behavioral cues had to be employed. We created 165 stimuli depicting a black circle moving across the screen (see Fig. 1), for which we varied a broad range of movement characteristics at different intensities (see Table 1). The stimuli were created in Adobe Flash (Adobe Systems) and presented to the participants using E-Prime 2 (Psychology Software Tools, Inc.) on PCs with 19-inch displays at a seating distance of 60 cm.

Fig. 1. In this example of the employed stimuli, the circle interrupted its linear movement by moving towards the square (arrows added for the purpose of visualization).

Table 1. Movement characteristics varied to assess animacy attribution tendency
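Because the original stimuli were authored in Adobe Flash and are not publicly available, the following minimal Python sketch merely illustrates how such a parameterized trajectory could be generated, here for a direction-change cue towards a secondary object. All names and parameter values (frame count, speed, cue window, cue_strength) are illustrative assumptions, not the settings of the original stimuli.

```python
import numpy as np

def circle_trajectory(n_frames=180, speed=3.0, cue_strength=0.15,
                      target=np.array([400.0, 150.0])):
    """Illustrative generator: a circle moves linearly and, during a
    mid-trajectory window, its heading is blended towards a secondary
    object with a parameterized intensity (all values hypothetical)."""
    pos = np.array([50.0, 300.0])      # start position in pixels (assumed)
    velocity = np.array([speed, 0.0])  # initial linear movement
    frames = []
    for t in range(n_frames):
        if n_frames // 3 < t < n_frames // 2:
            # deflect the heading towards the secondary object
            to_target = target - pos
            to_target /= np.linalg.norm(to_target)
            heading = ((1 - cue_strength) * velocity / np.linalg.norm(velocity)
                       + cue_strength * to_target)
            velocity = speed * heading / np.linalg.norm(heading)
        pos = pos + velocity
        frames.append(pos.copy())
    return np.array(frames)

# lower cue_strength yields subtler cues, i.e. harder items
subtle = circle_trajectory(cue_strength=0.05)
strong = circle_trajectory(cue_strength=0.40)
```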

A pre-test sample of N = 48 undergraduate students (age M = 21.53, 76.6 % female) rated the video files on a 4-point scale from “0 – not animate” to “3 – animate”; participants scoring high in animacy attribution tendency thus tend to provide higher animacy ratings than participants with lower scores. All stimuli were presented in randomized order with short breaks every few minutes. From this dataset we selected 24 items with different levels of item difficulty and different types of behavioral cues that had sufficient variance to be used as indicators of the animacy attribution tendency.
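A variance-based item selection of this kind could be reproduced along the following lines. This is a sketch under assumed column names and file layout, not the original analysis script.

```python
import pandas as pd

# pretest ratings: one row per participant, one column per stimulus (0-3);
# the file name and layout are assumptions for illustration
ratings = pd.read_csv('pretest_ratings.csv')

items = pd.DataFrame({
    'difficulty': ratings.mean(),     # mean animacy rating per stimulus
    'variance': ratings.var(ddof=1),  # rating variance per stimulus
})

# bin items into 24 difficulty levels and keep the highest-variance item
# per bin, so the scale spans easy to hard items while discriminating well
items['bin'] = pd.qcut(items['difficulty'], q=24, labels=False,
                       duplicates='drop')
selected = items.sort_values('variance').groupby('bin').tail(1)
print(selected.index.tolist())
```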

For the sake of robustness, the data from all 81 participants of the main study were used to assess descriptive statistics and scale uniformity. The developed test for animacy attribution formed a highly reliable compound (α = .96) that is normally distributed with a mean score of M = .47 (SD = .19; normed to a range from 0 to 1), with observed scores spanning 0.86 of the scale range. The results remained stable with the reduced sample size.
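For reference, a minimal sketch of how the reliability coefficient and the normed tendency score can be computed from the 24 selected items; the array layout and the norming by the maximum response value are our assumptions.

```python
import numpy as np

def cronbach_alpha(item_scores):
    """Cronbach's alpha for an (n_participants, n_items) array."""
    item_scores = np.asarray(item_scores, dtype=float)
    k = item_scores.shape[1]
    item_var = item_scores.var(axis=0, ddof=1).sum()
    total_var = item_scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var / total_var)

def tendency_score(item_scores):
    """Mean over the 24 items, rescaled from the 0-3 response range
    to 0-1 (our reading of 'normed to a range from 0 to 1')."""
    return np.asarray(item_scores, dtype=float).mean(axis=1) / 3.0
```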

Other User Factors.

We used the Empathy Quotient by Baron-Cohen and Wheelwright [23] to assess trait empathy. It contains 40 items and 20 filler items that are unrelated to empathy. Participants rated the statements on 4-point Likert scales from “strongly disagree” to “strongly agree” and scored 1 point for mildly empathic responses and 2 points for strongly empathic responses (M = 41.52, SD = 10.38, α = .766). The German Big-Five Inventory – Short Form by Rammstedt and John [24] assessed neuroticism (M = .58, SD = .19, α = .73), openness to experiences (M = .76, SD = .02, α = .77), and conscientiousness (M = .62, SD = .16, α = .64) using 13 items with 7-point Likert ratings from “strongly disagree” to “strongly agree”. We additionally asked for the participants’ age, sex, and prior experience with virtual agents (“How familiar are you with virtual agents?” on a scale from 0 – “not at all” to 10 – “very familiar”).
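A sketch of the Empathy Quotient scoring scheme described above; the numeric response coding and the per-item keying are assumptions for illustration, not taken from the original instrument documentation.

```python
def score_eq_item(response, empathic_direction='agree'):
    """Score one Empathy Quotient item: 2 points for a strong response in
    the empathic direction, 1 point for a mild one, 0 otherwise. Response
    coding (0 = strongly disagree ... 3 = strongly agree) is an assumption."""
    if empathic_direction == 'agree':
        return {3: 2, 2: 1}.get(response, 0)
    else:  # reverse-keyed item: disagreement is the empathic response
        return {0: 2, 1: 1}.get(response, 0)

def eq_total(responses, directions):
    """Sum over the 40 scored items; the 20 filler items are simply
    excluded from `responses` before calling this function."""
    return sum(score_eq_item(r, d) for r, d in zip(responses, directions))
```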

Agent Evaluation.

We employed the Agent Persona Inventory (API) by Baylor and Ryu [25] to assess user evaluations of the virtual agents. We used 15 items of the API with 7-point Likert ratings, covering the subscales credible (α = .76), engaging (α = .86), and human-like (α = .80). Furthermore, we assessed perceived animacy for each virtual agent with a single item, for which participants had to rate the agent’s animacy on a scale from “0 – not animate” to “3 – animate”.

4.2 Procedure

Participants were recruited via e-mail. The message contained a link to the online questionnaire. After a short explanation, participants first completed the test for animacy attribution tendency. In the next part, they filled in the Empathy Quotient and the Big-Five Inventory – Short Form. A video recording of a user interacting with a virtual agent followed the personality measures. The videos were obtained from the internet and represented different virtual agent technologies and scenarios (Obadiah & Spike, chatting, SEMAINE Project; Sgt. Star, virtual guide, ICT Virtual Human Toolkit; Lt. Rocko, virtual patient, ICT Virtual Human Toolkit). We decided to use video recordings from four different interactions to assess the importance of user factors in a more general fashion, whereas previous research mostly investigated direct interaction with specific agents [e.g. 8]. After watching the complete video, participants filled in the Agent Persona Inventory and indicated their perceived level of animacy of the virtual agent. In the remainder of the questionnaire, participants were asked about the above-mentioned less stable user factors and sociodemographic variables. We also included several questions to control for attention, internet connection issues, motivation, and language comprehension.

5 Results

We first wanted to investigate the uniformity of agent evaluations across the four presented video clips. A MANOVA revealed a significant multivariate effect of video clip on the subscales of the Agent Persona Inventory, V = 0.66, F(9, 156) = 4.88, p < .001. Although this effect should be considered with caution due to problems with multivariate normality and the assumption of homogeneity of covariance matrices, univariate ANOVAs as follow-up analyses confirm this result. We found significant effects for the subscales credible, F(3, 52) = 11.23, p < .001, ηp² = .39, engaging, F(3, 52) = 26.96, p < .001, ηp² = .61, and human-like, F(3, 52) = 3.44, p < .05, ηp² = .17. Closer inspection of the plots for each dependent variable and of post hoc comparisons indicates that these effects result from the very positive evaluations of Sgt. Star and, to some extent, from the negative evaluations of Obadiah.
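Analyses of this kind can be reproduced with standard statistical software; the following sketch uses Python's statsmodels under assumed file and column names (clip, credible, engaging, humanlike), which are not those of our original analysis scripts.

```python
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.multivariate.manova import MANOVA
from statsmodels.stats.anova import anova_lm

# hypothetical file: one row per participant with clip id and subscale means
df = pd.read_csv('main_study.csv')

# multivariate test; mv_test() reports Pillai's trace (V) among other statistics
manova = MANOVA.from_formula('credible + engaging + humanlike ~ C(clip)',
                             data=df)
print(manova.mv_test())

# univariate follow-up ANOVAs, one per subscale
for dv in ('credible', 'engaging', 'humanlike'):
    fit = smf.ols(f'{dv} ~ C(clip)', data=df).fit()
    print(dv, anova_lm(fit, typ=2), sep='\n')
```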

Table 2 presents zero-order correlations between the dependent variables and personality traits across all virtual agents. While situated animacy judgments correlate highly with the agent evaluation scales, .42 < r < .55, p < .01, the animacy attribution tendency appears to be unrelated both to perceived animacy, r < .01, and to agent evaluation, r < .07. Interestingly, we found a marginally significant correlation between the animacy attribution tendency and openness to experiences, r = –.24, p < .10.
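For completeness, such zero-order correlations amount to nothing more than pairwise Pearson coefficients; a minimal sketch, again under the assumed column names from above:

```python
from scipy.stats import pearsonr

# correlate the tendency measure with perceived animacy, the evaluation
# scales, and openness (column names are assumptions)
for col in ('perceived_animacy', 'credible', 'engaging',
            'humanlike', 'openness'):
    r, p = pearsonr(df['animacy_tendency'], df[col])
    print(f'{col}: r = {r:.2f}, p = {p:.3f}')
```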

Table 2. Zero-order correlations of the dependent variables and personality traits across the four virtual agents.

Further qualitative inspection of the correlation patterns for the different virtual agents revealed heterogeneous results. Correlations between animacy attribution tendency and perceived animacy, for example, varied between –.30 and .24. Additionally, the empathy quotient and conscientiousness turned out to correlate significantly with the evaluations for only some of the virtual agents. Yet, the correlational differences between the virtual agents remain statistically insignificant due to the low test power of the within-group analyses (N = 17, 12, 11, 16).
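The power problem can be illustrated with a Fisher r-to-z test for two independent correlations: even the most extreme pair of coefficients reported above does not differ significantly at these group sizes. The sketch below is purely illustrative (the pairing of coefficients and group sizes is assumed), not part of the original analysis.

```python
import numpy as np
from scipy.stats import norm

def compare_correlations(r1, n1, r2, n2):
    """Fisher r-to-z test for the difference of two independent
    Pearson correlations; returns the z statistic and two-sided p."""
    z1, z2 = np.arctanh(r1), np.arctanh(r2)
    se = np.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (z1 - z2) / se
    return z, 2 * norm.sf(abs(z))

# extreme pair from above with two of the group sizes (pairing assumed):
# yields z of about -1.44, p of about .15, i.e. not significant
print(compare_correlations(-.30, 17, .24, 16))
```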

To account for the significant differences in the evaluation ratings between the virtual agents, we z-standardized the evaluations within each virtual agent group, so that the differences between the virtual agents are removed from further analyses. Next, we carried out three multiple regression analyses, one for each subscale of the Agent Persona Inventory. Overall, the amount of explained variance was rather low. The included user factors only explained a substantial amount of variance for the engaging subscale, R²adj = .11. The predictive power of the animacy attribution tendency was low, β < .20, n.s. However, trait empathy explained a substantial amount of variance for the subscales credible and engaging: more empathic participants perceived the virtual agents as less credible and less engaging. Additionally, older participants rated the virtual agents as less engaging (see Table 3).
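A sketch of this standardization and regression step, again with statsmodels; the exact predictor set is our reading of the user factors described above, and all variable names are assumptions.

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv('main_study.csv')  # hypothetical file, as above

# z-standardize each evaluation scale within its virtual agent group,
# removing between-agent mean differences before pooling participants
for dv in ('credible', 'engaging', 'humanlike'):
    df[f'{dv}_z'] = df.groupby('clip')[dv].transform(
        lambda s: (s - s.mean()) / s.std(ddof=1))

# one multiple regression per subscale, e.g. for 'engaging'
fit = smf.ols('engaging_z ~ animacy_tendency + empathy + neuroticism '
              '+ openness + conscientiousness + age + familiarity',
              data=df).fit()
print(fit.summary())  # fit.rsquared_adj gives the adjusted R squared
```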

Table 3. Hierarchical regression analyses for agent evaluation

We again inspected individual regression analyses for each virtual agent. Despite the low statistical power, we found different patterns of significant predictors. These differences appeared to be most pronounced for the engaging subscale.

6 Discussion

We set out to assess the differential impact of interindividual differences in animacy attribution on virtual agent evaluation compared to other user factors. Generally, animacy judgments appear to be strongly related to the evaluation of virtual agents. However, this holds only for situational judgments, not for a general tendency in animacy judgments. It should be noted that the situational judgments about the video recordings were most likely not based solely on observed movement patterns. Instead, the strong correlation between situational animacy judgments and agent evaluation could have been subject to anchoring, because the question immediately followed the Agent Persona Inventory, which assesses agent evaluation in general. Therefore, participants might have employed a wider interpretation of animacy than intended. This is also supported by the fact that the agent evaluation scales tend to correlate highly, as evidenced in Table 2.

Despite the fact that our scale for the animacy attribution tendency reached an excellent internal consistency of α = .96, the question remains why we could not observe a correlation with agent evaluations and situational animacy judgments. Especially the latter case raises the question of whether the scale faces validity issues despite its high reliability. One possible explanation could lie in the nature of animacy attributions: as recognizing animacy in other entities is a basic cognitive ability [15], there might simply be no differences between individuals. However, after an analysis of animacy judgments using fMRI, Santos and colleagues [21] concluded that both low- and high-level brain structures are recruited in the process. Thus, while the detection of related movement is part of low-level cognitive abilities, interindividual differences might arise at the level of social cognition in the social neural network. These differences are also reflected in the wide range of scores in the test we developed. However, other factors, such as a general response bias during uncertain decisions, could also be partially responsible for the variance within the sample. Lastly, because demonstration videos of virtual agent technology usually do not include unexpected behaviors, the quality of the observed behaviors might have been too high compared to a normal interaction scenario, so that interindividual differences in the threshold of animacy attribution could not come into play. At this point, the relevance of interindividual differences in animacy attributions for agent perception remains unclear. The marginally significant correlation with openness to experiences suggests that users seeking out interesting experiences are less likely to attribute animacy. This might reflect higher expectations towards virtual entities that need to be met before an interaction is perceived as an interesting experience.

The low predictive value of the other user factors in our study challenges the notion that they are an important element in fully understanding agent evaluations when technological properties are manipulated. We were only able to observe effects of trait empathy on two of the three employed evaluation scales. The remaining traits from the big-five personality model had no predictive value. However, this result is in line with the argument that we need to consider personality traits that are more specific to the investigated scenario [8]. The correlational pattern of personality traits and agent evaluation could also depend on the employed measure of agent evaluation, because the Agent Persona Inventory was primarily developed for learning scenarios with pedagogical agents. Furthermore, the diagnostic properties of the Agent Persona Inventory have not been fully explored yet. On another note, the rather small sample size of the current study should be considered in the interpretation of the results, as should the fact that participants in our study watched videos of virtual agents rather than directly interacting with them.

In separate analyses for the different virtual agent video clips that were part of our study, we found indications that the correlational pattern of user factors and agent evaluation differed depending on the displayed virtual agent. This is unexpected, because we usually assume that such findings generalize to comparable scenarios. The absent correlation between animacy attribution tendency and agent evaluation might also be a result of this inconsistency. On the other hand, studies investigating the link between user factors and agent evaluation [7, 8] have so far only included one scenario. Still, provided that this indication is confirmed by a larger sample, the investigated scenario should strongly moderate the influence of user factors, because the use cases of current virtual agent technology are rather diverse. As a consequence, we would not be able to include a fixed set of personality inventories in user studies, but would have to identify relevant user factors in the process.