
1 Introduction

Virtual humans in the digital world are beginning to show promise in applications such as education, training, therapy, and entertainment, in large part because of their ability to evoke social responses in real humans. With the ubiquity of smartphones, a natural next step for virtual humans is integration into mobile applications. People create and strengthen their social relationships by communicating with each other over video chat services such as Skype. We believe that virtual humans presented over video chat services and delivered on mobile phones can be another effective way to deliver counseling and coaching applications. We suggest that virtual humans who communicate over videoconference services like Skype and Apple's FaceTime have unique advantages over other forms of presentation, including characters in native smartphone apps. Because they communicate over video conferencing, much as real humans do, such virtual humans may appear more realistic than characters running within native smartphone apps.

However, it is well known that users often opt to protect their privacy and are more inclined to share intimate details about themselves when they feel their information will remain confidential. The potential benefit of employing a virtual human in this case is that users experience the nonjudgmental anonymity of a computer that does not know who they are, while the virtual human can still be programmed to behave in a socially adept enough manner to maintain an engaging conversation that promotes intimacy [14, 15]. This effect, however, may be altered when users speak with a counselor who is located far away, as they might not feel that their conversation is private. New research is therefore needed to understand how to effectively present virtual characters on a mobile video platform. We hypothesize that factors such as the smartphone context, how the virtual human is presented within a smartphone app, and indeed the nature of that app can profoundly affect how a real human perceives the virtual human. Furthermore, the video and audio artifacts inherent in Internet-based video conferencing may lower the realism requirements for virtual humans. We have also identified additional design questions involved in developing mobile virtual human experiences. What behaviors, visuals, and utterances might make characters more relatable and increase presence? Can reciprocity and other social behaviors encourage repeated interactions with the virtual character and thereby strengthen social bonds between the character and users?

Our goal is to develop design guidelines for the deployment of virtual humans on smartphones, with a specific focus on mental well-being applications such as coaching and counseling. To this end, we have developed an apparatus that allows virtual humans to initiate calls and interact over Skype. In this research, we designed two experiments to explore user perceptions of a virtual human interviewer in mobile video chat interactions, based on two theoretical frameworks: presence and Social Exchange Theory.

1.1 Theoretical Background and Research Questions

Presence.

It is desirable to feel presence (being there) with a partner when communicating with him or her via technology such as virtual reality. Previously dominant definitions of this phenomenon have described it as the illusion of physically being somewhere together even though the partner's physical body is, in fact, not there [5, 18].

Lee [17] later redefined this feeling of "being somewhere" as a mental state in which users do not notice that their experience is not real, describing the experience as sensory (or nonsensory) awareness of virtual (para-authentic or artificial) objects as authentic objects. This redefinition is interesting because it separates unsolicited user feelings of presence from the existing definitions of presence [5, 18], in which the term "illusion" may imply a "somewhat undesirable" feeling of presence [17]. Lee proposes three types of presence to represent this concept: physical presence, social presence, and self presence. These three sub-types were originally coined by Biocca [5], but with definitions different from Lee's. Lee contends that Biocca's sub-type definitions were not mutually exclusive and did not explain feelings of presence that arise through low-tech media (e.g. TV). In Lee's definitions, physical presence is sensory (or nonsensory) awareness of virtual physical objects as actual physical objects, for which user self-transportation is not a prerequisite. Social presence is sensory (or nonsensory) awareness of virtual social entities as actual social actors, through either one-way or two-way communication. Self presence is the sensory (or nonsensory) experience of a virtual (para-authentic or artificial) self or selves as one's actual self, for example, perceiving an avatar's body as one's own body. In Lee's definitions, neither physical nor social presence requires users to feel that they are there in virtual reality. Lee argues that his redefinition describes user feelings of presence even when using and communicating over a low-tech medium. We likewise argue that communicating via smartphone does not require users to feel transported into another world (i.e. virtual reality), yet can still produce some forms and levels of presence.

Slater [25] also addresses user feelings of presence in both virtual reality and non-virtual reality settings, such as desktop systems. He defines realism and presence in terms of place illusion and plausibility illusion. Place illusion is similar to the prevailing definitions of presence, i.e. the sense of "being there" [5, 18]. Plausibility illusion is the illusion that an event is really happening, even though the user knows that it is happening virtually. It depends on consistency and correlations between events in the environment that are not directly caused by the user; for example, entities in the environment might move to avoid the user. Slater notes that users could experience the plausibility illusion without physical realism, since characters might respond to the user without being realistically modeled. He argues that measures of presence should differ between systems, such as between a virtual reality setting and a desktop setting (or a setting using a smartphone). This implies that aspects of presence can apply even when users do not feel transported into another world, as Lee [17] also contended.

We conclude that portions of the existing definitions of presence described above could characterize user feelings of being together with a virtual human displayed on a smartphone. We contend that Lee's concept of social presence applies to sensory (and nonsensory) awareness of a virtual human as a social actor, and that smartphone video interaction does not require users to feel transported into a virtual world (i.e. virtual reality). The notion of social presence has been further explored by other researchers [22] and used interchangeably with "copresence" [2]. Slater describes a similar concept and suggests that user feelings of presence may not require physical realism. However, we were interested in examining whether social presence might be influenced, and even strengthened, by physical realism.

We hypothesized that people might perceive a virtual human as a responsive animated character and feel some level of presence with it as is, but that they would be more likely to consider the virtual human realistic and socially present if they believed it was in a realistic-looking location. We were therefore specifically interested in Lee's concept of social presence and in how strongly people would feel presence with a virtual human depending on the type of physical location displayed behind it. We formulated the following question to investigate this subject:

  • RQ1: How are responses to a virtual human different when individuals interact with a virtual human that is presented with a realistic background, compared to a virtual human with a featureless background?

In addition, gender may be one of the most critical factors shaping smartphone use. Geser [8] reports that females use smartphones more for intimate purposes than males do. Females tend to share private matters and emotions with others in long chats and are more likely to articulate their anxieties [23], whereas males are more inclined to use a smartphone for functional purposes (e.g. coordinating meeting times and places) [19]. Exline et al. [7] also suggest that females desire more affection and involvement in their relationships than males. Another finding demonstrates that, in same-sex interactions, females are more prone to display affectionate behavior such as immediacy and inclusion through body gesture, orientation, and gaze [11].

These findings suggest that female users may be more inclined than male users to use a virtual human displayed on a smartphone for intimate and anonymous conversations, especially when they perceive the virtual human as a counseling interviewer. We formulated an additional research question to investigate this subject:

  • RQ2: How do responses to a virtual human differ based on gender when individuals interact with a virtual human presented with a realistic background, compared to a virtual human with a featureless background?

Social-Exchange Theory.

Social-exchange theory represents human interactions as being driven by a form of social economics [6]. The theory emphasizes that humans expect to receive rewards for what they provide or share with others. These rewards can range from material assets to social currency, such as information and services. Unlike economic exchange, in which material value is critical, social exchange values the social meaning of an action. The reciprocity norm may play a key role in social exchanges between human beings [9] because reciprocity allows people to make an initial investment without fear of receiving nothing in return [1].

Among the many types of reciprocity, self-disclosure reciprocity is known to strengthen social connections [13]. Moon's study [20] showed that reciprocity can encourage communicators to disclose high-risk personal information, and that self-disclosure reciprocity promoted further self-disclosure and mutual attraction. She defined self-disclosure reciprocity as one of the social rules that govern human interactions and described how it can lead people to interact with computers as if they were humans. An exchange may be uneven at any given moment, but it should eventually equalize; that is the essence of the reciprocity norm.

In counseling interactions, patients have been reported to disclose more intimate information when counselors share their own intimate stories with them [14]. According to social exchange theory, this form of social reciprocity should work successfully. Furthermore, communication researchers [26] suggest that repeated interactions encourage intimate relationships better than a single interaction, because the social exchanges that make up relational life occur over time. Bickmore and colleagues developed the Relational Agent, a virtual human designed to build long-term socio-emotional bonds and provide advice on a user's workout behavior [3, 4]. The agent interacted with a user daily over a long period (e.g. a month). Most of the Relational Agent studies explored the use of agents on mobile devices, and their results indicated that the Relational Agent was an effective human-agent interface for health education and behavior change. Bickmore and colleagues also posited that if an app-based mobile agent could interact with a user for extended periods of time, or even constantly, very close relationships with the user could become possible [3].

We were particularly interested in how longer-term interaction with a virtual human affects users' perceptions of and reactions to it. A study of longer-term, repeated interactions with a virtual character over smartphone video chat has rarely, if ever, been performed, owing to the logistical and technical complexity involved. Our approach differs from the existing study [3] in two ways. First, our approach is designed to give users the impression that the virtual human has a presence in the real world like a real human, and has the interest and agency to reach out and contact the user. Second, Bickmore's virtual human plays the role of a workout coach, whereas our agent uses a counseling interview style of interaction and is ultimately meant to play the role of a coaching counselor working with individuals confronting mental health issues.

In our study, a real human user and a virtual human mutually exchanged intimate information in response to increasingly personal questions. Based on Social Exchange Theory, we expected these conversational exchanges to encourage users to engage in self-disclosure reciprocity with their virtual human partner. The gifts and rewards of the exchange were operationally defined as the degree to which each conversant voluntarily disclosed intimate information. Following the theory, the design also let us examine whether users' preparation of their answers enhanced their bonds with their partner over the repeated interactions, and we further expected that these bonds could affect users' feelings of presence with the virtual human. We formulated the following question to investigate this subject:

  • RQ3: How are perceptions of a virtual human different when individuals interact with a virtual human more often (multiple calls), compared to interacting with the virtual human less often (a single call)?

1.2 Research Approach and Evaluation

Our approach to mobile counseling focuses on bringing virtual human coaches to smartphones through videoconference services like Skype. This approach can leverage the accessibility of the smartphone and the ability to call users, allowing us to explore feelings of presence in mediated interactions as well as longer relationships built from repeated interactions and follow-up interventions between users and virtual coaches.

We created an apparatus to conduct Skype calls with users (see Fig. 1).

Fig. 1.

The apparatus uses a web camera to present imagery of a virtual human in Skype [left]. A male or female version of the virtual human can be presented. The virtual character can initiate a Skype call and communicate with a user over (a) a realistic background or (b) a featureless background on a smartphone [right].

The apparatus allows an experimenter to call a user's Skype ID and transmit video of a virtual human rendered within a Unity game engine application on a desktop computer in our lab. We created both a male and a female virtual human (Caucasian, 35 years of age) so that the virtual human's gender could be matched to each user's gender, controlling for gender effects. An unseen experimenter remotely controlled the virtual human displayed on the user's smartphone using Wizard of Oz (WOZ) methods, triggering various verbal behaviors (i.e. questions, intimate back stories, and empathetic feedback) and nonverbal behaviors (i.e. smiles and nods). The virtual human could display nonverbal actions such as small or large nods and small or large smiles, and could deliver several variations of verbal empathetic feedback, such as "OK" and "I see."

We measured user presence with the virtual human using the Virtual Rapport scale (e.g. "I felt I had a connection with my partner."), social attraction toward the virtual human using the Social Attraction scale (e.g. "A virtual character would be pleasant to be with."), partner perception using the Partner Perception scale (e.g. "compassionate"), and other social perception variables. Participants responded to each question using Likert-type scales. The Virtual Rapport scale combines the Co-presence (or Social Presence [16, 22]) scale and the Rapport scale [10]; the combined scale has 23 items rated on an 8-point metric (1 = Strongly Disagree; 8 = Strongly Agree). The Social Attraction scale has six items rated on the same 8-point metric [21]; items include "I would like to have a friendly chat with a virtual character" and "I think a virtual character could be a friend of mine." The Partner Perception scale is a semantic differential with 21 bipolar pairs of adjectives (e.g. likable-dislikable, threatening-not threatening), each rated on a 7-point metric [15]. All scales displayed good reliability [14, 15].
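The scale scores and the reliability claim above can be made concrete with a short sketch. It assumes, since the text does not say, that each scale score is the mean of a participant's item ratings, that any reverse-keyed items are flipped before averaging, and that "reliability" refers to internal consistency measured by Cronbach's alpha. The function names `scale_score` and `cronbach_alpha` are illustrative and not part of the study materials.

```python
from statistics import mean, variance


def scale_score(responses, reverse_items=(), low=1, high=8):
    """Mean of one participant's item ratings on a single scale.

    Assumption: reverse-keyed items (indices in `reverse_items`) are flipped
    as low + high - r. Use low=1, high=8 for the Virtual Rapport and Social
    Attraction scales and low=1, high=7 for the Partner Perception scale.
    """
    adjusted = [
        (low + high - r) if i in reverse_items else r
        for i, r in enumerate(responses)
    ]
    return mean(adjusted)


def cronbach_alpha(matrix):
    """Cronbach's alpha for a participants x items matrix of ratings.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores)),
    using sample variances throughout.
    """
    k = len(matrix[0])
    if k < 2 or len(matrix) < 2:
        raise ValueError("need at least 2 items and 2 participants")
    item_vars = [variance(column) for column in zip(*matrix)]
    total_var = variance([sum(row) for row in matrix])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


if __name__ == "__main__":
    ratings = [[7, 6, 8], [4, 5, 4], [2, 3, 2], [6, 6, 7]]  # made-up data
    print([scale_score(row) for row in ratings])
    print(round(cronbach_alpha(ratings), 3))
```

Flipping reverse-keyed items before averaging keeps a higher score meaning a more positive response on every item; which items are reverse-keyed would have to come from the original scale sources [10, 15, 21].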

2 Experimental Design and Findings

2.1 Experiment 1: Realistic Background vs. Featureless Background

In Experiment 1, we investigated the effects of manipulating the visual backdrop showing the virtual human's location during a Skype call. A detailed and realistic backdrop could increase users' sense of presence with a virtual human by leading them to perceive the virtual human as located in a realistic setting. We also hypothesized that a subjective sense of physical distance could be a factor that alters a user's feelings of presence. We examined user perception of a virtual human displayed on a realistic background, compared to a virtual human presented on a featureless (less realistic) background, during a Skype video call.

We designed two different backgrounds to serve as cues for the virtual human's location in Experiment 1. The experiment used a between-subjects design with two conditions: a realistic animated video background vs. a less realistic, featureless gray background (see Fig. 1). The "realistic background" condition used an outdoor scene, videotaped through a window, containing trees moving in the wind; the video clip was seamlessly looped for the entire duration of the user's interaction with the virtual human. The "featureless background" condition used a plain gray background.

Participants and Procedure.

Participants were recruited online via Craigslist. Qualified participants were over 18 years old and able to communicate comfortably in spoken and written English. A total of 43 participants (51 % men, 49 % women) took part in the study; their mean age was 35 years (SD = 13.13).

In the experiment, participants were asked to come to our lab and have a conversation with a virtual human coach over Skype. Each user was given a Google Nexus smartphone with a preset Skype ID for the study. An experimenter remotely controlled the virtual human using a WOZ method and presented the character over Skype using the apparatus described above (see Fig. 1). In both conditions, users were first asked to fill out a pre-questionnaire to gather demographic information before the interaction session began. They then waited for the incoming call sound from the phone provided by the experimenter, which indicated that someone, i.e. the virtual human, was calling. Users then tapped the call pick-up icon on the phone, after which they could see and converse with the virtual human.
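The Wizard of Oz control used in these sessions can be pictured as a small key-to-behavior dispatcher. The sketch below is hypothetical: the text does not describe the experimenter's actual interface, so the key bindings, the `Behavior` and `WizardConsole` names, and the idea of one key advancing a pre-scripted (back story, question) turn are all illustrative assumptions. Only the behavior repertoire (small or large nods and smiles, "OK," "I see," questions, and intimate back stories) comes from the apparatus description in Sect. 1.2.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Behavior:
    kind: str      # "verbal" or "nonverbal"
    content: str   # utterance text or animation name


# Hypothetical key bindings for the free-form behaviors named in the text.
BINDINGS = {
    "1": Behavior("nonverbal", "small_nod"),
    "2": Behavior("nonverbal", "large_nod"),
    "3": Behavior("nonverbal", "small_smile"),
    "4": Behavior("nonverbal", "large_smile"),
    "o": Behavior("verbal", "OK."),
    "i": Behavior("verbal", "I see."),
}


@dataclass
class WizardConsole:
    script: list                              # pre-scripted (back story, question) pairs
    position: int = 0                         # index of the next scripted turn
    log: list = field(default_factory=list)   # stands in for the feed to the renderer

    def press(self, key):
        """Map one experimenter key press to the behaviors it triggers."""
        if key == "n":                        # hypothetical "next turn" key
            if self.position >= len(self.script):
                return []                     # script exhausted
            story, question = self.script[self.position]
            self.position += 1
            out = [Behavior("verbal", story), Behavior("verbal", question)]
        elif key in BINDINGS:
            out = [BINDINGS[key]]
        else:
            out = []                          # unknown keys are ignored
        self.log.extend(out)
        return out
```

Separating the scripted turns from the single-key feedback behaviors mirrors the distinction in the text between pre-scripted questions and back stories on the one hand and nods, smiles, and short empathetic utterances on the other.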

After completing the interaction session, users filled out a post-questionnaire assessing their perception of and responses to the interaction with the virtual human. Users were then debriefed and compensated $25 for their participation. Participation took about 60 min, including completing the pre- and post-questionnaires. Questionnaire data were collected through online surveys.

Findings.

To detect significant effects within the interaction data, we conducted a two-way ANOVA with condition (background type) and gender as the two independent variables. Background type had no statistically significant effect on user presence with, or social attraction toward, the virtual human. However, there was a statistically significant interaction between condition and gender for user social attraction toward the virtual human [F(1, 39) = 7.03, p = .012] (see Fig. 2). With the realistic background, male users (M = 5.74, SD = 1.56) were more socially attracted to the virtual human than female users (M = 3.80, SD = 1.79). With the featureless background, in contrast, female users (M = 5.60, SD = 1.70) were more socially attracted to the virtual human than male users (M = 4.77, SD = 1.77). The results also showed a trend toward greater presence with the virtual human displayed with the realistic background (M = 5.07, SD = 1.26) than with the featureless background (M = 4.87, SD = 1.60), although this trend was not statistically significant.
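For a 2 × 2 design such as condition × gender, the interaction term of the two-way ANOVA reduces to a single contrast on the four cell means. The sketch below computes that F statistic with the standard library, assuming unweighted-means (Type III) sums of squares, one common choice when cell sizes are unequal; the text does not state which type was used, and the function name `interaction_f` is illustrative. The error degrees of freedom are N − 4, which matches the reported F(1, 39) for 43 participants.

```python
from statistics import mean, variance


def interaction_f(cells):
    """F statistic for the A x B interaction in a 2 x 2 between-subjects ANOVA.

    `cells` maps (condition, gender) -> list of scores. Uses the interaction
    contrast on cell means, SS = L**2 / sum(1 / n_ij), which equals the
    Type III interaction sum of squares for a 2 x 2 design.
    Returns (F, df_error); the interaction always has df = 1.
    """
    conditions = sorted({c for c, _ in cells})
    genders = sorted({g for _, g in cells})
    if len(conditions) != 2 or len(genders) != 2 or len(cells) != 4:
        raise ValueError("expected exactly four cells of a 2 x 2 design")
    (a1, a2), (b1, b2) = conditions, genders
    m = {key: mean(scores) for key, scores in cells.items()}
    n = {key: len(scores) for key, scores in cells.items()}
    # Difference of the gender effect between the two conditions.
    contrast = (m[a1, b1] - m[a1, b2]) - (m[a2, b1] - m[a2, b2])
    ss_interaction = contrast ** 2 / sum(1 / n_ij for n_ij in n.values())
    # Pooled within-cell error term, giving the mean squared error.
    ss_error = sum((len(s) - 1) * variance(s) for s in cells.values())
    df_error = sum(n.values()) - 4
    return ss_interaction / (ss_error / df_error), df_error
```

A p-value would then come from the upper tail of the F(1, df_error) distribution, for example `scipy.stats.f.sf(F, 1, df_error)` if SciPy is available.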

Fig. 2.

Results of the two-way ANOVA for user social attraction toward a virtual human

2.2 Experiment 2: Single Call vs. Multiple Calls

In Experiment 2, we explored whether virtual humans could establish and maintain long-term relationships by leveraging ideas from Social Exchange Theory [6, 26]. The theory postulates that the exchange of gifts and rewards allows people to strengthen their social relationships.

The experiment used a within-subjects design to investigate user reactions to a virtual human displayed on a smartphone over multiple interactions. Participants were asked to schedule times to receive three calls on their own smartphones from a virtual human coach over three consecutive days, in any place where they could have a private conversation. Based on Social Exchange Theory, we expected the interactions to encourage users to feel reciprocity with their virtual human partner. The design also allowed us to examine whether users' preparation of their answers enhanced presence with their virtual partner over the repeated interactions.

Participants and Procedure.

Participants were recruited online via Qualtrics. Qualified participants were over 18 years old, able to communicate comfortably in spoken and written English, and had access to a smartphone running the Skype app with Wi-Fi or 3G/4G Internet access. A total of 19 participants (47 % men, 53 % women) took part in the study; their mean age was 32 years (SD = 9.88).

Potential participants first filled out a pre-questionnaire that gathered demographic information and screened out unqualified participants before any interaction took place. Qualified participants were asked to schedule three interaction sessions with a virtual human over three consecutive days. Participants were asked to answer a total of 24 questions over the three call sessions in the form of a counseling interview: the virtual human asked 8 questions during each call and shared some personal information before asking each question. The virtual human also asked participants to prepare answers to the interview questions for the next session and promised that it would prepare its own answers to the same upcoming questions. The virtual human's verbal questions and responses were pre-scripted using a structure and context adapted from a previous study [15]. During the interaction, the virtual human asked questions of increasing intimacy (e.g. "How old are you?") and prefaced them with its own intimate anecdotes to reciprocate self-disclosure and advance the conversation (e.g. "I am 35 years old.").
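The interview structure of Experiment 2 (24 pre-scripted questions, 8 per call, each prefaced by the virtual human's own disclosure) can be represented as a simple schedule. This is an illustrative sketch, not the study's script format: the `build_sessions` name, the dictionary layout, and the assumption that each call previews the next call's questions so that both parties can prepare answers are ours.

```python
def build_sessions(turns, n_sessions=3, per_session=8):
    """Split ordered (disclosure, question) turns into call sessions.

    `turns` is assumed to be sorted by increasing intimacy. Within each
    session, the virtual human's disclosure is placed before its question,
    and `preview` lists the next session's questions so that the user
    (and, as promised, the virtual human) can prepare answers.
    """
    if len(turns) != n_sessions * per_session:
        raise ValueError(f"expected {n_sessions * per_session} turns, got {len(turns)}")
    blocks = [turns[i * per_session:(i + 1) * per_session] for i in range(n_sessions)]
    sessions = []
    for index, block in enumerate(blocks):
        # Flatten each turn so the disclosure always precedes the question.
        utterances = [line for disclosure, question in block for line in (disclosure, question)]
        upcoming = blocks[index + 1] if index + 1 < n_sessions else []
        sessions.append({
            "call": index + 1,
            "utterances": utterances,
            "preview": [question for _, question in upcoming],
        })
    return sessions
```

Slicing the ordered list into consecutive blocks preserves the increasing-intimacy order across calls, so the most personal questions fall in the last session.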

Participants were also asked to fill out a post-questionnaire after both the 1st and the 3rd call sessions, so that changes in user responses between the 1st and 3rd interactions with the virtual human coach could be assessed. Each session took about 10–15 min to complete, including the post-questionnaire for the first and last sessions. Questionnaire data were collected through online surveys. Participants received $50.40 in compensation upon completing the study.

Findings.

To detect significant effects within the interaction data, we used a paired-samples t-test, which is appropriate when the same participants are measured on two different occasions (here, after the 1st and the 3rd call).

Fig. 3.

Results of the paired-samples t-test for user partner perception (Compassionate) between the 1st and the 3rd call

The results demonstrated that users rated the virtual human as a more compassionate partner after communicating with it several times (M = 5.42, SD = 1.50) than after a single call (M = 5.05, SD = 1.31), t(18) = −2.35, p = .031 (see Fig. 3). No other variables showed significant differences. In addition, the results showed a trend of users feeling greater presence with the virtual human after experiencing all three calls (M = 5.80, SD = 1.41) than after just one call (M = 5.62, SD = 1.24), although this trend did not reach statistical significance.
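The paired-samples t-test compares each participant's rating after the 1st call with the same participant's rating after the 3rd call. A minimal standard-library sketch follows; the function name `paired_t` is illustrative. Taking differences as 1st minus 3rd, an increase from the 1st to the 3rd call yields a negative t, which is consistent with the sign of the reported t(18) = −2.35, and 19 participants give df = 18.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(first, third):
    """Paired-samples t statistic and degrees of freedom.

    Differences are taken as first - third for each participant;
    t = mean(d) / (sd(d) / sqrt(n)) with df = n - 1.
    """
    if len(first) != len(third) or len(first) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(first, third)]
    t = mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))
    return t, len(diffs) - 1
```

If SciPy is available, `scipy.stats.ttest_rel(first, third)` computes the same statistic together with a two-sided p-value.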

3 Conclusions and Implications

The results of Experiment 1 demonstrate that, overall, the background of a virtual human's location does not significantly affect presence with, or social attraction toward, a virtual human communicating over a smartphone. However, we discovered that males were more socially attracted to a virtual human presented with a realistic background, while females were more socially attracted to a virtual human with a featureless background. Consistent with previous studies, users did not need to transport themselves to another place (self presence) when using a low-fidelity medium such as a smartphone, yet still felt aspects of presence (i.e. social presence or co-presence). We measured presence with the Virtual Rapport scale, which corresponds to social presence (or co-presence) and which we argue is an appropriate measure of user presence in this study. Based on the results of the experiment, we contend that users' overall perception of a virtual human would not differ with its perceived physical location or realism [25]. We further argue that males are more socially attracted to a virtual human presented over a realistic background because the background demonstrates that the virtual human is located somewhere outside the confines of the smartphone. Females, however, demonstrated greater social attraction toward a virtual human displayed over a less realistic background, which could be expected to decrease feelings of presence in general. Given the nature of females' smartphone use described in Sect. 1.1, we contend that females were drawn to the featureless surroundings, which could be interpreted as more private, less distracting, and more anonymous. Overall, our findings suggest that females may prefer to interact with a virtual human located in a private setting.

The results of Experiment 2 revealed that users perceived the virtual human as a more compassionate partner when they interacted with it over multiple calls rather than a single call. The results further showed a tendency, though not statistically significant, for users to feel greater presence (measured using the Virtual Rapport scale) with the virtual human after experiencing all three calls than after just one call. Compassion and rapport are of particular interest in counseling applications: Shallcross [24] noted that compassion is one of the qualities of a great counselor, and Joe et al. [12] emphasized the importance of rapport between a counselor and a client in psychotherapeutic interactions. In our study, the virtual human and users exchanged their own intimate stories, which can be regarded as the social meaning of action that social exchange values [6, 26]. The experiment protocol further supported the reciprocity norm by having both parties exchange intimate stories equally over the course of the interactions. We therefore contend that users were able to interact with virtual human coaches as if they were real humans [20] and to perceive the virtual humans as compassionate partners when they had more interactions with them.

In conclusion, we argue that a virtual human coach could benefit from using customized or personalized backgrounds for particular users who receive video calls from it. We also argue that there are potential advantages to using a virtual human coach on a smartphone for longer-term counseling interactions, specifically when users receive multiple calls initiated by the coach. These potential advantages are coupled with other technical advantages of this form of interaction. Because the virtual human can "call" the user, it may appear to have greater agency. The common conception of Skype as a communications channel connecting one real-world locale to another could also reinforce the belief that the character has a real presence somewhere in the world. We are continuing to explore how to leverage the context of this particular communications channel by running studies that manipulate that context. We plan to begin constructing a theory that captures the significant elements (technical, social, cultural, etc.) that provide context and subtext for real-human to virtual-human interactions mediated by smartphones.