1 Introduction

Prior research has studied improving the sensation of telepresence by using high-resolution displays or by providing a physical manifestation via a simple robot with a video monitor on which the participant's face is shown [6, 7]. However, with only a video stream it may be difficult to give a user the sensation of being in the partner's location, even when the user feels the partner is present with him or her. Other work has attempted to enhance presence using augmented reality techniques [8–10]. We designed and conducted our experiment to examine the sense of telepresence a person has with a partner in a mixed reality environment. We hypothesize that the reasons a user will not experience the sensation of telepresence are, at least in part, constraints on visual information and disagreement between a user's visual information and his or her actions. First, the visual information provided must give the user the sensation of actually being in a particular environment (place illusion). Moreover, a user will likely have a diminished sense of telepresence if his or her physical actions – such as moving the arms – do not result in an appropriate visual change (situational plausibility) [4]. Finally, even when users perform physical actions, they will feel less present if these actions do not result in corresponding physical changes in their partner's location. To test these assumptions, we designed the following experiment (Fig. 1).

Fig. 1. System setup

2 Experimental Design

We used a head-mounted display (HMD) for immersion by fixing the participant's viewpoint to the collaborator's location, which was in a different room. A camera in the collaborator's room was connected to the participant's HMD. We asked the participant to perform simple cooperative tasks with the collaborator, using a control device at his or her own location to operate a screen in the collaborator's room. We hypothesize that participants will feel a higher degree of presence if they see their own hands and fingers during these interactions. To verify this hypothesis, we carried out an experiment with three cases.
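As a minimal illustration of this setup, the sketch below shows one way a remote camera feed could be painted onto a quad placed in front of the HMD camera in Unity, the platform we used. The component name is hypothetical, and the sketch assumes the collaborator's camera is exposed to the participant's PC as a local capture device (e.g., through a capture card); our actual system may differ in detail.

```csharp
using UnityEngine;

// Hypothetical sketch: attach to a quad positioned in front of the HMD
// camera so the participant's viewpoint is fixed to the collaborator's
// room. Assumes the remote camera appears as a local capture device.
public class RemoteViewFeed : MonoBehaviour
{
    private WebCamTexture feed;

    void Start()
    {
        // Use the first available capture device; a named device or a
        // networked video stream could be substituted here.
        feed = new WebCamTexture();
        GetComponent<Renderer>().material.mainTexture = feed;
        feed.Play();
    }
}
```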

2.1 System Design

In the first case, we displayed a virtual model of the participant's own hands on the HMD, mimicking his or her actions. In the second case, we used a generic hand model that looks like a skeleton. In the third case, no hand model was displayed. The participant's room has a Leap Motion controller to track the participant's hand and finger movements. The experimental setup, including the Leap Motion and control device in the participant's room and the camera and screen in the collaborator's room, is shown in Fig. 2.
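A minimal sketch of how the three conditions could be switched in Unity follows; the component and field names are hypothetical, and the hand rigs are assumed to be driven by the Leap Motion tracking data (e.g., via the Leap Motion Unity assets).

```csharp
using UnityEngine;

// Hypothetical sketch: enables exactly one hand representation per condition.
public class HandConditionSwitcher : MonoBehaviour
{
    public enum Condition { NoModel, GenericModel, PersonalModel }

    public GameObject personalHands; // textured model of the participant's own hands
    public GameObject genericHands;  // skeleton-like generic hand model
    public Condition condition = Condition.NoModel;

    void OnEnable()
    {
        personalHands.SetActive(condition == Condition.PersonalModel);
        genericHands.SetActive(condition == Condition.GenericModel);
    }
}
```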

Fig. 2. Experiment rooms

The collaborator sits in front of a camera and performs tasks with the participant, sometimes looking into the camera or interacting with the screen that the participant controls (Fig. 3). In this way, we aim to give the participant the sensation that the collaborator is sitting right next to him or her.

Fig. 3. Communication between participant and collaborator

While the participant controls the screen, the collaborator also prompts him or her to perform simple gestures, such as moving his or her arms (Fig. 4).

Fig. 4. (a) A case using no displayed hand model; (b) a case using a generic hand model; (c) a case using the participant's hand model

Before starting the experiment, we asked each participant to fill out a demographic questionnaire. We also had the participant fill out a short experience questionnaire after each of the three experimental cases. A final questionnaire following the experiment asked the participant to compare the three cases. In this study, we assume that participants will experience a greater sensation of presence when they see their own body parts on the HMD corresponding to their actions. This study is preliminary to work we are carrying out that involves inhabiting a remote robot, seeing everything from the robot's perspective except for one's own body parts. To develop the experimental environment, we used Unity as the platform and Mono as the IDE, on a 64-bit Windows 7 PC with an AMD Phenom™ X4 B95 3.00 GHz processor and 6 GB RAM.

2.2 Task

We asked participants to perform a simple task: counting a number of objects and solving math problems with small numbers. The participant answered the question on each slide displayed on the remote screen, which he or she saw through the HMD and controlled by hand. While the participant controlled the screen and answered the questions, the collaborator interacted with the participant, verifying the participant's answers. For instance, the collaborator might ask the participant "How many blue cubes do you see?" or "What is the result of the equation?" and then verify or correct the answer given by the participant (Fig. 5).
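To make the task concrete, the following sketch generates the kind of questions used on the slides; the class and method names are hypothetical, and the exact slide contents from the study are not reproduced here.

```csharp
using System;

// Hypothetical sketch of slide-content generation: counting questions
// and simple arithmetic with small numbers.
public static class TaskGenerator
{
    private static readonly Random rng = new Random();

    public static string CountingQuestion(string objectName, out int answer)
    {
        answer = rng.Next(3, 9); // number of objects drawn on the slide
        return string.Format("How many {0}s do you see?", objectName);
    }

    public static string MathQuestion(out int answer)
    {
        int a = rng.Next(1, 10);
        int b = rng.Next(1, 10);
        answer = a + b;
        return string.Format("What is {0} + {1}?", a, b);
    }
}
```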

Fig. 5. Example of task contents

2.3 Questionnaire Design

We designed a set of questionnaires similar to those of Lessiter et al. [11]. The set is composed of four parts: sense of physical space, engagement, control, and negative effects. Each category has two or three questions, answered on a five-point Likert scale. Before beginning the experiment, we asked participants to fill out a demographic form; after each experimental case, we asked participants to fill out an experience questionnaire consisting of interval-scale questions. After finishing the last case, each participant completed a final questionnaire consisting of comparison questions among the three experimental conditions; this final questionnaire also uses a five-point scale.
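The structure of the questionnaire data can be summarized in a small sketch; the type names are hypothetical and simply mirror the description above (four categories of two or three five-point Likert items).

```csharp
using System.Collections.Generic;

// Hypothetical sketch of the questionnaire structure.
public class LikertItem
{
    public string Question;
    public int Response; // 1..5 on the five-point Likert scale
}

public class QuestionnaireCategory
{
    // "Sense of Physical Space", "Engagement", "Control", or "Negative Effects"
    public string Name;
    public List<LikertItem> Items = new List<LikertItem>();
}
```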

3 Results and Discussion

In this section, we present the results, including graphs of the interval questions, the categorical summaries, and the final comparison questions, together with discussion. Since the purpose of this paper is to report on the formative phase of a more extensive study on enhancing telepresence through an increased sense of body ownership, we conducted the experiment with a small number of people. The participants had varying background knowledge of computers, virtual reality, and the concept of telepresence. Because we aimed only to show the preference for each experimental condition, we did not apply a statistical method but used a simple tally. To create the preference chart, we counted the number of responses at each Likert score from one to five for each question in each of the three cases. Figure 6 displays the questions we asked. We conjecture that the personal model gives the strongest sense of telepresence, as indicated by the first graph item, which is associated with the interval question ‘You had the feeling that you were in a different room’. Also, having no model produced the strongest negative effects: dizziness and unnatural control (tenth and eleventh bars in Fig. 6). These results make sense because the participants felt they were in the collaborator's room, communicating with the collaborator and using an iPad to control the screen, but did not see any part of their own body in that context.
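The tally itself is simple; a sketch of the counting step is shown below (the method and type names are hypothetical).

```csharp
using System.Collections.Generic;

// Hypothetical sketch of the per-question tally behind the preference chart.
public static class PreferenceTally
{
    // Returns counts[0..4], where counts[i] is the number of participants
    // who answered the question with Likert score i + 1.
    public static int[] TallyScores(IEnumerable<int> questionResponses)
    {
        var counts = new int[5];
        foreach (int score in questionResponses)
            counts[score - 1]++;
        return counts;
    }
}
```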

Fig. 6. Questionnaire results

However, there were no significant differences among the three cases in the remaining questions because we had so few participants. To address this weakness, we present the categorical graph, which sums the scores of the questions in each of the four categories to show participant preferences concisely (Fig. 7). As one can see in the graph of the first category, ‘Sense of Physical Space (Being there)’, our participants ranked the personal model, the generic model, and no model from high to low, respectively. Surprisingly, though, the personal model has the lowest score in the third category, ‘Ease of Control’. This may relate to the fact that participants sometimes saw uncontrolled finger or hand movement when they tried to manipulate the iPad screen by touch, since the Leap Motion did not track hand motion reliably. After finishing all experiments, participants mentioned that the hand in the personal-model case did not track correctly, making it confusing to control the iPad screen. In fact, since we used the same skeleton model for the personal model as for the generic model, only with a different texture, the two should have had similar tracking and rendering performance. However, participants did not perceive control to be weak with the generic model, perhaps because that model consisted of only a simple skeleton, whereas the personal model was an explicit hand model with texture, so the lack of control was more obvious to the participants.

The graph of the fourth category shows an interesting result: its total score is relatively low, partly because the fourth category has two questions while the others have three questions each. The lack of a model caused the participants to experience negative effects while controlling the physical screen via the iPad, including feeling dizzy and perceiving unnatural movement, as seen in Fig. 6. To enhance the sense of telepresence, eliminating disagreement between a user's visual information and his or her actions is an important factor because of the negative effects such disagreement causes. However, we did not observe a notable distinction in the second category of Fig. 7, ‘Engagement (involvement and interest in the content)’. We assume the reason is that, in all cases, the participants felt presence in the collaborator's room.
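A sketch of the categorical summation follows, reusing the hypothetical questionnaire types sketched in Sect. 2.3; the per-item mean is included only as an alternative that would make categories with different numbers of items directly comparable.

```csharp
using System.Linq;

// Hypothetical sketch of the category scores plotted in Fig. 7.
public static class CategoryScore
{
    // Raw total of the items' Likert responses, as in Fig. 7.
    public static int Sum(QuestionnaireCategory category)
    {
        int total = 0;
        foreach (var item in category.Items)
            total += item.Response;
        return total;
    }

    // Per-item mean, which normalizes for the two-item fourth category.
    public static double MeanPerItem(QuestionnaireCategory category)
    {
        return category.Items.Average(item => item.Response);
    }
}
```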

Fig. 7. Categorical questionnaire results

Finally, we provide a comparison among the three cases (Fig. 8). According to these results, we conclude that using the participant's personal model enhances presence in a remote context. This is supported by our participants' responses to most questions, which indicate that the personal model is preferred and feels most natural. There is, however, a contrary indication in the ratings of ease of control, where the absence of any model achieved the highest score (Fig. 7). Because the answers displayed in Fig. 8 were provided by participants after all experiments were completed, some graphs do not agree with Fig. 7, particularly the second and third graphs in Fig. 8. However, we still believe that participants have a greater sense of telepresence when they use their personal model to interact with a collaborator, provided there is close agreement between the visual information and the user's actions. Unfortunately, the visual information we provided was not always synchronized with reality, resulting in a reduced sense of telepresence. In addition, we hypothesize that higher visual fidelity in the human model and better performance, e.g., lower latency, would provide a greater sense of telepresence.

Fig. 8. Final results

4 Conclusion and Future Work

Our goal with this paper is to report on a proof of concept in advance of a study of features that improve a user's sense of presence, and specifically body ownership, when inhabiting a remote physical avatar (in our case, a humanoid robot). The study we carried out focused on our participants' perception of their hands being in the same space as the one in which they were remotely interacting. Our hypothesis was that using a human model that accurately reflects the appearance and actions of a user's body improves one's sense of telepresence. To investigate this hypothesis, we designed and implemented a system in which we conducted a simple experiment. The results generally support our assumption that proper correlation between visual information and a user's physical actions enhances the sense of telepresence. However, there were unfortunate confounds in our study due to inaccurate tracking of the user's hands and fingers. This adversely affected our participants' control of visual information in the remote environment, leading to some unexpected and counterintuitive findings. As with all formative studies, the goal here was to find deficiencies in the experiment's design, and so our focus for the next version of the system will include improved quality of tracking and rendering. In the next stage we will use a humanoid robot that mimics a user's movements. To enhance the sense of telepresence, we will blend the robot's view of the remote environment with the user's body, so that the participant perceives his or her own body in the remote context. We expect this future system will enhance the sense of telepresence and improve the user's sense of place illusion (my body is in this remote place) and situational plausibility (my actions have appropriate consequences in this remote place) [4].