1 Introduction

ICT (Information and Communication Technology) devices now enable real-time remote communication over the network. Various options are available, ranging from high-end commercial systems to freely downloadable application software [1]. Network-based remote communication is a convenient tool; however, it raises several critical issues, such as the lack of a tele-presence feeling and the lack of a sense of relationship in communication [2].

Mobile robot-based remote communication offers one solution to the lack of a tele-presence feeling, and its effectiveness has been demonstrated in experimental studies using mobile robots [4, 12]. Embodying an agent by anthropomorphizing an object [8] is another promising approach to increasing the presence of a remote participant [11].

Tele-presence robots give the operator a presence at the remote site and even enable some tele-operation tasks to be performed remotely. Some robots support distance communication through several key technologies, such as displaying the operator's face [9], remote drivability for moving around, and tele-manipulation, in addition to the basic communication functions of “talk”, “listen”, and “see” [5]. However, it is widely recognized that a gap remains between a robot-mediated video conference and a face-to-face one.

Robotic arm-type systems have taken up a new challenge. For example, Kubi [7], a non-mobile arm-type robot, allows the remote user to “look around” during a video call by commanding Kubi where to aim the tablet through intuitive remote controls over the web. An enhanced motion display using a moving object has also been reported [10]. However, conveying the movement of the human body as the non-verbal expression of the remote person is still an open issue.

Considering the critical role of entrainment in human communication [13], this research tackles two issues: the lack of a tele-presence feeling and the lack of a sense of relationship in communication [3]. This paper presents an overview of ARM-COMS (ARm-supported eMbodied COmmunication Monitor System) for connecting remote individuals through an augmented tele-presence system.

2 ARM-COMS (ARm-supported eMbodied COmmunication Monitor System)

2.1 Basic System Overview of ARM-COMS

ARM-COMS consists of a desktop-type robotic arm that holds a tablet device, such as a tablet PC or smartphone, and autonomously manipulates its position and movement. This autonomous manipulation is driven by the head movement of the master person whom the tablet represents in remote communication. The head movement of the master person is recognized by a portable motion sensor, such as a Kinect [6], and the detected signals are transferred over the network to the PC that controls ARM-COMS.
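As a rough illustration of this pipeline, the following Python sketch streams head-pose readings to the PC that controls ARM-COMS over UDP; the read_head_pose() helper, the address, and the packet format are hypothetical stand-ins for the actual sensor SDK and protocol.

```python
import json
import socket
import time

# Hypothetical sketch of the sensor-to-arm pipeline described above.
# read_head_pose() stands in for the motion-sensor SDK (e.g. Kinect);
# the host/port of the ARM-COMS control PC are placeholders.
ARM_COMS_ADDR = ("192.168.0.10", 5005)

def read_head_pose():
    """Return (yaw, pitch, roll) of the master person's head in degrees.
    In a real system this would query the motion-sensor SDK."""
    return (0.0, 0.0, 0.0)  # placeholder reading

def stream_head_pose(rate_hz=30):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    period = 1.0 / rate_hz
    while True:
        yaw, pitch, roll = read_head_pose()
        packet = json.dumps({"yaw": yaw, "pitch": pitch, "roll": roll})
        sock.sendto(packet.encode(), ARM_COMS_ADDR)  # send pose over the network
        time.sleep(period)
```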

When remote communication starts, a user first establishes a connection to the ARM-COMS located at the remote site. ARM-COMS then mimics the movement of its master person, acting as an avatar so that the master person at the remote site appears virtually present at the local site. On the one hand, ARM-COMS works like a general mobile robot that supports remote communication through remote-control manipulation over the network; on the other hand, it provides more than general tele-presence robots offer. Figure 1 gives a general overview of ARM-COMS and the three critical and unique challenges this research pursues, which differentiate ARM-COMS from general tele-presence robots.

Fig. 1. Challenges of ARM-COMS

2.2 Two Critical Modes of ARM-COMS

ARM-COMS works both as an intelligent ICT device at the local site and as an intelligent avatar system that represents its master person at the remote site (Fig. 1). The former acts as an intelligent tablet, hereinafter called IT-mode, whereas the latter acts as an intelligent avatar module, hereinafter called IA-mode.

Tablet PCs and smartphones are among the most popular mobile ICT devices today. Typically, a user holds the device in one hand and manipulates the touch screen with the fingers of the other. The device is also often placed in a holder on an office desk, in the driver's seat of a vehicle, or by a bed at home. The AP (Autonomous Position) function of IT-mode enables the tablet PC to approach the user autonomously and automatically when needed, for example when a phone call comes in or when the user is looking for a misplaced tablet.

ICT devices allow us not only to retrieve information but also to communicate with others over the network. However, there is a significant difference between video communication and face-to-face communication. When we talk with somebody face to face, we share not merely the same physical space but also an invisible communication space, or atmosphere; as a result, entrainment between the participants occurs during the conversation. When we talk with somebody over the network, however, we can only see the face on the screen and cannot share the same physical space, so the entrainment that occurs differs from that of a face-to-face meeting.

Since entrainment is associated with a person's physical movement, the AEM (Autonomous Entrainment Motion) function of IA-mode moves the tablet PC dynamically during remote communication, mimicking the head movement of its master person to promote entrainment. In addition to the head movement, the AEP (Autonomous Entrainment Position) function of IA-mode expresses the relationship between the participants. The next section covers the critical challenges of ARM-COMS.

2.3 Challenges in ARM-COMS

Autonomous Position Control.

The first challenge of ARM-COMS is autonomous position control: the tablet PC on ARM-COMS approaches the user autonomously and automatically when needed, as if ARM-COMS understood what the user wants. For example, suppose a user is working at a desk and receives an incoming video conference call. Considering what the user is doing, ARM-COMS autonomously brings the tablet PC in front of the user to prompt acceptance of the connection.
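The following sketch illustrates one way such a trigger could be wired up; the events, poses, and move_arm_to() helper are hypothetical, not the authors' implementation.

```python
from enum import Enum, auto

# Illustrative sketch of the AP trigger logic described above; the events,
# poses, and move_arm_to() helper are hypothetical assumptions.
class Event(Enum):
    INCOMING_CALL = auto()
    LOCATE_TABLET = auto()

POSES = {
    Event.INCOMING_CALL: "front_of_user",   # bring the screen to the user's face
    Event.LOCATE_TABLET: "raised_visible",  # lift the tablet so it can be found
}

def move_arm_to(pose_name):
    print(f"moving ARM-COMS to pose: {pose_name}")  # placeholder for the arm command

def on_event(event, user_present=True):
    # Only approach the user if someone is actually at the desk.
    if user_present and event in POSES:
        move_arm_to(POSES[event])

on_event(Event.INCOMING_CALL)
```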

Autonomous Entrainment Movement Control.

The first challenge above does not directly relate to video communication, but the second and third challenges do. It has been reported that entrainment emerges among participants during conversation when they share the same physical space and engage with one another [14]; this kind of entrainment in a face-to-face meeting, however, differs from that in remote communication. By tracking the head movement of the speaking person at the remote site, ARM-COMS manipulates the tablet PC as an avatar that mimics the remote person's head movement, so that entrainment emerges as if the local person were interacting with the remote person in the same room.
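A minimal sketch of this mimicking follows, assuming head yaw and pitch angles are already available in degrees (e.g., from the pipeline in Sect. 2.1); the gains, limits, and mapping are illustrative assumptions rather than the system's actual parameters.

```python
# Minimal sketch of the AEM mapping: scale and clamp the remote person's
# head pose into pan/tilt targets for the arm holding the tablet.
YAW_GAIN, PITCH_GAIN = 0.8, 0.8        # scale human motion to arm motion (assumed)
YAW_LIMIT, PITCH_LIMIT = 45.0, 30.0    # keep the tablet within a safe range (assumed)

def clamp(value, limit):
    return max(-limit, min(limit, value))

def head_pose_to_joint_targets(yaw, pitch):
    """Map the remote person's head pose to pan/tilt targets for the arm."""
    pan = clamp(yaw * YAW_GAIN, YAW_LIMIT)
    tilt = clamp(pitch * PITCH_GAIN, PITCH_LIMIT)
    return {"pan": pan, "tilt": tilt}
```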

Autonomous Entrainment Position Control.

The third challenge of ARM-COMS deals with autonomous entrainment position control. In a face-to-face meeting, each person takes a meaningful physical position to represent the relationship with the others or to send non-verbal messages. A closer position is taken with friends, showing a close relationship, whereas a more distant position is taken with strangers, showing an unfamiliar relationship [5]. ARM-COMS dynamically places the tablet PC at an appropriate position in space to explicitly represent the relationship with the other participants through such non-verbal messages. For example, the tablet PC would approach the speaking person to show that the remote person is interested in the talk.
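A hedged sketch of how such proxemics-inspired positioning could be expressed follows; the distance bands and interest signal are illustrative assumptions, not values from this research.

```python
# Sketch of AEP positioning: map a relationship and interest level to a
# target distance, loosely inspired by proxemics. All values are assumed.
DISTANCE_BANDS_M = {
    "close": 0.5,     # friendly, engaged
    "neutral": 0.9,
    "distant": 1.3,   # unfamiliar, reserved
}

def target_distance(relationship, interest):
    """Pick how far the tablet should sit from the local speaker.

    relationship: 'friend' or 'stranger'; interest: 0.0 (bored) .. 1.0 (engaged).
    """
    if relationship == "friend" or interest > 0.7:
        return DISTANCE_BANDS_M["close"]   # lean in toward the speaker
    if interest < 0.3:
        return DISTANCE_BANDS_M["distant"]
    return DISTANCE_BANDS_M["neutral"]
```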

3 Motion Control Using a Prototype System of ARM-COMS

A prototype ARM-COMS system has been developed to study the feasibility of the proposed ideas. The prototype is a five-axis robotic arm controlled by a microcontroller using gesture signals detected by a motion sensor, and it is designed to mimic basic human head motion as the AEM function, one of the challenges of ARM-COMS. Focusing on typical human head gestures, namely nodding for affirmation, head shaking for negation, and head tilting for uncertainty, a previous paper reported the basic control results [11]. The prototype has since been redesigned and its motion control algorithms updated; this section presents the experimental results of motion control with these updates.
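For illustration, the following sketch sends per-axis angle targets from the PC to a microcontroller over a serial link using pyserial; the port name, baud rate, and ASCII command format are assumptions, not the prototype's actual protocol.

```python
import serial  # pyserial

# Illustrative sketch of commanding the five-axis arm: one target angle per
# axis, sent as an ASCII command "J<index>:<angle>". The port, baud rate,
# and command format are assumptions.
PORT, BAUD = "/dev/ttyUSB0", 115200

def send_joint_angles(link, angles_deg):
    """Send one target angle per axis of the five-axis arm."""
    for axis, angle in enumerate(angles_deg):
        command = f"J{axis}:{angle:.1f}\n"
        link.write(command.encode())

with serial.Serial(PORT, BAUD, timeout=1) as link:
    send_joint_angles(link, [0.0, 30.0, -15.0, 10.0, 0.0])
```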

Figure 2 shows the experimental setup. Since the motion range of head movement is rather narrow, this experiment used hand gestures to analyze the motion of ARM-COMS. ARM-COMS was controlled by gesture signals detected by a non-contact hand motion sensor, so a subject could gesture freely without wearing any device. For data collection, however, a small receiver was attached to the back of the hand and to the terminal portion of ARM-COMS.

Fig. 2. Experimental setup

After the program updates, ARM-COMS could mimic hand gesture motion under motion sensor control. From the experimental data, three types of hand gestures were selected to show the feasibility of the head gesture motions of nodding, shaking, and tilting, as shown in Figs. 3, 4 and 5. In each case, the hand gesture was repeated three times consecutively.
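For illustration, one simple way to distinguish the three gesture types from recorded pose data is to look at which rotation axis swings the most; the axis-to-gesture mapping and threshold below are assumptions, not the analysis used in this study.

```python
import numpy as np

# Hedged sketch: classify a gesture by the rotation axis with the largest
# swing, assuming each sample is a (pitch, yaw, roll) triple in degrees.
AXIS_TO_GESTURE = {0: "nodding", 1: "shaking", 2: "tilting"}

def classify_gesture(pose_series, min_range_deg=10.0):
    """Label a recorded gesture by its dominant rotation axis."""
    poses = np.asarray(pose_series)          # shape: (samples, 3)
    swing = poses.max(axis=0) - poses.min(axis=0)
    axis = int(np.argmax(swing))
    if swing[axis] < min_range_deg:
        return "none"                        # too little motion to call
    return AXIS_TO_GESTURE[axis]
```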

Fig. 3. Analysis of nodding motion (Color figure online)

Fig. 4. Analysis of head shaking motion (Color figure online)

Fig. 5. Analysis of head tilting motion (Color figure online)

Figure 3 shows the snapping hand gesture, which corresponds to the nodding gesture of head movement. The blue line shows the snapping gesture performed three times consecutively, whereas the red line shows the corresponding ARM-COMS motion.

Figure 4 shows the twisting hand gesture, which corresponds to the head shaking gesture. The blue line shows the twisting gesture performed three times consecutively, whereas the red line shows the corresponding ARM-COMS motion.

Figure 5 shows the turning-back hand gesture, which corresponds to the head tilting gesture. The blue line shows the turning-back gesture performed three times consecutively, whereas the red line shows the corresponding ARM-COMS motion.

In the experimental results for the three selected movements, the timing of the ARM-COMS movement corresponded closely to the master motion of the hand gestures. However, the ARM-COMS trajectories were not identical to the original trajectories of the master hand motion. The jerky movement observed in ARM-COMS needs to be resolved before the system can be applied to head motion gesture expression in remote communication.
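One possible way to reduce such jerkiness, offered purely as a sketch, is to low-pass filter the incoming pose targets with an exponential moving average before sending them to the arm; the smoothing factor below is an illustrative choice, not a tuned value.

```python
# Sketch: smooth incoming pose targets with an exponential moving average
# so the arm tracks the gesture without abrupt jumps.
class PoseSmoother:
    def __init__(self, alpha=0.2):
        self.alpha = alpha   # 0 < alpha <= 1; smaller = smoother but laggier
        self.state = None

    def update(self, target):
        """Blend the new target into the running estimate."""
        if self.state is None:
            self.state = dict(target)
        else:
            for key, value in target.items():
                self.state[key] += self.alpha * (value - self.state[key])
        return dict(self.state)

smoother = PoseSmoother(alpha=0.2)
smoothed = smoother.update({"pan": 12.0, "tilt": -5.0})
```

A smaller alpha suppresses jerk more strongly but increases the lag between the master motion and the arm's response, so the value would need to be tuned against the timing correspondence observed above.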

4 Concluding Remarks

This paper presented the idea of an active display monitor named ARM-COMS, with the two modes of the ARM-COMS system, IT-mode and IA-mode, followed by the three challenges based on these modes. The three basic functions of AP, AEM, and AEP were also presented. The future goal ARM-COMS pursues is not only to convey the tele-presence feeling of a remote person, but also to implicitly show the relationship between the remote person and the local participants through the entrained behavior of tablet PC manipulation.