
1 Introduction

1.1 Human-Device Interaction

With the development of technology, intelligent devices can interact with humans directly through speech output, serving as conversation companions [1, 2] or as mediators between people [3, 4]. Interactions between humans and devices increasingly resemble human communication, and devices are becoming more and more proactive [5, 6]. However, research on devices’ degree of proactivity is still limited. Originating from Asimov’s Three Laws of Robotics, social acceptance of human-device interaction has become an emerging research topic [7]. The purpose of this paper is to explore which level of device proactivity yields high user satisfaction in direct interactions between devices and humans under specific scenarios.

The relationship between humans and devices is becoming closer, and the interaction between them is called interspecific interaction (interaction between two different species) [3]. In traditional human-device interaction, devices mostly play the role of humans’ tools [8,9,10]. Previously, some studies focused on these tools’ different levels of automation [10]. In recent years, Mäkitalo and his colleagues have proposed that a new kind of socio-digital system is emerging between humans and proactive, context-sensing mobile devices, in which the mobile devices act as active participants and can initiate interactions with humans [5] (Fig. 1).

Fig. 1. Comparison of human-initiated interaction and device-initiated interaction.

In human face-to-face interaction, a stimulus to initiate a conversation is essential when people are not familiar with each other. Sacks calls this a “ticket to talk” [11]; examples are “hello” or some other casual topic. In most previous research, humans first need to “wake up” a device using an input such as radio or buttons, and the device then responds [12]. The development of artificial intelligence technology promotes the exchange of active roles between people and devices [13]. A lot of research has been done on social applications of smart devices, and research on the social features of smart devices has consequently become a hot topic [4, 14,15,16]. As in human face-to-face interaction, the stimulus that initiates human-device interaction is essential.

1.2 Proactivity and Social Device

The theory of proxemics indicates that increased proximity between people may signal an intention to interact [17]. Tennenhouse proposed that the majority of computers would be proactive [18]. Pradthana created a sense of proactivity in the technology and enabled an intuitive way of interacting with mobile technology in a social situation [13]. Mäkitalo and Pääkkö introduced the concept of social devices (SD) and its implementation, which focused on enriching local interaction by means of technology [5]. Paasovaara designed an application called Next2You, which aims to inform users of opportune connections and encourage face-to-face interaction between people [6]. Jarusriboonchai developed Who’s Next, a multiplayer quiz-based mobile game intended to break the ice within a group of strangers [19]. However, most of the research on SD concerns the role of devices in interaction between humans. In this paper, three models are proposed to represent intelligent devices’ different levels of proactivity, with the aim of studying users’ experience when devices proactively initiate interaction with humans at each level.

1.3 Scenarios

According to Bonarini, many intelligent devices such as vacuum cleaners, automatic doors and cars are “non-bio-inspired robots” [20]. Our research object is the intelligent device with an active interaction function, so we combine the usage scenarios of robots with those of social devices to form the usage scenarios of intelligent devices. There have been many studies of social robots in the home scene [14, 21, 22], and some studies have found that the home scene is suitable for robot research [23]. Recently, a lot of research has focused on active interaction in public environments, but some research suggests that active interaction in public can cause negative emotions such as embarrassment [4]. How to handle this problem has therefore become a key research point. This paper studies the differences in user experience across degrees of active interaction in two scenarios, the home environment and the office environment, in order to identify a more appropriate degree of interaction for each context.

2 User Study

2.1 Participants

An experiment with six short video sequences was conducted, in which two actors were invited to carry out tasks in each scenario. Video sequences have been used in previous research, for example to match dog videos to robot videos by emotional state [3]. In addition, Zwinderman and his colleagues compared an experiment using videos recorded by actors with a direct experiment, finding that video as a medium could draw attention to what users do with technology rather than to its technical workings [24].

Thirty participants took part in this study (16 male, 14 female), aged from 22 to 65 with an average age of 33. The participants were recruited by means of a questionnaire survey, which consisted of two parts: demographic characteristics, and questions designed for assigning participants to groups. The participants were divided into two groups, a home group and an office group, according to their experience with each scenario. Most participants’ highest level of education was a four-year college degree or higher (66.7%, n = 20), followed by a junior college degree (16.7%, n = 5) and a high school diploma or below (16.7%, n = 5).

2.2 The Proactive Interaction Models

Paasovaara studied social devices with three levels of interaction (automatic, technology-mediated and face-to-face), aiming to inform users of opportune connections and encourage face-to-face interactions between people [6]. However, devices’ degree of proactivity when they interact with people has hardly been classified. According to the degree of proactivity, three proactive interaction models are presented (Fig. 2):

Fig. 2. Three levels of proactive interaction.

  • L1 - “arouse and wait”. The L1 model was the least active of the three models. The device aroused people proactively with a beep or another form of signal (sparkling, motion, graph or voice) and then waited for people’s feedback.

  • L2 - “arouse and output”. The L2 model represented the moderate level. The device outputted its intention after arousing people, without waiting for their feedback.

  • L3 - “output directly”. The L3 model was the most active model in which the device outputted its intention directly without arousing people.
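
The three models above can be read as three short action sequences. The following is a minimal sketch under that reading; the names `Level` and `device_steps` and the action labels are illustrative assumptions, not part of the study materials.

```python
from enum import Enum


class Level(Enum):
    """The three proactivity levels described in the text."""
    L1 = "arouse and wait"
    L2 = "arouse and output"
    L3 = "output directly"


def device_steps(level, user_responds=True):
    """Return the ordered device actions for one proactive interaction.

    The action labels are hypothetical; they only mirror the prose
    description of each model.
    """
    if level is Level.L1:
        # Arouse, then wait; output only if the person gives feedback.
        steps = ["arouse", "wait_for_feedback"]
        if user_responds:
            steps.append("output_intention")
        return steps
    if level is Level.L2:
        # Arouse, then output without waiting for feedback.
        return ["arouse", "output_intention"]
    # L3: output directly, with no arousing step.
    return ["output_intention"]


if __name__ == "__main__":
    for level in Level:
        print(level.name, device_steps(level))
```

Under this reading, the levels differ only in whether an arousing step precedes the output and whether the output is gated on the person's feedback.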

2.3 Preparation of the Video Materials

Classification of Contexts.

The contexts were divided into two categories: the home scene and the office scene. People usually feel relaxed at home because home is a private space, so the home scene is characterized by ease. In contrast, the office is a public place where noise is unwelcome, so people usually keep quiet in the office scene.

Video Producing.

In this research, a total of six short video sequences were studied. They were made by applying the three proactive interaction models to the two contexts, with one intelligent device in each scenario (see Table 1).

Table 1. Classification of 6 video sequences

In the first home-scene video, marked A1, the intelligent device swung its head to arouse the man and waited for feedback. After receiving verbal feedback from the man, it expressed a request to play music and then played it; if the man gave no feedback, the device did not play music. In the second video, marked A2, the device aroused the man first and increased the intensity until he noticed it, and then outputted its intention directly without his verbal feedback. In the video marked A3, the device greeted the man by voice, proactively and directly without any arousing, once he came back from outside and walked through the door.

In the office scene, the intelligent device aroused the woman in the B1 video and waited for feedback; once it received verbal feedback, it outputted its intention. In the B2 video, it aroused the woman first, increased the intensity of arousing and then outputted its intention directly without verbal feedback. In the B3 video, the device outputted the sound of a phone ringing directly, without any arousing.
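
The labeling of the six videos (A1 to A3 for the home scene, B1 to B3 for the office scene) amounts to a 2 × 3 grid of scene and proactivity level. A minimal sketch of that grid follows, together with a randomized within-scene presentation order of the kind used in the main experiment; the names `build_videos` and `presentation_order` and the `seed` parameter are illustrative assumptions.

```python
import random
from itertools import product

# Prefix letters and level numbers follow the video labels in the text.
SCENES = {"A": "home", "B": "office"}
LEVELS = {1: "arouse and wait", 2: "arouse and output", 3: "output directly"}


def build_videos():
    """Map each video label (e.g. 'A1') to its (scene, model) pair."""
    return {
        f"{prefix}{number}": (SCENES[prefix], LEVELS[number])
        for prefix, number in product(SCENES, LEVELS)
    }


def presentation_order(prefix, seed=None):
    """Return the three videos of one scene in a random order.

    `seed` is a hypothetical convenience for reproducible shuffles.
    """
    labels = [f"{prefix}{number}" for number in LEVELS]
    random.Random(seed).shuffle(labels)
    return labels


if __name__ == "__main__":
    for label, (scene, model) in build_videos().items():
        print(label, scene, model)
    print(presentation_order("A", seed=0))
```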

2.4 Experiment

The study was composed of three parts: (1) a pilot experiment, in which the participants were shown three prepared short video sequences that explained the three proactive interaction models before the main experiment; (2) the main experiment, comprising six short video sequences divided into two scenarios, in which the participants were shown the three sequences of one scenario at a time in a random order and were asked to rate each model on a subjective scale of satisfaction and mental comfort (Table 2); (3) a semi-structured interview, in which, after the three videos of a single scenario, the participants were asked to explain the reasons for their scores. To ensure a quiet environment, the study was carried out in a laboratory in the School of Design, Hunan University, China.

Table 2. Subjective scale on satisfaction and mental comfort level.

In the main experiment, the participants were shown the three video clips representing the three models in the same scenario, and were then asked to rate each model on a 5-point Likert scale for satisfaction and mental comfort. This was followed by a semi-structured interview, in which the participants were asked to explain the reasons behind their scores, describe their subjective experience when imagining themselves in the depicted stories, and give advice on the models.

2.5 Data Analysis Methods

In this study, two types of data were produced: (1) subjective scores of participants’ levels of satisfaction and mental comfort were measured on a numerical scale ranging from 1 to 5; (2) notes and audio recordings obtained from the semi-structured interview. The data analysis consisted of two phases: quantitative analysis and qualitative analysis.

The effects of the level of proactive interaction on the dependent variables (satisfaction and comfort) were tested separately by means of repeated-measures analyses of variance (ANOVAs), with an alpha level of .05 for all statistical tests. If an interaction effect between the factors was found, a univariate ANOVA and a Scheffé post hoc test were performed to determine which level or levels of the factor differed from the others in their effects on the dependent variables.

The qualitative data were recorded on the spot by one experimenter while another experimenter conducted the interview. The data were analyzed by clustering analysis, producing a bottom-up hierarchy of themes. The purpose of the qualitative analysis was to account for the participants’ subjective scores and to help us better understand their subjective experiences and opinions regarding the concept and the application of the models.
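
For readers unfamiliar with the test, a one-way repeated-measures ANOVA can be sketched in plain Python as below. This is a textbook formulation, not the authors’ analysis code: the function name `rm_anova` and the toy ratings are assumptions made for illustration, and the Scheffé post hoc test is not shown.

```python
def rm_anova(scores):
    """One-way repeated-measures ANOVA.

    scores: one list per participant, holding one rating per condition.
    Returns (F, df_condition, df_error).
    """
    n = len(scores)       # number of participants
    k = len(scores[0])    # number of conditions (e.g. L1, L2, L3)
    grand = sum(sum(row) for row in scores) / (n * k)

    cond_means = [sum(row[j] for row in scores) / n for j in range(k)]
    subj_means = [sum(row) / k for row in scores]

    # Partition the total sum of squares into condition, subject and error.
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_cond - ss_subj

    df_cond = k - 1
    df_error = (k - 1) * (n - 1)
    f_value = (ss_cond / df_cond) / (ss_error / df_error)
    return f_value, df_cond, df_error


if __name__ == "__main__":
    # Hypothetical 1-5 ratings from three participants for L1, L2, L3.
    toy_ratings = [[1, 2, 3], [2, 3, 5], [3, 4, 4]]
    print(rm_anova(toy_ratings))
```

The resulting F value would then be compared against the F distribution with the returned degrees of freedom at the chosen alpha level.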

3 Results

3.1 Satisfaction Level

In the home scene, the main effect of the level of proactive interaction on satisfaction was statistically significant, F(2, 29) = 3.527, p = .034. The participants were most satisfied with the L3 - “output directly” model, with a mean rating of 4.33, followed by the L1 - “arouse and wait” model (mean = 3.80) and the L2 - “arouse and output” model (mean = 3.76). In the office scene, the main effect of the level of proactive interaction on satisfaction was also statistically significant, F(2, 29) = 7.169, p = .001. The participants were more satisfied with the L1 - “arouse and wait” model, with a mean rating of 4.10, than with the L3 - “output directly” model (mean = 3.00) (Fig. 3).

Fig. 3. Mean subjective satisfaction evaluation of the three models.

Across both types of scenario, the effects of participants’ demographic characteristics on satisfaction were also examined. The participants in the office group were less satisfied with the L3 model than those in the home group, and the participants in the home group were more satisfied with the L1 model than those in the office group. No significant difference was found in terms of gender. Only one significant difference in satisfaction ratings was found for age: users aged 36 to 45 were more satisfied with the L1 model than those aged 18 to 25.

3.2 Comfort Level

The main effect of the level of proactive interaction on comfort in the home scene was statistically significant, F(2, 29) = 3.508, p = .034. The most comfortable model in the home scene was the L3 - “output directly” model, with a mean rating of 4.27, followed by the L1 and L2 models (3.90 and 3.53 respectively). In the office scene, a test for the effect of the level of proactive interaction on comfort also showed a significant effect, F(2, 29) = 14.630, p < .001. A post hoc test showed that the comfort rating of the L3 model was significantly lower than that of the other two models (Fig. 4).

Fig. 4. Mean subjective mental comfort evaluation of the three models.

4 Discussion

4.1 Home Scenario

In the home scene, the participants were most satisfied with the L3 - “output directly” model and also considered it the most comfortable choice, mainly because they regarded home as a private place where they could do what they wanted without bothering anyone. One of the subjects, U10, said, “the L3 model is pretty good as it can remind me to do something directly and I think it very practical.” User U8 said, “the L3 can timely remind me.” From this perspective, the direct output model (L3) is quite functional. User U20 said, “I always feel very tired when coming back home after working for a long time, but the device interacts proactively with me, which can ease my loneliness like a pet.” “It is very warm-hearted for the device to talk to me actively when I come back” said user U20. Most subjects think of their homes as their own space and, as a result, tend to want an intelligent device that proactively cares about them at home. The “output directly” model is therefore suitable for the home scene, and the reasons may include both its practical function and its warm-hearted role at home.

4.2 Office Scenario

The most satisfactory and comfortable model in the office scene was the “arouse and wait” model. As the office is a public place, the participants argued that the device should keep quiet so as not to disturb others. The reasons for subjects’ choice of the L1 model fall into two categories: the subjects’ personal perspective and the perspective of others. Some users considered that output from the device without the person’s permission would interrupt the thoughts of those working in the office. User U18 said, “the L3 model is too abrupt, and it will interrupt my thoughts, while the L1 model will remind me with a prompt, and I can set the time to get it to output the information, which is more comfortable.” Based on this explanation, it can be concluded that users are concentrating on their work most of the time in the office scene, so the output of the device is likely to interrupt them. At the same time, users are afraid of disturbing their colleagues. User U11 said, “just a little reminder, otherwise it would be embarrassing to disturb my colleagues.” “The L1 is pretty suitable for the office, because it is not too active, but is practical enough to remind me or help me do something after getting my permission.” All in all, in an office scenario, users expect the device to get their attention in a slightly proactive way and wait for their permission to output information, mainly because they are concerned that the device may disturb both themselves and others if it outputs information outside their control.

5 Conclusion

According to the results, people’s preferred level of proactivity for smart devices differs considerably between scenarios. In a personal space such as the home, people are more inclined to choose the L3 model. In a public space, however, people tend to choose the L1 model in order to avoid disturbing others and the embarrassment that this would cause. Meanwhile, people in a public space usually concentrate on the matters at hand, so choosing the L1 model can guarantee practicality without disturbing their thoughts.

As a future evolution of the present work, the research will focus on a more detailed division of proactivity degrees and on external events, such as tasks, that affect active interaction in specific scenarios. Furthermore, the state of the user will be one of the factors taken into consideration.

6 Limitations

One limitation of this work is that we measured user experience only through satisfaction and comfort rather than through additional experience dimensions. Another limitation is that all of our subjects were from a second-tier city in China. In the future, we will study the influence of different tasks and different user states on model selection.