1 Introduction

The elderly population undergoing rehabilitation is increasing [1]. Maintaining motivation is essential for effective rehabilitation [2], but practitioners are burdened with pain and mental distress, which makes motivation difficult to maintain. To help maintain motivation during rehabilitation, Itoh et al. proposed a “voice-casting robot” that changes its voice-casting phrase according to the emotion of the practitioner [3]. They designed and implemented a robot that utters different phrases according to the emotional state of the subject, using a valence value calculated from the output of a pulse sensor. As the valence measure they use pNN50, an index of autonomic nervous activity. This value has been used to evaluate emotional state [4]: it increases when a person feels relaxed and decreases when they do not. Their experimental results show that the most comfortable state was obtained when phrases chosen according to the emotion of the experimental collaborators were combined with supportive behavior [3].

The previous study [3] measures only pleasantness and unpleasantness (valence) and bases the voice-casting on it; it does not consider the arousal state of the person. Among unpleasant states, anger is reported to be a state of high arousal, and voice-casting in this state is reported to have the opposite effect [5]. Voice-casting may therefore be counterproductive in some cases. In this research, we therefore realize a new voice-casting robot that also takes arousal into account, and verify its effect.

The paper is organized as follows. Sect. 2 presents related work, and Sect. 3 describes our proposal. Sect. 4 presents the experiments using the proposed robot and their results. Finally, Sect. 5 concludes the paper.

2 Related Work

Wada et al. proposed “robot therapy,” which provides mental care through contact with an animal-type robot [4]. Robot therapy has the advantage that it can be carried out more easily than animal therapy, because a safe and sanitary animal robot takes the place of a live animal. The seal-like robot Paro [4] has tactile, visual, auditory, and balance senses provided by its internal sensors, and by combining these data it can learn people’s names and actions. Owners gradually build a relationship with Paro through interaction and are expected to interpret its behavior as if Paro had feelings. Animal therapy is recognized to have mainly (1) psychological effects: an increase in smiles and motivation, mitigation of “depression,” etc., (2) physiological effects: a decrease in stress, blood pressure, etc., and (3) social effects: an increase in communication, etc. Wada et al. conducted experiments with Paro robots and showed that they were effective. However, Paro mainly soothes patients with its cries and appearance. It does not, for example, improve the specific motivation of patients in rehab by speaking a natural language.

On the other hand, one of the factors that enhance the effect of rehab is active effort by the rehab patient. Motivation is the driving force of action and a prerequisite for activity. Declining motivation, often a problem in everyday life, is an obstacle to implementing rehab. Kimishi et al. therefore selected phrases commonly heard at a rehab hospital, conducted a questionnaire survey of the staff (physiotherapists, occupational therapists, physicians, etc.) and rehab patients, and investigated how strongly each phrase motivated rehab patients [5].

Building on the idea of improving motivation for rehabilitation, Itoh et al. proposed a “voice-casting robot” that changes its voice-casting phrase according to the emotion of the practitioner [3]. Their experimental results show that the most comfortable state was obtained when phrases chosen according to the emotion of the experimental collaborators were combined with supportive behavior.

However, that work measures only pleasantness and unpleasantness (valence) and bases the voice-casting on it; it does not consider the arousal state of the person. Among unpleasant states, anger is reported to be a state of high arousal, and voice-casting in this state is reported to have the opposite effect. We therefore consider it possible that voice-casting could lower the motivation of the subject.

3 Proposal

In this study, we consider not only whether emotion is negative or positive according to the valence value, but also the arousal state of the practitioner. As described in the previous section, existing research shows that the motivation of rehabilitation subjects may be improved when a voice-casting robot changes its phrase according to the autonomic nervous state of the subject, which indicates positive or negative valence. The results showed that voice-casting based on the emotion of the subject was more effective than voice-casting that ignored it. However, the effectiveness is limited: because the previous study considers only the positive/negative valence of the subject, its negative-valence class contains both angry and sad emotions, without the arousal distinction made in Russell’s model (Fig. 1). Emotions in the high-arousal, negative-valence region would correspond to a frustrated state in rehabilitation patients. In this case, encouraging voice-casting is sometimes not effective. Although this has been discussed, the effect of voice-casting has not been sufficiently evaluated against the state of the subject.

Fig. 1. Russell’s model and focused emotion

To solve this problem, we designed and implemented a voice-casting robot that casts an appropriate phrase based on an emotion estimate that is additionally classified by arousal level.

3.1 Bio-signal Information Analysis

To this end, we use the bio-emotion estimation method proposed by Ikeda et al. [8] to estimate emotion from the state of a person. Values obtained from brain waves and pulse are mapped to the Arousal axis and the Valence axis of Russell’s circumplex model [9], and the resulting Arousal and Valence values are plotted on two-dimensional coordinates.

The brain wave value associated with the Arousal axis was measured using NeuroSky’s MindWave Mobile electroencephalograph [10]. We used the Attention and Meditation values calculated by this device. Attention and Meditation indicate the degree of concentration and the degree of relaxation of the person, respectively, and each is reported on a scale of 0 to 100. In this study, we assumed that the difference between Attention and Meditation appropriately expresses the degree of arousal of a person, and mapped it to the arousal axis of Russell’s circumplex model.
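As a minimal sketch of this mapping, the arousal value can be written as the signed difference of the two readings. The function name `arousal_from_esense` and the input validation are illustrative assumptions, not part of the implemented system.

```python
def arousal_from_esense(attention: int, meditation: int) -> int:
    """Return Attention minus Meditation as an arousal value in [-100, 100].

    Both inputs are expected on the 0-100 scale described in the text.
    The name and the range check are assumptions made for illustration.
    """
    for name, value in (("attention", attention), ("meditation", meditation)):
        if not 0 <= value <= 100:
            raise ValueError(f"{name} must be in 0..100, got {value}")
    return attention - meditation


# Positive values mean concentration dominates, negative values mean relaxation dominates.
print(arousal_from_esense(70, 30))
print(arousal_from_esense(20, 65))
```

Under this convention a value below 0 corresponds to the drowsiness-dominant state used later in the paper.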

The value of the Valence axis was derived from the pulse measured with SparkFun’s Pulse Sensor. This sensor measures the pulse by photoplethysmography, and pNN50 was used as the pulse-derived value corresponding to the Valence axis. pNN50 is the proportion of differences between adjacent RR intervals, taken over the 30 most recent intervals, that exceed 50 ms. pNN50 is generally said to indicate the degree of nervous tension: the smaller the value, the more tense or uncomfortable the person is. Conversely, when a person is in a normal or pleasant state, a fair proportion of the differences between adjacent RR intervals exceed 50 ms. pNN50 was therefore calculated as a ratio from 0 to 1.0 and mapped to the valence axis.
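The pNN50 calculation can be sketched as follows. This is an illustrative reading of the definition above, assuming RR intervals are supplied in milliseconds; the function name `pnn50` and its parameters are hypothetical and not taken from the implemented system.

```python
from typing import Sequence


def pnn50(rr_ms: Sequence[float], window: int = 30, threshold_ms: float = 50.0) -> float:
    """Fraction (0.0 to 1.0) of adjacent RR-interval differences above threshold_ms.

    Only the most recent `window` RR intervals are used, following the
    30-interval window mentioned in the text.
    """
    recent = list(rr_ms)[-window:]
    if len(recent) < 2:
        raise ValueError("need at least two RR intervals")
    # Absolute difference between each pair of adjacent intervals.
    diffs = [abs(b - a) for a, b in zip(recent, recent[1:])]
    return sum(d > threshold_ms for d in diffs) / len(diffs)


# Hypothetical RR intervals in ms: the differences are 60, 10, 70, 10.
print(pnn50([800, 860, 870, 800, 810]))  # 0.5
```

A perfectly regular pulse gives 0.0, which corresponds to the tense end of the valence axis in this scheme.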

Empirical results on the effectiveness of determining short-term emotion from the state of the autonomic nerves have been reported [11]. Because the sensor values are used in real time, the method can be applied to robot control and similar tasks. Based on this method, we distinguish discomfort (negative valence) with high arousal, which was the problem in the previous research, from discomfort (negative valence) with low arousal, which includes sadness and fatigue (Fig. 1).

3.2 Design and Implementation

So that the system can be used for walking rehabilitation in various places, we designed and implemented it as shown in Fig. 2.

Fig. 2. Voice-casting robots based on the bio-estimated emotion

The system has two main parts: a robot control unit, and an emotion detection and phrase selection unit. (1) The robot control unit determines the moving direction and sends a serial command from the Raspberry Pi to the Arduino for motor control, and (2) the emotion detection and phrase selection unit classifies the emotion from the pulse and the brain wave and determines the phrase to play.

  (1) For the moving direction, the robot switches between forward and backward movement according to whether it is in the walking phase or the follow-up/voice-casting phase, and the Arduino issues a rotation command to the motors each time the robot is to move. After the follow-up action the robot stops, and it is specified to resume moving after the voice-casting. In this experiment, the drive unit traveled at 0.5 km/h, the maximum speed at which it could move stably.

  (2) This part performs voice playback based on the emotion classification. First, the arousal level and the average of the 10 most recent valence (pNN50) values are compared with their thresholds. When a value falls below its threshold, the support operation (moving control) is executed. Then the voice response appropriate to the emotion is selected and played. A pNN50 value of less than 0.23 was used as the valence threshold, and an arousal level of less than 0 was treated as drowsiness-dominant.
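The threshold comparison in (2) can be sketched as below. The thresholds (0.23 for pNN50, 0 for arousal) and the 10-sample window come from the text; the class and field names are hypothetical, and averaging the arousal level over the same window is an assumption, since the text does not state how the arousal level is smoothed.

```python
from collections import deque
from statistics import mean
from typing import NamedTuple, Optional

VALENCE_THRESHOLD = 0.23  # mean pNN50 below this: negative valence (from the text)
AROUSAL_THRESHOLD = 0     # arousal below this: drowsiness-dominant (from the text)
WINDOW = 10               # number of most recent samples averaged (from the text)


class EmotionState(NamedTuple):
    negative_valence: bool
    low_arousal: bool


class EmotionClassifier:
    """Keeps the latest samples and compares their means with the thresholds."""

    def __init__(self) -> None:
        self._pnn50 = deque(maxlen=WINDOW)
        self._arousal = deque(maxlen=WINDOW)  # assumption: arousal is averaged too

    def update(self, pnn50_value: float, arousal_value: float) -> None:
        self._pnn50.append(pnn50_value)
        self._arousal.append(arousal_value)

    def state(self) -> Optional[EmotionState]:
        # No decision until a full window of samples is available.
        if len(self._pnn50) < WINDOW:
            return None
        return EmotionState(
            negative_valence=mean(self._pnn50) < VALENCE_THRESHOLD,
            low_arousal=mean(self._arousal) < AROUSAL_THRESHOLD,
        )


clf = EmotionClassifier()
for _ in range(WINDOW):
    clf.update(0.10, -20)  # hypothetical tense and drowsy samples
print(clf.state())
```

Returning the two flags separately leaves the choice of phrase to the caller, so the same classifier can serve both the valence-only and the arousal-aware conditions.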

In addition, we implemented a function that records the time of each voice-casting together with the biological data, which makes it easy to analyze the biological information at the time of voice-casting. In this experiment, the interval between voice-castings was controlled so that 4 to 5 voice-castings occurred within a 10 m walk, to prevent phrases from being played back to back and obscuring the effect of each one. Specifically, for 20 s after a voice-casting, no further voice-casting is made even if the value is below the threshold.
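The 20 s suppression and the time logging can be sketched as a small gate object. The class `VoiceCastGate` and its method names are hypothetical; only the 20 s interval and the idea of recording each voice-casting time come from the text.

```python
from typing import List, Optional


class VoiceCastGate:
    """Allows a voice-casting only if the cooldown since the last one has elapsed."""

    def __init__(self, cooldown_s: float = 20.0) -> None:
        self.cooldown_s = cooldown_s
        self.last_cast_s: Optional[float] = None
        self.log: List[float] = []  # times of voice-castings, kept for later analysis

    def try_cast(self, now_s: float, below_threshold: bool) -> bool:
        if not below_threshold:
            return False
        if self.last_cast_s is not None and now_s - self.last_cast_s < self.cooldown_s:
            return False  # still inside the suppression interval
        self.last_cast_s = now_s
        self.log.append(now_s)
        return True


# Worst case: the sensor value stays below the threshold for a whole 72 s walk.
gate = VoiceCastGate()
for t in range(72):
    gate.try_cast(float(t), below_threshold=True)
print(gate.log)  # [0.0, 20.0, 40.0, 60.0]
```

At 0.5 km/h a 10 m walk takes 72 s of pure travel time, so a 20 s interval allows at most four voice-castings while moving; the pauses during each voice-casting lengthen the walk, which is consistent with the 4 to 5 voice-castings stated above.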

The data from the pulse sensor and the electroencephalograph were transmitted wirelessly. The communication used Bluetooth, which is standard on the Raspberry Pi 3, and XBee. The electroencephalograph was NeuroSky’s MindWave Mobile, which can transmit data by Bluetooth. As the pulse sensor, a pulse sensor from Switch Science, Inc. was used, with Microchip’s RN4020 XBee module communicating the sensor values to the Raspberry Pi.

4 Experiment

4.1 Preliminary Experiment

We first carried out a preliminary experiment to investigate the biological reaction to the voice-casting of the robot. Along with the voice-casting, the robot performed the follow-up action that had been shown to be effective in previous studies. Emotion was evaluated using an electroencephalograph and a pulse rate meter. After a resting time of 1 min to stabilize the pulse, the experimental collaborators (seven people in their 20s) walked 6 m using a walking aid while guided by the voice-casting robot (Fig. 3).

Fig. 3. Experimental scene of rehabilitation robot

A case in which the average pNN50 over the 10 s before the voice-casting was 0.23 or less was defined as negative valence, and the voice-casting was selected accordingly (Table 1). In addition, based on the value obtained from brain waves, the state of the person was further classified as negative valence with low arousal or negative valence with high arousal.

Table 1. Appropriate voice based on their emotion status

4.2 Result

As shown in the left panel of Fig. 4, in the negative valence state with high arousal (including anger and irritation), a marginally significant trend was observed toward a decrease in pNN50 from before to after voice-casting (p < 0.10). In contrast, in the negative valence state with low arousal (sorrow and fatigue), shown in the right panel, pNN50 rose significantly from before to after voice-casting (p < 0.05). These results suggest that the voice-casting of the previous research is effective for people in a negative valence state with low arousal, whereas for people with a high level of arousal the encouraging voice-casting of the previous research is not effective.
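For reference, the before/after comparison corresponds to a paired t-test on pNN50, whose statistic can be sketched as follows. The function name `paired_t` and the numbers in the example are hypothetical and are not the experimental data; the p-value would be read from a t distribution with the returned degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Sequence, Tuple


def paired_t(before: Sequence[float], after: Sequence[float]) -> Tuple[float, int]:
    """Return (t statistic, degrees of freedom) for paired samples."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two samples of equal length, at least 2 pairs")
    diffs = [a - b for b, a in zip(before, after)]
    sd = stdev(diffs)  # sample standard deviation of the paired differences
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    n = len(diffs)
    return mean(diffs) / (sd / sqrt(n)), n - 1


# Hypothetical pNN50 values before and after voice-casting (illustration only).
before = [0.10, 0.20, 0.30, 0.40]
after = [0.20, 0.40, 0.50, 0.50]
t, df = paired_t(before, after)
print(round(t, 3), df)
```

A positive t indicates that pNN50 rose after voice-casting, and a negative t indicates that it fell.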

Fig. 4. t-test of before voice-casting and after voice-casting under the high arousal level + negative valence (anger/irritation, etc.) condition (left) and the low arousal level + negative valence (sadness/fatigue, etc.) condition (right)

4.3 Comparison with the Negative Valence and Low Arousal Status

The preliminary experiment suggested that voice-casting is not effective for collaborators with a high arousal level. In addition, based on the advice of an expert that it is important to keep arousal high during rehabilitation, we evaluated the effectiveness of voice-casting when the degree of arousal decreases (drowsiness-dominant).

After a rest period of 1 min to stabilize the pulse, the experimental collaborators (five people in their 20s) walked using a walking aid while guided by the voice-casting robot. The walking course was 10 m long, a distance that was found to be effective in the 6-min walking test [12]. To add load to walking, a weight was attached to the ankle of each collaborator. Furthermore, following the advice of the expert, the robot and the collaborator walked 2 m apart and maintained that interval, so that the line of sight of the collaborator would not drop too low.

In this experiment, we set up three conditions: one with no voice-casting, one with voice-casting at negative valence without regard to the arousal level, and one with voice-casting at low arousal level. The three experimental patterns are summarized below.

  • I: No voice-casting

  • II: At negative valence, voice-casting

  • III: At low arousal, voice-casting

In addition, the follow-up (support) action is performed before each voice-casting.

4.4 Result

In each of the three experimental patterns, the biological information changes between rest and walking. To quantify this change, the mean values of BPM, arousal level, and pNN50 were obtained during rest and during walking, and the analysis was performed on the amount of change.

A one-way analysis of variance on the difference in mean BPM found no significant difference (p > 0.05). Similarly, a one-way analysis of variance on the difference in mean arousal level found no significant difference (p > 0.05). In contrast, a one-way analysis of variance on the difference in mean pNN50 found a significant difference (p < 0.05) (Fig. 5).
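The one-way analysis of variance compares the variation between the three patterns with the variation within each pattern. A minimal sketch of the F statistic is given below; the function name `one_way_anova_f` and the example values are hypothetical and are not the experimental data. The p-value would be obtained from the F distribution with the returned degrees of freedom.

```python
from statistics import mean
from typing import Sequence, Tuple


def one_way_anova_f(groups: Sequence[Sequence[float]]) -> Tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more samples than groups")
    grand = mean(x for g in groups for x in g)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # Variation of the samples around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    if ss_within == 0:
        raise ValueError("no within-group variation; F is undefined")
    df_between, df_within = k - 1, n - k
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within


# Hypothetical change scores for patterns I, II, and III (illustration only).
f, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f, df_b, df_w)
```

A significant F only indicates that at least one pattern differs, which is why a multiple comparison such as the Tukey HSD test is needed to identify the pairs.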

Fig. 5. Comparison of (I) no voice-casting, (II) voice-casting at negative valence, and (III) voice-casting at low arousal

Moreover, a multiple comparison by the Tukey HSD test was performed to compare the mean values of each group. A significant difference was found between patterns II and III (p < 0.05), and a marginally significant tendency was found between patterns I and II and between patterns I and III (p < 0.10).

4.5 Discussion

The one-way analysis of variance found no significant difference in either the change in mean BPM or the change in mean arousal level. This is probably because every pattern covers the same distance at the same speed, so the load does not change merely with the presence or absence of voice-casting.

In contrast, the one-way analysis of variance found a significant difference in the change in mean pNN50, and the multiple comparisons found significant differences or marginally significant trends for each pair of patterns. This result suggests that voice-casting only when drowsiness is dominant shifts pNN50 toward positive valence.

In addition, voice-casting only when the person was in an unpleasant state was more likely to lead to negative valence than no voice-casting. Thus, voice-casting that takes the arousal level into account supports the person undergoing rehabilitation more effectively, whereas voice-casting that ignores the arousal level is less supportive than no voice-casting at all. A likely reason is that being spoken to only when one feels unpleasant is itself unpleasant for the person undergoing rehabilitation.

5 Conclusion

In this study, in addition to the heart rate variability index pNN50 used in the previous study, we evaluated a robot that takes into account the degree of arousal obtained from EEG. The results showed that voice-casting that takes arousal into account makes people feel more pleasant than either no voice-casting or voice-casting when the person is in an unpleasant state. These results suggest that a more supportive system and a general-purpose rehabilitation robot can be realized by considering the arousal level and addressing the remaining issues.