Abstract
Patients with Locked-In Syndrome, such as people with Amyotrophic Lateral Sclerosis (ALS), rely on technology for basic communication. However, available Augmentative and Alternative Communication (AAC) tools such as gaze-controlled keyboards have limited capabilities. In particular, they do not allow the expression of emotions in addition to words. In this paper we propose a novel gaze-based speaking tool that enables locked-in syndrome patients to express emotions as well as sentences. It features patient-controlled, emotionally modulated speech synthesis, and an emotional 3D avatar that the patient can control to display emotional facial expressions. The system was tested with 36 people without disabilities separated into an affective group, with full control of the emotional voice, the avatar facial expressions and the laugh, and a control group without emotional tools. The study demonstrated the system’s capacity to enhance communication for both the patient and the interlocutor. The emotions embedded in the synthesized voices were recognized at a rate of 80% on the first trial and 90% on the second trial. The conversation was perceived as more natural when using the affective tool. The subjects felt it was easier to express and identify emotions using this system, and both the emotional voice and the emotional avatar were found to help the conversation. This highlights the need for more affect-driven communication solutions for locked-in patients.
This work was partially supported by Fondazione Roma as part of the project TEEP-SLA.
1 Introduction
1.1 Motivations
Locked-In Syndrome (LIS) is a pathological condition in which patients (for instance, people in advanced stages of Amyotrophic Lateral Sclerosis - ALS) lose all capability to move any part of their body except the eyes [1]. When this last capability is also lost, a state of Complete Locked-In Syndrome (CLIS) is reached. LIS and CLIS patients’ abilities are severely limited, including in terms of communication. Therefore, innovative communication systems for such patients are much needed to increase their communication capabilities and their engagement with the people around them. Considering this context, the main goal of this research is to build novel modalities of technologically mediated communication that are specifically designed to improve LIS patients’ quality of life. In this paper we focus on the skillset of LIS patients and exploit their ability to control their gaze as the input for computer-assisted communication systems [2]. Overall, the requirements for such solutions include the creation of novel user interfaces adapted to LIS patients’ capabilities, providing them with a more extensive and complete communication system than the current state of the art. The solutions should improve their ability to express emotions as well as words.
1.2 State of the Art
The first communication systems for LIS patients consisted of codes using eye blinking to signify yes and no, or more complex sentences using techniques such as Morse code [3]. Other types of communication exist, such as a transparent letter board held by the interlocutor [3] (Fig. 1). In this case, the patient can indicate a letter by gazing at it. The interlocutor must then write down or remember the sequence of letters to form words.
The letter board is still widely used nowadays. Nonetheless, more advanced systems do exist. Notably, the ability of LIS patients to control their gaze has been used to send commands to computer systems through eye-tracking cameras [2]. This technique enables them to select letters on keyboards displayed on computer screens and to “read” the written sentence out loud using voice synthesis [2]. Such systems mostly focus on composing words letter by letter. However, when we communicate, we do not only use words but also a great range of additional non-verbal communication cues such as voice intonation, facial expression or body gesture [4]. Such additional input helps the interlocutor to properly understand the context of the message itself. A simple sentence such as “let’s go now” can be read with excitement or anger and deliver a completely different message. This need for enriching words with emotional features has led to the creation of additional textual communication cues in Computer-Mediated Communication (CMC) such as emoticons [5]. These solutions are now widely used in text communications such as SMS or on social media. For this reason, it is essential for LIS patients to also be able to communicate such affective states to their interlocutors in the most natural way possible. Focusing on the most common emotional cues in communication, voice and facial expression, a considerable body of work has recreated these cues for CMC. For instance, emotional speech synthesis has been widely studied in the past [6,7,8]. Additionally, facial expressions have often been associated with avatars and 3D characters as a way to express emotions online [9,10,11]. The combined use of these two technologies has also been studied for CMC [12].
However, to our knowledge, these advances in technology related to emotion expression have not been adapted for LIS patients yet. Augmentative and Alternative Communication (AAC) systems for persons with disabilities rarely provide tools for emotion expression [13]. Focusing on children with disabilities, [14] reviews past studies on AAC and exposes the great need for emotional communication in such tools. Additionally, the effect of such affective capabilities on the communication abilities of patients with LIS has not been studied so far. To fill this gap in the literature, we propose a novel open-source system controlled with eye gaze, including emotional voice synthesis and an emotional personalized avatar to enhance affective communication.
2 The Proposed Solution
In order to allow LIS patients to communicate their emotions in addition to words, we propose a system including a gaze-based keyboard, an emotional voice synthesizer and a personalized emotional avatar. We focused on the 3 most common basic emotions: Happy, Sad and Angry. An additional option allows the patient to generate a laughing sound. The laugh consists of a recorded sound and is not created through the voice synthesizer. The system was created using the Unity3D platform (Footnote 1), which is most notably known as a game development engine. However, this versatile platform is also very useful for the development of assistive user interfaces. The work presented in this paper adopts the GUI developed through Unity3D for the TEEP-SLA project (Footnote 2).
2.1 Gaze-Based Keyboard
The general aspect of the keyboard can be found in Fig. 2. It uses a standard dwell-time system for key selection [15]. A menu button enables the customization of this dwell time. Autocompletion words are proposed using the Lib-face library [16] and displayed in the center of the keyboard to reduce gaze movements, which have been shown to induce fatigue [17]. Additionally, according to preliminary data, users were more likely to notice the proposed words in this central position than above the keys, since their gaze often passes over the words placed there.
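The dwell-time mechanism can be summarized as follows: a key is selected once the gaze has rested on it for the configured duration, and the accumulated time resets whenever the gaze moves to another key. The sketch below illustrates this logic in Python; it is a simplified illustration and not the actual Unity3D implementation, and all identifiers (DwellSelector, on_gaze_sample, etc.) are hypothetical.

```python
# Minimal sketch of dwell-time key selection (illustrative only; the real
# system is implemented in Unity3D). All names here are hypothetical.

class DwellSelector:
    def __init__(self, dwell_time_s=1.0):
        self.dwell_time_s = dwell_time_s   # user-adjustable via the menu
        self._current_key = None
        self._elapsed = 0.0

    def on_gaze_sample(self, key, dt):
        """Accumulate gaze time on a key; return the key once dwell completes.

        key: the key currently under the gaze point (None if none).
        dt:  time since the previous gaze sample, in seconds.
        """
        if key != self._current_key:
            # Gaze moved to another key (or off the keyboard): restart timer.
            self._current_key = key
            self._elapsed = 0.0
            return None
        if key is None:
            return None
        self._elapsed += dt
        if self._elapsed >= self.dwell_time_s:
            self._elapsed = 0.0          # avoid immediate re-selection
            return key                   # key is "typed"
        return None


# Example: simulated gaze samples at 10 Hz resting on the key "A".
selector = DwellSelector(dwell_time_s=1.0)
for _ in range(12):
    selected = selector.on_gaze_sample("A", dt=0.1)
    if selected:
        print("Selected:", selected)     # fires after ~1 s of dwell
```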
2.2 Emotional Voice Synthesis
The open-source voice modulation platform Emofilt [8] was used to modulate the voice according to the selected emotion. To tune the emotional voice, we hypothesized that a strong differentiation between the emotional voices was essential to ensure long-term emotion recognition by the interlocutor. The selected Emofilt settings for the happy (H), sad (S) and angry (A) voices can be found in Fig. 3.
The settings marked with an asterisk are additions to the original system. The pitch was capped to a maximum and a minimum value to avoid unnatural voices. The user is able to select the desired emotion using 3 emoticon buttons positioned above the keyboard (Fig. 2). If no emotion is selected, the voice is considered neutral.
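As a rough illustration of the pitch-capping addition, the sketch below clamps a modulated pitch contour to fixed bounds before synthesis. It is a simplification under stated assumptions: the bounds and the function are invented for illustration, and the real system applies this step within its Emofilt-based processing chain rather than through this code.

```python
# Illustrative sketch of capping a modulated pitch contour (in Hz) to avoid
# unnatural voices. The bounds and function below are hypothetical.

PITCH_MIN_HZ = 60.0    # assumed lower bound (not from the paper)
PITCH_MAX_HZ = 350.0   # assumed upper bound (not from the paper)

def cap_pitch_contour(pitch_values_hz):
    """Clamp every pitch target of the modulated contour to [min, max]."""
    return [min(max(p, PITCH_MIN_HZ), PITCH_MAX_HZ) for p in pitch_values_hz]

# Example: an "angry" modulation that pushed some targets out of range.
modulated = [180.0, 420.0, 95.0, 30.0]
print(cap_pitch_contour(modulated))   # -> [180.0, 350.0, 95.0, 60.0]
```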
2.3 Emotional Avatar
Because LIS patients are not able to communicate their emotions through facial expressions, we decided to simulate this ability using a 3D avatar. To do so, the AvatarSDK Unity asset [18] was used. It allows the creation of a 3D avatar from a simple picture of the user or of someone else. 3D animations such as blinking and yawning are provided. We created additional 3D animations for the 3 previously cited emotions. An example of such avatar expressions can be found in Fig. 4. The avatar facial expressions are triggered using the same emoticon buttons used for the emotional voice. The selected facial expression is displayed until the emotion is deactivated by the user.
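The coupling between the emoticon buttons, the voice and the avatar can be pictured as a small shared state: selecting an emotion sets both the synthesis style and the avatar animation, and the expression stays active until it is deactivated. The sketch below is a schematic Python illustration, not AvatarSDK or Unity3D code; all identifiers are hypothetical.

```python
# Schematic sketch of a shared emotion state driving both the voice
# modulation and the avatar expression. Purely illustrative: the real
# system uses Unity3D and the AvatarSDK asset, not this Python code.

EMOTIONS = ("happy", "sad", "angry")

class EmotionState:
    def __init__(self):
        self.current = None                # None means neutral

    def toggle(self, emotion):
        """Emoticon-button callback: select an emotion, or deselect it
        if it is already active (back to neutral)."""
        assert emotion in EMOTIONS
        self.current = None if self.current == emotion else emotion
        self._apply()

    def _apply(self):
        # Hypothetical hooks standing in for the real voice/avatar calls.
        voice_style = self.current or "neutral"
        avatar_animation = self.current or "idle"
        print(f"voice style -> {voice_style}, avatar -> {avatar_animation}")

state = EmotionState()
state.toggle("happy")   # happy voice + happy facial expression
state.toggle("happy")   # deactivated: back to neutral voice and idle avatar
```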
3 Methodology
In order to test the capability of this system to enhance patients’ communicative abilities, we performed a between-subject study with 36 subjects (26 males, 10 females) separated into two groups of matched gender composition (5 females and 13 males each; avg. age 29 years): a control group (CG) and an affective group (AG). The experimental flow can be found in Fig. 5.
For the control group, unlike the affective group, the affective features (emoticon buttons, emotional voice, emotional avatar, laugh button) were hidden and therefore inaccessible. During each session, a subject was assigned to represent either “the patient” (SP) or “the healthy interlocutor” (SI). After signing the informed consent form, we first tested the validity of the emotional voice. 5 sentences were randomly picked from the 10 sentences in Table 1 and each was played in the 3 different emotions.
Therefore, in total, 15 sentences were played to the subjects in random order, and they had to decide whether each voice was Happy, Sad or Angry (Trial 1). Both SP and SI rated the emotional voices separately, in written form, without consulting each other. SP was then seated in front of a commercial eye-tracking monitor system (Tobii 4C [19]) with SI next to them. The eye-tracker was calibrated using the dedicated Tobii software. For the affective group, a picture of SP was taken using the computer’s camera and the 3D avatar was then built from this picture. A second screen displayed the emotional avatar, positioned so that both subjects could see it. For both groups, the dwell time was initially set to 1 s, but SP was able to adjust it at any time through the menu. The subjects were then given a talk scenario designed to simulate an emotional conversation (Fig. 6).
The subjects were asked to have a conversation with each other. They were free to say whatever they desired while respecting the scenario. AG-SP was instructed to use the emotional buttons as much as possible. Once the conversation was finished, both subjects were asked to answer a questionnaire on a 7-point Likert-type scale (Fig. 7). The first two questions were designed to compare the efficiency of the systems. The other questions were added to assess the user experience but were not used to compare the systems; they differed between subjects and conditions to fit the context each subject was experiencing. The first part of the study was then repeated with the remaining 5 sentences (Table 1) (Trial 2).
4 Results
4.1 Speech Synthesis Emotion Recognition
The control group (both interlocutors and patients) was able to recognize 81% of the emotions from the emotional voice synthesis in the first trial and 87% in the second trial. The affective group reached an 80% recognition rate in the first trial and 92% in the second trial (Fig. 8).
4.2 Questionnaire
In the questionnaire, only the first two questions aimed at comparing the two systems and evaluating emotional communication efficiency. The results of these questions can be found in Fig. 9. The questionnaire results (ordinal scale measures, analyzed through the Mann-Whitney U test) show the advantages of using the system proposed in this paper. The subjects considered the conversation significantly (p < 0.05) more similar to a normal dialog (in terms of communication abilities, not in terms of scenario) when they were using the expressive solution (Q1-AG-SP: 4.5 and Q1-AG-SI: 4.75 on average) compared to the control group (Q1-CG-SP: 3.375 and Q1-CG-SI: 3.25). This holds for both patient-like subjects (U = 16, p = 0.027) and their interlocutors (U = 12.5, p = 0.012).
The questionnaire data were analyzed through appropriate non-parametric tests because the dependent variables are ordinal scale measures. The assumptions of the tests were checked.
The “patients” from the affective group found that they were more able to express their emotions (Q2-AG-SP: 5.875) compared to the control group (Q2-CG-SP: 3.25). A significant difference was found between the 2 conditions (AG and CG) (Mann-Whitney U(16) = 1.5 with p < 0.001).
The “healthy subjects” from the affective group found that they were more able to identify emotions from their interlocutors (Q2-AG-SI: 5.5) compared to the control group (Q2-CG-SI: 3.75). A significant difference was found between the 2 conditions (Mann-Whitney U(16) = 10.5 with p = 0.008).
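For readers who wish to reproduce this kind of analysis, the short sketch below runs a Mann-Whitney U test on two sets of Likert ratings using SciPy. The rating vectors are invented for illustration and are not the study data.

```python
# Hedged example of the between-group comparison: ordinal Likert ratings of
# two independent groups analyzed with a Mann-Whitney U test (SciPy).
# The ratings below are made up for illustration, not the study data.
from scipy.stats import mannwhitneyu

affective_group = [5, 6, 4, 5, 6, 5, 4, 6, 5]   # e.g. hypothetical Q2 ratings, AG "patients"
control_group   = [3, 4, 2, 3, 4, 3, 3, 2, 4]   # e.g. hypothetical Q2 ratings, CG "patients"

u_stat, p_value = mannwhitneyu(affective_group, control_group,
                               alternative="two-sided")
print(f"U = {u_stat}, p = {p_value:.4f}")
```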
Additional questions were asked to collect further insights on both systems. The results can be found in Fig. 10.
The “patients” in the affective condition found that the ability to convey their emotions helped with the communication (Q3-AG-SP: 5.875), and those in the control group thought that it would have helped (Q3-CG-SP: 5.2). The “healthy subjects” in the affective condition found that the ability to identify their interlocutor’s emotions helped with the communication (Q3-AG-SI: 6.1), and those in the control group thought that it would have helped (Q3-CG-SI: 5.5).
5 Discussion
Firstly, we can see that the overall recognition of the emotional voice in the first trial was sufficient for it to be used meaningfully in this experiment. Additionally, this recognition quickly increases with time, since the recognition in Trial 2 is much higher than in Trial 1. This increase is larger for the affective group, which had additional time to become familiar with the voice modulation during the scenario part of the experiment, reaching a score of 92%. This ability to successfully express emotions (Q4-AG-SP) and identify emotions (Q4-AG-SI) through the voice synthesizer was confirmed by the questionnaire. Furthermore, the affective group found the emotional voice helpful for the communication (Q7-AG) and the control group thought it would be a useful feature (Q7-CG). This confirms our hypothesis that strongly distinctive emotional voices are easily recognizable in the long term and improve communicative abilities.
The emotional avatar was found to successfully represent the desired emotion (Q5-AG-SP), to provide easily identifiable emotions (Q5-AG-SI) and to help with the communication in the affective group (Q6-AG). It is interesting to note that the affective group found the avatar to be more helpful for the communication than the voice (Q6-AG and Q7-AG).
Overall, the communication was found more natural by the affective group than by the control group (Q1). SP subjects found that they were more able to express their emotions (Q2-SP). This highlights the positive impact of both the emotional avatar and the emotional voice on the communication, which is confirmed by Q3-AG-SP and Q3-AG-SI. Concurrently, the control group, which did not have access to any emotional tools, also found that the ability to express emotions (Q3-CG-SP) and to identify emotions (Q3-CG-SI) would have helped with the communication.
It is interesting to note that, in the affective group, the “healthy” subjects rated how much the avatar and the voice helped (Q6-AG-SI and Q7-AG-SI) higher than the “patients” did (Q6-AG-SP and Q7-AG-SP). This highlights the fact that this system is particularly useful for the interlocutor, who is the one looking for cues about the emotion felt by the patient. The “patient” subjects often stated that they did not pay much attention to the avatar as they were focused on writing with the keyboard.
6 Conclusions and Future Work
People in LIS have limited methods to communicate. In the past decades, technology has greatly improved their quality of life by providing a range of communication tools. However, AAC tools are still largely constrained to communicating words and rarely include ways of expressing emotions. This work focused on studying the impact of expressing emotions on the communicative abilities of LIS patients. To do so, we created a platform that allows the user to select an emotion among happy, angry and sad. A 3D avatar is then animated according to the selected emotion, alongside an emotionally modulated voice synthesis. This system was tested by 36 subjects, who were successfully able to recognize the emotions from the voice modulation and the avatar. They found that the two emotional tools helped with the communication, as they were more able to convey and identify emotions. This system is available as open source [20] and also includes a gaze-based computer game [21] and a gaze-based web-browsing system [22]. While today the avatar only expresses fixed emotions, this work shows the need for extending AAC tools to include more non-verbal communication cues. In the future, the system could include additional animations such as lip synchronization, visual reactions to detected skin temperature (sweating, shivering), additional gestures (wink, hand gesture, raised eyes...) and additional types of sounds (“waaaw”, “uhm uhm”, “oooh”).
The avatar could therefore become an extensive communication tool as well as a quick visual aid for the interlocutor, family and caregivers to understand the internal state of the patient. Advanced avatar control could be used, for instance, to perform art [23].
While voice modulation and facial expressions are the most common forms of non-verbal communication, other types of natural communication could be simulated, such as physical contact. For instance, heating wristbands worn by family members and loved ones could be activated by the patient using gaze control to convey the idea of touching their arm.
In the future, patients’ emotions could be automatically detected, for instance from physiological signals [24]. However, this should be investigated carefully, as it raises concerns regarding the patients’ willingness to constantly display their emotions without a way to hide them from their interlocutor.
Notes
1. https://unity.com/ (Accessed Jan 14th 2020).
2. https://teep-sla.eu/ (Accessed Jan 14th 2020).
References
Smith, E., Delargy, M.: Locked-in syndrome. BMJ 330(7488), 406–409 (2005)
Majaranta, P., Räihä, K.-J.: Twenty years of eye typing: systems and design issues. In: ETRA, vol. 2, pp. 15–22 (2002)
Laureys, S., et al.: The locked-in syndrome: what is it like to be conscious but paralyzed and voiceless? Prog. Brain Res. 150, 495–611 (2005)
Mehrabian, A.: Nonverbal Communication. Routledge, New York (2017)
Lo, S.-K.: The nonverbal communication functions of emoticons in computer-mediated communication. CyberPsychol. Behav. 11(5), 595–597 (2008)
Xue, Y., Hamada, Y., Akagi, M.: Emotional speech synthesis system based on a three-layered model using a dimensional approach. In: 2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA), pp. 505–514. IEEE (2015)
Lee, Y., Rabiee, A., Lee, S.-Y.: Emotional end-to-end neural speech synthesizer. arXiv preprint arXiv:1711.05447 (2017)
Burkhardt, F.: Emofilt: the simulation of emotional speech by prosody-transformation. In: Ninth European Conference on Speech Communication and Technology (2005)
Neviarouskaya, A., Prendinger, H., Ishizuka, M.: Textual affect sensing for sociable and expressive online communication. In: Paiva, A.C.R., Prada, R., Picard, R.W. (eds.) ACII 2007. LNCS, vol. 4738, pp. 218–229. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-74889-2_20
Fabri, M., Moore, D.J., Hobbs, D.J.: The emotional avatar: non-verbal communication between inhabitants of collaborative virtual environments. In: Braffort, A., Gherbi, R., Gibet, S., Teil, D., Richardson, J. (eds.) GW 1999. LNCS (LNAI), vol. 1739, pp. 269–273. Springer, Heidelberg (1999). https://doi.org/10.1007/3-540-46616-9_24
Morishima, S.: Real-time talking head driven by voice and its application to communication and entertainment. In: AVSP 1998 International Conference on Auditory-Visual Speech Processing (1998)
Tang, H., Fu, Y., Tu, J., Huang, T.S., Hasegawa-Johnson, M.: EAVA: a 3D emotive audio-visual avatar. In: 2008 IEEE Workshop on Applications of Computer Vision, pp. 1–6. IEEE (2008)
Baldassarri, S., Rubio, J.M., Azpiroz, M.G., Cerezo, E.: AraBoard: a multiplatform alternative and augmentative communication tool. Procedia Comput. Sci. 27, 197–206 (2014)
Na, J.Y., Wilkinson, K., Karny, M., Blackstone, S., Stifter, C.: A synthesis of relevant literature on the development of emotional competence: implications for design of augmentative and alternative communication systems. Am. J. Speech-Lang. Pathol. 25(3), 441–452 (2016)
Jacob, R.J.K.: Eye tracking in advanced interface design. In: Virtual Environments and Advanced Interface Design, pp. 258–288 (1995)
Matani, D.: An O(k log n) algorithm for prefix based ranked autocomplete, pp. 1–14 (2011)
Yuan, W., Semmlow, J.L.: The influence of repetitive eye movements on vergence performance. Vision. Res. 40(22), 3089–3098 (2000)
ItSeez3D: Avatarsdk (2014). https://avatarsdk.com. Accessed 31 July 2019
Tobii Group: Tobii 4C (2001). http://www.tobii.com. Accessed 4 Mar 2019
Larradet, F.: Liscommunication (2019). https://gitlab.com/flarradet/liscommunication/. Accessed 27 Dec 2019
Larradet, F., Barresi, G., Mattos, L.S.: Effects of galvanic skin response feedback on user experience in gaze-controlled gaming: a pilot study. In: 2017 39th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pp. 2458–2461. IEEE (2017)
Larradet, F., Barresi, G., Mattos, L.S.: Design and evaluation of an open-source gaze-controlled GUI for web-browsing. In: 11th Computer Science and Electronic Engineering (CEEC). IEEE (2018)
Aparicio, A.: Immobilis in mobili: performing arts, BCI, and locked-in syndrome. Brain-Comput. Interfaces 2(2–3), 150–159 (2015)
Jerritta, S., Murugappan, M., Nagarajan, R., Wan, K.: Physiological signals based human emotion recognition: a review. In: 2011 IEEE 7th International Colloquium on Signal Processing and its Applications, pp. 410–415. IEEE (2011)