
1 Introduction and Background

One of the core characteristics of Autism Spectrum Disorder (ASD) is the presence of early and persistent impairments in social-communicative skills (APA 2013); among its diagnostic characterizations, difficulties in recognizing faces and interpreting facial emotions have been reported at all stages of development in ASD (among many, Harms et al. 2010; Baron-Cohen et al. 1993; Gross 2004; Hobson 1986). However, earlier work on face perception and emotion recognition has produced inconsistent results (Picard 2009; Peterson et al. 2015; Gross 2004; Dawson et al. 2010; Nuske et al. 2013), and research in this area remains inconclusive to date (Webb et al. 2016; Weigelt et al. 2013; Peterson et al. 2015). Nuske et al. (2013) urge more empirical studies to be conducted in various contexts of an “emotion communication system”.

Meanwhile, despite these inconsistent and inconclusive results, and in response to the population’s diminished ability to recognize emotion, a number of computer-assisted applications have been developed to train face perception, emotion mimicking and demonstration skills (among numerous others, Harrold et al. 2014; Golan et al. 2010; Kouo and Egel 2016; Rice et al. 2015; Lacava et al. 2007; Lierheimer and Stichter 2012; McHugh et al. 2011; Hopkins et al. 2011).

In our present study, however, instead of training autistic children in emotion recognition skills, we offer a collaborative play environment that informs autistic children of each other’s emotions, with the aim of engaging them happily and with much less stress. As Baron-Cohen et al. put it in 1993, a training environment “cannot expect learning to proceed smoothly or even to occur at all if the information is in a form that causes distress or is even painful” (Baron-Cohen et al. 1993, p. 3527). Emotion recognition is accomplished through a mounted motion capture camera, the Intel RealSense™, which captures users’ facial landmark data and generates emotion labels accordingly.

The organization of this paper is as follows. In Sect. 2, we discuss previous work in order to position our research within its context. In Sect. 3, the first version of the game is presented along with a short discussion of our pilot study. Finally, we discuss our current plan and conclude the paper in Sect. 4.

2 Related Work

2.1 Emotion Recognition Training Games for Children with ASD

According to a recent white paper, there are more than two million children with ASD in China (Colorful deer 2015). Chinese children with ASD, like their Western counterparts, have difficulty experiencing emotion and communicating with others, which poses a serious problem for their families and for society (Cong 2010). To the best of our knowledge, there is no dedicated computerized emotion-recognition training program in China. Yet it has been recognized that early intervention on face perception and emotion recognition skills for children with ASD is crucial (Rehg 2011, 2013; Webb et al. 2016).

Computer-aided Learning (CAL) for autism has been heralded as offering a very consistent and predictable environment to its users (Colby 1973; Golan et al. 2007; Yamamoto and Miya 1999; Moore and Calvert 2000; Bölte et al. 2006). Hence, there is no lack of such computer-assisted training and remediation environments in which English is the main communication language (Harrold et al. 2014; Golan et al. 2010; Kouo and Egel 2016; Rice et al. 2015; Lacava et al. 2007; Lierheimer and Stichter 2012; McHugh et al. 2011; Hopkins et al. 2011; Bölte et al. 2006; Golan et al. 2015). The faces used in almost all of these training applications are posed by typically developing (TD) individuals. For example, Harrold et al. developed an iPad game, CopyMe, a serious offline single-player game for children to learn emotions through observation and mimicry (Harrold et al. 2014). In particular, a player is asked to mimic the photographed expression in CopyMe (posed by a TD individual) in order to advance to the next level. In their small-scale pilot study, some individuals with ASD struggled to make the expressions, which is consistent with one of the core impairments the population exhibits. Hence, the validity of the training approach in CopyMe remains unknown; the authors did propose to include more player inputs to compensate for the insufficiency of facial expression alone.

These previous studies find that computer-based intervention is more suitable than paper-based intervention for young children (Harrold et al. 2014), and that more complex social skills, including complex emotion recognition, can improve with the CAL approach (Golan et al. 2010; Golan and Baron-Cohen 2006; Lacava et al. 2007; Young and Posselt 2012).

2.2 Faces Posed by Children with ASD for Emotion Recognition Training Games: Current Progress

Almost all of the prior works train on either animated faces or faces posed by TD individuals (Tang 2016), mainly due to the population’s persistent and well-documented impairments in posing recognizable facial expressions (Brewer et al. 2015; Grossman et al. 2013; Weimer et al. 2001) and in recognizing emotions (among many, Baron-Cohen et al. 1993; Gross 2004; Harms et al. 2010). Some clinical, neurological and behavioral works have emerged to address the recognition of emotions on faces posed by individuals with ASD (among many recent ones, Brewer et al. 2015; Capps et al. 1993; Faso et al. 2015; Stagg et al. 2014). Computerized approaches relying on capturing facial expressions posed by individuals with ASD have also emerged (Tang and Winoto 2017; Tang et al. 2017); yet our understanding of them remains very limited (Tang 2016), and more work is expected, which motivates our current study.

Our game is similar to that described in (Harrold et al. 2014), but ours distributes the emotion labels to the other player through an on-screen visualization (see Fig. 3 for the current design). Therefore, our game could provide greater flexibility, generate less stress for children engaging in the play environment, and accordingly foster more natural collaboration.

2.3 Computational Sensing Based on Facial Landmark Data for Automatic Emotion Recognition

While emotion recognition research is mature and increasingly popular thanks to the recently rekindled interest in deep learning, learning emotion labels from autistic facial data remains rare (Tang et al. 2017).

According to Rehg (2011, 2013), acquiring social and communication behavioral data is very labor-intensive. Computational sensing could play a key role in transforming the measurement, analysis, and understanding of such human behavior (Rehg 2011, 2013; Tang et al. 2017). In our previous study, we relied on the Microsoft Kinect motion sensor (v2) to capture autistic children’s skeleton data (Winoto et al. 2016); however, due to interference between multiple sensors, it is too computationally costly to adopt such a system at home. Rehg pointed out that the widespread availability and increasingly low cost of sensor technology make it possible to capture a multimodal portrait of behavior through video, audio, and wearable sensing (Rehg 2011). Tang et al. (2017) mounted a portable motion camera, the Intel RealSense™, to learn and generate autistic children’s emotion labels during their cartoon-watching sessions. The generated emotion tags were then compared, for validation purposes, with those manually labeled by the special education teacher or the parents who were present.

While their studies offer an early glimpse of such automatic emotion recognition via face-tracking of autistic facial landmarks (Tang et al. 2017), they differ from the present study in that our proposed game requires no human intervention. Instead, the game is expected to make adjustments based on autistic children’s behaviors.

3 Our Game

3.1 Early Design of the Game

Our Game and Playing Rules.

Our proposed game is a multiplayer feeding game in which two players feed fish in a simulated aquarium (see Fig. 1). Each player plays on a PC connected to the other through a LAN. In addition, each player’s information is shown to the other player (as well as to teachers or other children in the environment) on the game screen using varying light colors and intensities, without overstimulating autistic children (see Fig. 2). Our game is intended to help children with autism express their own emotions and detect others’ emotions, and it also allows TD individuals around them to be informed of these emotions.

Fig. 1. The user interface of the game (version one)

Fig. 2. Two players are seen playing the game together
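To make the mechanism above concrete, the following is a minimal sketch, in Python, of the kind of per-player status message that could be exchanged between the two PCs over the LAN and rendered as a colored light on the peer’s screen. The field names, color palette, port, and transport are illustrative assumptions, not the game’s actual implementation.

```python
import json
import socket
import time

# Illustrative emotion-to-color mapping; the actual palette and intensity
# scaling used in the game are design choices not fixed in this paper.
EMOTION_COLORS = {
    "happy": (255, 220, 100),
    "neutral": (180, 180, 180),
    "sad": (100, 140, 255),
}

def build_status_message(player_id, emotion_label, intensity):
    """Pack one player's current emotion into a small JSON payload
    that the peer's game screen can render as a colored light."""
    return json.dumps({
        "player": player_id,
        "emotion": emotion_label,
        "color": EMOTION_COLORS.get(emotion_label, (180, 180, 180)),
        "intensity": max(0.0, min(1.0, intensity)),  # clamp to [0, 1]
        "timestamp": time.time(),
    }).encode("utf-8")

def send_status(peer_host, peer_port, message):
    """Send the status message to the other player's PC over the LAN."""
    with socket.create_connection((peer_host, peer_port), timeout=1.0) as conn:
        conn.sendall(message)

# Example: player 1 is smiling at moderate intensity.
# send_status("192.168.1.42", 9000, build_status_message(1, "happy", 0.6))
```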

Behavioral Data Collection for Game Tuning and Computational Sensing.

Our system also records the players’ in-game actions and other behavioral data (such as speech and prosody characteristics), which can be used to automatically adjust some game parameters (such as the playing speed, so as to maximize playability). These behavioral data can, in turn, be analyzed to help teachers and clinicians understand children’s behavioral patterns (Winoto et al. 2016; Rehg 2011, 2013; Picard 2009; Tang et al. 2017).
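As an illustration of the kind of tuning described above, the following minimal sketch adjusts the playing speed from a short window of aggregated behavioral data. The chosen signals and thresholds are placeholders, not values from our system.

```python
from dataclasses import dataclass

@dataclass
class BehaviorWindow:
    """Aggregated behavioral data over a short window of play."""
    feed_actions: int           # number of feeding actions performed
    missed_targets: int         # fish that left the screen unfed
    mean_voice_pitch_hz: float  # simple prosody feature from the audio stream

def adjust_play_speed(current_speed, window, min_speed=0.5, max_speed=2.0):
    """Nudge the game speed up when the child keeps pace comfortably,
    and down when many targets are missed (a possible sign of stress).
    The thresholds here are placeholders, not tuned values."""
    if window.missed_targets > window.feed_actions:
        current_speed *= 0.9   # slow down: the child is falling behind
    elif window.missed_targets == 0 and window.feed_actions >= 5:
        current_speed *= 1.1   # speed up: the child is keeping pace easily
    return max(min_speed, min(max_speed, current_speed))

# Example: a calm, successful window slightly raises the speed.
speed = adjust_play_speed(1.0, BehaviorWindow(6, 0, 210.0))
```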

Pilot Testing and User Feedback.

The first version of the game has been tested by two adults; feedback was given on aspects of the user interface design (Fig. 2). Figure 3 shows the new game design with a story.

Fig. 3. The updated design

A pilot study is scheduled for later in the summer.

3.2 Emotion Recognition via Face-Tracking

Players’ facial expressions are captured via an Intel RealSense™ motion capture camera; emotion labels are then computed and generated through the API provided by RealSense™. Four kinds of movement in the face and head areas are supported by the FaceExpression module of the API: eyebrow movement, mouth movement, head movement, and eye movement. For example, the “Smile Score”, computed from mouth movement data, returns a value ranging from 0 (no smile at all) to 100. The collected data include the timestamps associated with continuous micro-mouth movements (Fig. 4) while the player’s face remains in the detection area.

Fig. 4. The mouth movement data collected, associated with the smile score
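The following is a minimal sketch of how the timestamped smile scores described above might be buffered and mapped to a coarse label. The read_smile_score function is only a stand-in for the RealSense SDK query (whose exact API is not reproduced here), and the window length and threshold are assumptions.

```python
import time
from collections import deque

def read_smile_score():
    """Placeholder for the RealSense FaceExpression query; in the real
    system this would return the SDK's 0-100 smile intensity for the
    currently tracked face."""
    raise NotImplementedError

class SmileTracker:
    """Keep a short history of (timestamp, smile score) samples while the
    player's face stays inside the detection area, as in Fig. 4."""

    def __init__(self, window_seconds=2.0):
        self.window_seconds = window_seconds
        self.samples = deque()  # (timestamp, score) pairs

    def add_sample(self, score, timestamp=None):
        timestamp = timestamp if timestamp is not None else time.time()
        self.samples.append((timestamp, score))
        # Drop samples that fall outside the sliding window.
        while self.samples and timestamp - self.samples[0][0] > self.window_seconds:
            self.samples.popleft()

    def current_label(self, smile_threshold=60):
        """Map the windowed average score to a coarse label; the threshold
        is an assumption, not a value reported in this paper."""
        if not self.samples:
            return "unknown"
        mean_score = sum(score for _, score in self.samples) / len(self.samples)
        return "smiling" if mean_score >= smile_threshold else "not smiling"
```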

This computation differs from our previous work, in which we designed a lightweight emotion-recognition algorithm to compute and generate an emotion index based on Facial Action Units (AUs) (Tang et al. 2017). It is unclear which approach yields more accurate results, and the assessment of such emotion labels itself remains a challenging issue (Tang 2016; Tang and Winoto 2017).
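Because assessment of the generated labels is itself an open issue, one simple way to begin comparing the two approaches is frame-by-frame agreement between their label streams (or between either stream and human annotations). The sketch below uses plain percent agreement with made-up labels; it is not the evaluation protocol of our study and ignores chance agreement and timing misalignment.

```python
def percent_agreement(labels_a, labels_b):
    """Frame-by-frame agreement between two equally long label sequences.
    This is the crudest possible comparison; measures such as Cohen's
    kappa would additionally correct for chance agreement."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and aligned")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)

# Example with made-up labels from two approaches over six frames:
auto = ["happy", "happy", "neutral", "neutral", "happy", "neutral"]
manual = ["happy", "neutral", "neutral", "neutral", "happy", "happy"]
print(percent_agreement(auto, manual))  # 0.666...
```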

4 Discussion and Further Work

Much heterogeneity is apparent in emotion recognition and processing in ASD, and more empirical studies need to be conducted. Previous computer-assisted emotion recognition and face perception training applications have been built upon the theory of mind (ToM) and have been empirically shown to significantly improve these abilities in children with ASD (Weigner and Depue 2011). However, the ecological validity of such results across the population is unknown. In this paper, instead of pursuing research down this path, we offer a collaborative play environment that informs autistic children of each other’s emotions, with the aim of engaging them happily and with much less stress. Emotion recognition is accomplished through a mounted motion capture camera which captures autistic children’s facial landmark data and generates emotion labels accordingly.

Although the research described in this paper offers only an early glimpse of one of the first attempts down this path, it is our hope that the experiments and knowledge emerging from it will help inform remediation strategies of this kind that target emotion-related difficulties, so as to help individuals with ASD lead emotionally rich lives during social interaction both within the population and with TD individuals.