1 Introduction

Imagine John, a four-year-old boy, and his mother, Alice, eating at a local McDonald’s. John points to the ketchup on the table while looking at Alice and saying “here is the ketchup” (an instance of initiating a joint attention bid, IJA). In response, Alice looks at the ketchup and then back at John, uttering “oh, yes, ketchup” (responding to a joint attention bid, RJA). The eye-gaze shifts Alice makes between the ketchup and John are also part of such everyday social interaction.

IJA and RJA are two key aspects of joint attention (JA), which must occur in social interaction [1]. JA is regarded as an executive form of information processing from early infancy through adulthood: it is predictive of later language development [2,3,4], theory-of-mind abilities [5] and social communicative skills [6]. Raver’s study of the social interaction between typically developing (TD) toddlers and their mothers revealed a link between JA and emotion regulation [7]. It is well known that children with autism spectrum disorder (ASD) often exhibit atypical JA behaviors [8]. Specifically, they engage in fewer joint attention behaviors, including eye-gaze shifting [9] and initiating and responding to joint attention bids [10]. Given the criticality of JA and motivated by the following two observations, we propose the present study:

  • The ecological validity of an intervention

As White et al. argued, “Joint attention behaviors may vary across ethnicity, language, family structure, or socioeconomic status, and currently there is no assessment of how those vary” (p. 1293 [11]); hence, there is a need to assess (and characterize) such skills in a Chinese special education classroom.

  • Current assessment protocols tend to focus on more abstract, higher-level social skills for which JA skills are a prerequisite.

Examples include mutual planning and joint performance in [12], and turn-taking and negotiating in [13]. In the present study, the evaluation is measured in the context of the task—puzzle-making in a loosely coupled collaborative play environment designed to engage children with ASD while minimizing their cognitive load, without enforced collaboration (EC) [14].

The approach presented in this paper differs from most earlier attempts in that we do not rely on sophisticated feature-space construction; instead, the simple design and automatic in-game data collection offer hassle-free benefits to users such as special education teachers and parents, both in classrooms and at home.

The organization of this paper is as follows. Section 2 reviews relevant research. Section 3 describes our training application in detail, including the defined IJA and RJA bids that can be utilized for behavioral pattern recognition. Section 4 presents the in-game data collection and behavioral pattern analysis module, together with a pilot test in the lab with two typically developing (TD) adults to evaluate the feasibility of our behavioral pattern modeling module. We conclude in Sect. 5 with a discussion of future research along this avenue.

2 Previous Works

Two indirect lines of past research are relevant to our present study.

2.1 IJA, RJA and Best Practices in Teaching JA Skills

Aligning with the two types of JA bids, Whalen and Schreibman [15] documented two phases of joint attention intervention in a non-computerized setting, an approach that has prevailed in such interventions: initiation training and response training. The former includes coordinated gaze shifting and proto-declarative pointing. The latter comprises graded levels of responses, namely “response to hand on object”, “response to showing of object”, “eye contact”, “response to object being tapped”, “following a point”, and “following a gaze”. In addition, physical prompts (e.g. touching a child’s hand as a reminder), verbal prompts (utterances such as “you can drag the puzzle to here”) and gestural prompts were adopted to further assist children in engaging with others during the response training phase [15]. Both IJA and RJA behaviors have also been studied in a parent-child intervention setting (i.e. Parent-Mediated Communication-Focused Treatment in Children with Autism (PACT) [16]) and in caregiver- or parent-mediated behavioral interventions (i.e. Joint Attention-Mediated Learning (JAML) [17, 18]).

Over the past years, computerized JA training applications have emerged, and many of them have been deployed in collaborative play environments on tabletops, which afford a larger space for joint performance [12,13,14, 19,20,21,22]. The majority of these earlier systems engage children in a tightly coupled collaborative play environment with a single shared work space [1, 13, 19,20,21,22], except for [14], which does not enforce collaboration and instead provides a private work space for each child. These earlier works investigated the feasibility, usability and usefulness of the play environment for training JA skills, whereas the present study focuses on automated pattern and data analysis to facilitate personalized training and intervention.

2.2 Behavioral Modeling and Pattern Analysis for Technology-Based ASD Intervention and Training

Users’ interaction and engagement in virtual and physical spaces (including in a computerized application space) offer rich information for profiling and modeling users in user-centered computing. In the pattern recognition and computer vision community, activity and behavioral analysis based on multimodal data has been studied for a long time (see, for example, [23,24,25,26] among many others). The majority of these prior works focus on recognizing the activities and behaviors of a single user, which often span a considerable temporal duration. More recently, many works have focused on characterizing group activities at a coarse level [27,28,29,30,31, 34].

Among them, [28, 29, 31,32,33] targeted children with ASD. For example, Chong et al. [28] attempted to measure and predict eye contact of infants with ASD via eye-gaze tracking during interaction sessions with an examiner wearing a pair of commercially available glasses to capture the infants’ faces and head poses; such an automatic system is beneficial and efficient for characterizing atypical gaze behavior of children with ASD in natural social settings. Anzulewicz et al. [29] focused on obtaining gesture data during ASD children’s gameplay sessions on touch-sensitive tablets; the touch-sensitive screens and embedded inertial movement sensors were programmed to record movement kinematics and gesture forces. Winoto et al. [34] proposed capturing users’ movements in a naturalistic space with a depth camera in the form of temporal skeleton data, and argued that such data, if combined with other ambient sensing data, could serve as a social meter to predict social relationships. Prabhakar and Rehg [31] segmented and analyzed real-world social interaction videos to characterize turn-taking interactions between individuals. [32] documented a detailed study on the computational analysis of children’s social and communicative behaviors based on video and audio data from dyadic social interactions between adults and children with ASD.

These recent works demonstrated the applicability of activity and behavioral pattern analysis mechanisms from the computer vision and pattern recognition area to assist therapists, caregivers and individuals with developmental disorders, including ASD [32, 33].

Two recent studies focused on visualizing behavioral patterns (including eye-gaze direction) during the social interaction between a child with ASD and a therapist [35, 36]. In both studies, sophisticated data capture systems were deployed. For example, in [36], eye-gaze direction data were retrieved and analyzed via a high-definition video-recording system, followed by gaze analysis based on facial landmark and head movement data. The computational cost of the system is inherently high; nonetheless, the authors claimed that, compared with manual rating and evaluation of videos by a therapist, the system can facilitate the medical specialist’s evaluation [36]; it remains unclear, however, whether such a behavioral visualization system can easily be deployed. Unlike the video-based data capture system in [36], Kong et al. [35] utilized Abaris, which allows therapists to use Anoto digital pen-and-paper technology and Nexidia voice recognition to create meaningful indices into videos [37].

Despite these earlier efforts, the computational cost and sophistication of the behavioral modeling systems in most of these works might prevent such automatic and semi-automatic systems from being deployed, which in turn restricts their actual use. Our approach is different in that we do not rely on sophisticated feature-space construction; instead, the simple design and automatic in-game data collection offer hassle-free benefits to users such as special education teachers and parents, both in classrooms and at home.

In the next two sections, we present a detailed description of our system and of the in-game data collection and automatic behavioral analysis module.

3 Our Joint Attention Training Application

3.1 The Training Application at a Glance

The two-player puzzle game is deployed on a 27-inch tabletop (see Figs. 1 and 2 for two screenshots). Figure 2 shows the general application design.

Fig. 1. Application screenshot in which the two players have separate work spaces. (Color figure online)

Fig. 2. Application screenshot in which one puzzle piece belonging to the left player lies in the right player’s work space (its border colored orange). (Color figure online)

Each child has his/her own work space, in which he/she needs to piece the puzzle together; the blue help button with a question mark can be tapped when either player cannot find a piece in his/her work space (Figs. 1, 2 and 3). Upon tapping the help button, the missing piece blinks to alert the other child to pass over the piece that does not belong to his/her space. While the piece is blinking in a child’s work space, he/she can ignore it or swipe it to the other space. The border color of each puzzle piece corresponds to the color of each player’s work space (Figs. 1, 2 and 3), which serves two purposes: (a) providing visual cues for each player; and (b) supporting the initiation of JA bids when a player points to a puzzle piece in the other player’s work space (see Fig. 4).
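The implementation itself is not described in this paper; purely to illustrate the prompting logic above, the following minimal Python sketch models the help-button and blinking behavior. All names (Piece, on_help_tapped, on_piece_swiped) are hypothetical and introduced only for illustration.

from dataclasses import dataclass
from typing import List

@dataclass
class Piece:
    pid: str            # e.g. "L3" or "R7"
    owner: str          # "L" or "R": the player the piece belongs to
    location: str       # "L" or "R": the work space it currently sits in
    blinking: bool = False

def on_help_tapped(tapper: str, pieces: List[Piece]) -> None:
    """Blink every piece of the tapper's that sits in the other player's
    work space, prompting that player to pass it over."""
    for piece in pieces:
        if piece.owner == tapper and piece.location != tapper:
            piece.blinking = True   # visual cue shown to the other child

def on_piece_swiped(piece: Piece, to_player: str) -> str:
    """The other child may ignore the blinking piece or swipe it over;
    a swipe without a prior blink counts as proactive help."""
    was_prompted = piece.blinking
    piece.location = to_player
    piece.blinking = False
    return "responded_to_prompt" if was_prompted else "proactive_help"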

Fig. 3. Application screenshot showing the help button that a player can press to seek help. (Color figure online)

Fig. 4. The left player pointing to a puzzle piece in the other player’s work space (IJA). (Color figure online)

3.2 The IJA and RJA Bids Defined in Our Application

As discussed in the previous section, a help button is placed at the bottom right of the screen for children to ask for help (see Fig. 3). Once a child taps the button, any of his/her pieces already in his/her own work space are automatically moved to the correct place, while a piece located in the other player’s work space blinks to prompt that player to share it. Such a blinking puzzle piece, in the form of a visual pattern, provides a cue that prompts for RJA. While the piece is blinking, the other child can ignore it or take action to deliver it. As a unique design feature of our training application, the act of tapping the help button is therefore defined as an IJA bid. An RJA bid occurs when a puzzle piece is blinking and (a) the player notices it and (b) the player passes the blinking piece to the other player. A child might also notice a puzzle piece that does not belong to his/her work space and swipe it to the other child without any prompt, which we regard as indicative evidence of proactive help.
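As a minimal Python sketch of how these definitions could be applied to the in-game event log, the snippet below labels each logged event as an IJA bid, an RJA bid, or proactive help. The event names and fields ("help_tap", "swipe", "piece_blinking") are assumptions made for illustration, not the actual log schema.

from typing import Dict, List

def label_ja_bids(events: List[Dict]) -> List[Dict]:
    """Label each event with the JA bid it represents, following the
    definitions above: tapping the help button is an IJA bid, passing a
    blinking piece is an RJA bid, and passing a non-blinking piece that
    belongs to the other player is proactive help."""
    labelled = []
    for e in events:
        if e["type"] == "help_tap":
            bid = "IJA"
        elif e["type"] == "swipe" and e.get("piece_blinking", False):
            bid = "RJA"
        elif e["type"] == "swipe":
            bid = "proactive_help"
        else:
            bid = None
        labelled.append({**e, "ja_bid": bid})
    return labelled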

Table 1 below lists the key IJA and RJA bids in the puzzle training game; the rows below the green bar highlight the bids unique to our application design, which can be objectively assessed in-game.

Table 1. IJA and RJA bids in our puzzle training application

These bids can best be evaluated through behavioral and speech analysis of the children’s actions (recorded on video) during the interactions. In cases where either child fails to initiate or respond to the other’s attention bids, reminders can come from the teacher or parents who are present, in the form of verbal and bodily cues [15].

4 Behavioral Modeling and Preliminary Analysis in Our Joint Attention Application

4.1 In-Game Data Collection Module

The application includes a built-in data collection module to indirectly assess the quantitative degree of reciprocity as well as the overall performance of each child. The help button is specifically designed as a visual cue and as an objective measure of proactive help.

Figure 5 shows these parameters. Each player has 12 puzzle pieces; the pieces for the left and right player are labelled L1 to L12 and R1 to R12, respectively. X and Y represent the 2D index of each puzzle piece (see Figs. 1 and 2 for the user interface of this version of the application).

Fig. 5. In-game data related to the help behavior.

Proactive help is essential for mutual planning and better joint performance. Each player’s behavioral data are collected and stored as a data book. The data reflect the temporal movement of the puzzle pieces registered to each player (the left and right player, respectively). For each movement of a puzzle piece, the following data are automatically collected: the piece’s index, the time stamp of the attempt, the duration of the operation, the final location of the piece and, if any, the wrong place where the piece was put. Figure 6 shows an example of such data describing a player’s behaviors for further user modeling and analysis.

Fig. 6. An example of a player’s behavioral data used for user modeling and analysis.

The data shown in Fig. 6 record the temporal manipulations of a puzzle piece by a given player (L or R). They are used to measure the overall task performance and to derive the behavioral pattern of both players.
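To make the per-movement record concrete, the following Python sketch shows one possible structure; the field names only approximate the columns shown in Fig. 6 and are not the exact schema of our data book.

from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class MoveRecord:
    player: str                       # "L" or "R"
    piece_id: str                     # "L1".."L12" or "R1".."R12"
    grid_index: Tuple[int, int]       # (X, Y) index of the puzzle piece
    timestamp: float                  # when the attempt started
    duration: float                   # how long the operation took
    final_location: Tuple[int, int]   # where the piece ended up
    wrong_location: Optional[Tuple[int, int]] = None  # mis-placement, if any

Each movement of a piece would append one such record to the corresponding player’s data book.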

When all puzzle pieces have been placed in the correct position, our in-game data collector generates one row of data containing the performance of each player at the given level, along with the following additional behavioral data (Fig. 7).

Fig. 7. Overall behavioral data (by user).
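The exact columns of the summary row are those shown in Fig. 7; purely as an illustrative assumption, a per-level aggregation could look like the following Python sketch, which combines the movement records and the labelled JA bids of one player.

from typing import Dict, List

def summarize_level(player: str, level: int,
                    moves: List[Dict], bids: List[Dict]) -> Dict:
    """Aggregate one player's behavior at a given level into a single row."""
    return {
        "player": player,
        "level": level,
        "pieces_placed": len(moves),
        "wrong_placements": sum(m["wrong_location"] is not None for m in moves),
        "total_time": sum(m["duration"] for m in moves),
        "ija_count": sum(b["ja_bid"] == "IJA" for b in bids),
        "rja_count": sum(b["ja_bid"] == "RJA" for b in bids),
        "proactive_help_count": sum(b["ja_bid"] == "proactive_help" for b in bids),
    }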

Notice that, in order to reduce stress for low-functioning children with ASD, an advanced help button has been added: when it is pressed, the puzzle-filling operation is finished automatically. Such a design is important because individuals with ASD (including children) are much less likely to engage in eye contact and close encounters with the face [38]. Hence, when such automatic operation is observed, the child’s JA skills might need to be further trained.
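For example, the use of the advanced (auto-finish) help can be flagged in the per-level summary so that a teacher or therapist can spot it at a glance; the sketch below is an assumption-based illustration, not the actual implementation.

def flag_auto_finish(summary_row: dict, events: list) -> dict:
    """Mark whether the advanced (auto-finish) help was used at this level."""
    summary_row["auto_finish_used"] = any(e["type"] == "auto_finish" for e in events)
    # Frequent use may suggest that the child's JA skills need further training.
    return summary_row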

4.2 The Feasibility Study: Preliminary Analysis

In order to assess the usability of the module before deploying it in a special education classroom, we conducted an in-lab study involving two TD adults (see Fig. 8). The test environment is similar to that in [14].

Fig. 8. Preliminary feasibility testing in the lab.

Due to limited space, in this paper we focus on the RJA and IJA skills in terms of both proactive and non-proactive help. To this end, we measure the quantity of these skills (see Fig. 5).

Two students were invited to participate in the study. They completed six levels of the game, consisting of 6 puzzle pieces per level (levels one to five) and 12 puzzle pieces (level six), respectively. We followed the study protocol of [14].

Figures 9 and 10 show the temporal JA bid patterns of the two players, respectively. Some quick assessments of the quantity of JA bids can easily be drawn from the figures. For example, the IJAs and RJAs of the two players tend to show opposite patterns; overall, the left player is more socially active in terms of both IJAs and RJAs; and more joint attention and proactive help patterns can be observed once the players entered levels four, five and six.

Fig. 9. The JA bids of the left player.

Fig. 10. The JA bids of the right player.
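As an illustration of how the per-level bid counts plotted in Figs. 9 and 10 might be derived from the labelled event log, a minimal Python sketch is given below; the plotting itself is omitted and the log fields are assumptions.

from collections import Counter, defaultdict
from typing import Dict, List, Tuple

def bid_counts_by_level(labelled_events: List[Dict]) -> Dict[Tuple[str, int], Counter]:
    """Count IJA, RJA and proactive-help bids per (player, level)."""
    counts: Dict[Tuple[str, int], Counter] = defaultdict(Counter)
    for e in labelled_events:
        if e.get("ja_bid") is not None:
            counts[(e["player"], e["level"])][e["ja_bid"]] += 1
    return dict(counts)

# Example: compare how active the two players were at level four.
# left, right = bid_counts_by_level(events)[("L", 4)], bid_counts_by_level(events)[("R", 4)]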

These behavioral markers provide rich information for therapists to assess the appropriateness of the game activities as well as the social interaction patterns of the players. The data collection and analysis can be conducted automatically in the background to enable tele-therapy and facilitate live behavioral marking by therapists in different physical locations [37, 39, 40]. Our system has an advantage over previous ones, including [28, 29, 31,32,33, 35, 36], in that it is lightweight and easily deployable, which makes it well suited for use at home.

4.3 Discussion

Our in-game data collection module has been carefully designed to assess task performance and to measure reciprocity, which is key to social interaction and JA skills [41].

We speculate that good performance on a given level could indicate an intact or typical JA skill set, or could instead reflect compensatory strategies such as pressing the ‘auto-finish’ button. The preliminary in-lab study demonstrated the feasibility of such an automated system, from data collection to analysis.

Further, more sophisticated analysis, such as finer-grained eye tracking, is expected. Nevertheless, a game design that accommodates compensatory strategies is necessary to avoid a child’s meltdown. A more challenging research path is to provide adaptive and personalized visual support based on such behavioral pattern recognition and analysis, so as to enhance the quality of therapy and intervention [42].

5 Concluding Remarks and Future Works

Although recent research has highlighted and demonstrated the applicability of activity and behavioral pattern analysis mechanisms in offering early windows of opportunity for the assessment of and intervention for individuals with ASD [32, 33], the computational cost and sophistication of such behavioral modeling systems might prevent these automatic and semi-automatic systems from being deployed, which in turn restricts their actual use.

Drawing on the findings of these earlier works, we proposed an easily deployable automatic system to train joint attention skills, assess the frequency and degree of reciprocity, and characterize IJA and RJA behaviors. Our approach differs from most earlier attempts in that we do not rely on sophisticated feature-space construction; instead, the simple design and automatic in-game data collection offer hassle-free benefits to users such as special education teachers and parents, both in classrooms and at home. The preliminary in-lab study demonstrated the feasibility of such an automated system, from data collection to analysis.

The design of the game and its activities follows our previous approach [14]; the revised system described in this paper (including the integrated automated data collection and analysis module) was developed based on our interviews with special education teachers.

We expect to deploy the system in Chinese special education classrooms to evaluate its usability and applicability over an extended period of use.