
1 Introduction

Autism spectrum disorder (ASD) is a neurodevelopmental disorder characterized mainly by repetitive and restricted behaviors and by deficits in verbal communication, social interaction and emotion recognition [3, 21]. Children with ASD are less willing to communicate with others, including in classrooms where formal education takes place; hence, compared with typically developing (TD) children, they have more limited channels through which to acquire new knowledge. In addition, although autistic children demonstrate delays in expressive and receptive language, the extent of these delays varies widely across individuals and contexts [18]. On the other hand, young brains are highly malleable, so early intervention that capitalizes on this great learning potential can help limit some developmental impairments [7, 10, 17, 21], including in early language and nonverbal skills [18]. Thanks to technological advances, numerous technology-based intervention applications have been developed (see [19] for a brief discussion). However, the overwhelming majority of this previous work has focused on teaching individuals with autism social communication skills [20], and few studies have addressed the feasibility and efficacy of early language intervention; in addition, much prior work targets learning in formal, established environments such as classrooms and clinical centers.

On the other hand, AR uniquely combines multiple instructional channels, including static and dynamic visual stimuli and auditory stimuli, with the kinesthetic movement of handling a mobile phone; it lets users interact with the real world in an enhanced way. Supporting learning in different ways according to learners' different needs is paramount; this principle is grounded in the foundational Universal Design for Learning (UDL) framework [16], which has served as a launching point for AR-based learning tools. A number of AR-based teaching tools have appeared over the past few years; however, few have been built for children with autism, which motivates our work.

In particular, in this paper we present a lightweight augmented reality (AR)-based mobile word-learning application that allows users to capture a photo in which up to four objects can be recognized and spoken aloud in both English and Chinese. The core of the application is an offline, deep-learning-based object recognition module capable of recognizing objects from any angle. This unique feature offers valuable learning opportunities not only for children with autism but also for those with other special needs, as confirmed in two small-scale pilot studies.

In particular, a small-scale feasibility and usability pilot test was conducted during a public show with typically developing (TD) children and adults (including parents who tried our application with their young children), with very satisfying results. Based on their comments, we simplified the system design. We then demonstrated the enhanced version at a special education school in one of the biggest cities in southern China, interviewed teachers and let several children play with it; the very positive feedback provides valuable input for further adjusting the system design.

The remainder of this paper is organized as follows. Previous work is reviewed in Section 2, and the system is described in Section 3. Observations and discussions from the two pilot studies appear in Section 4. We conclude by distilling the early yet valuable insights from children, their parents and special education teachers that can guide further development of our system.

1.1 Motivation

The current literature on AR technology for Chinese word learning by children with special needs is still in its infancy and thus offers limited insight into the therapeutic efficacy, feasibility and applicability of individualized intervention for autistic individuals, particularly children. Our work, although preliminary, offers early insight into the usability and usefulness of such an AR-based mobile learning application.

2 Previous Works on AR-Based Technology for Children with Autism

There is abundant prior work on adopting AR technology for therapeutic use and education, particularly for individuals with developmental disorders [15]; the major advantage of an AR-enabled environment is that it greatly facilitates the cognitive mapping between what is in users' prior knowledge and what they observe in the real world [12]. Such authentic opportunities can promote knowledge transfer and offer more opportunistic learning. In this section, we focus on the use of the technology for tailored, personalized intervention for children with autism. Its application for Chinese-speaking autistic individuals is also discussed to motivate the development of our application.

The majority of AR-based applications have targeted interventions for enhancing children's social and communication skills. For example, [9] described an Object Identification System that allows teachers to superimpose digital content on top of physical objects; a five-week study revealed that the AR-based application could increase the sustained and selective attention of children with autism and elicit positive emotions, thereby promoting engagement during therapy. However, because the application requires specially trained therapists, it cannot easily be used outside the clinic, which restricts its usefulness, as most of an autistic child's learning occurs outside the classroom. McMahon et al. [15] applied AR to teaching science vocabulary and strongly advocated the authentic opportunities AR affords children with developmental disorders, including autism. Improved attention and enhanced social skills training were observed in [5], where AR was used to visually conceptualize social stories for children with high-functioning autism. An AR-based application has also been developed to train autistic children's emotion expression and social skills [4]. Enhancing pretend play was the focus of [2, 8, 11], and results from these studies indicate that AR offers advantages over traditional intervention techniques; among the three, [8] examined such an AR-based play setting in a classroom. [1] implemented an audio-augmented paper that supports audio recording on standard sheets of paper in a storytelling activity; its unique feature is that it is built with tangible physical tools that can be shared between therapist and child. AR-based Google Glass has been studied for home-based social affective learning in children with autism [6]: in a series of at-home studies with parents on facial affect recognition tasks, reported increases in eye contact and social acuity led the researchers to endorse its therapeutic use [6]. Liu et al. [13] systematically explored, for the first time, the feasibility of autism-focused AR-based smart glasses for training social communication and behavioral coaching, and concluded that AR can significantly increase children's engagement and fun, which might in turn improve the targeted skills during intervention. However, a recent study of AR for social skills intervention failed to find significant improvement between groups [14]; further ecological studies are necessary. Despite this, given the unique, immersive learning environment an AR-based system can offer, we expect AR to feature in many future systems.

Of note, we failed to uncover any published English-language articles on AR-based applications for Chinese-speaking users, which is both the motivation for and the uniqueness of our work.

3 System Overview and Offline Object Recognition

3.1 The Offline Object Recognition Module Built on the TensorFlow Platform

The offline object recognition module was implemented within Google's TensorFlow machine learning framework. Since our system aims to facilitate teaching and learning at any time and in any place (particularly in outdoor settings and at home) without relying on online training, we did not modify the sample code; instead, we took advantage of pre-trained models covering around one hundred object classes to cold-start our system. We realized that, in the absence of a large amount of training data, such an algorithm may suffer from inaccurate recognition, which we observed repeatedly during in-lab testing. To alleviate this problem, we integrated a small module that allows the user to correct the result manually; its discussion is beyond the scope of this paper.
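For concreteness, the following minimal sketch illustrates the kind of offline inference pass such a module performs. It assumes a pre-trained TensorFlow Lite SSD-style detection model; the model file name, output-tensor ordering and score threshold shown here are illustrative assumptions, not a description of our exact implementation.

```python
# A minimal sketch of an offline detection pass, assuming a pre-trained
# TensorFlow Lite SSD-style model. The model file name, output-tensor
# ordering and score threshold are illustrative assumptions.
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="detect.tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

def detect(image, score_threshold=0.5):
    """Run one fully offline detection pass.

    `image` is an HxWx3 uint8 array already resized to the model's
    expected input size (e.g. 300x300 for a typical SSD model).
    """
    interpreter.set_tensor(input_details[0]["index"], image[np.newaxis, ...])
    interpreter.invoke()
    # Typical SSD post-processing outputs: boxes, class ids, scores.
    boxes = interpreter.get_tensor(output_details[0]["index"])[0]
    classes = interpreter.get_tensor(output_details[1]["index"])[0]
    scores = interpreter.get_tensor(output_details[2]["index"])[0]
    # Keep at most four confident detections, mirroring the app's
    # up-to-four-objects-per-photo limit.
    keep = [i for i in np.argsort(-scores) if scores[i] >= score_threshold][:4]
    return [(int(classes[i]), float(scores[i]), boxes[i]) for i in keep]
```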

At present, our system recognizes around one hundred typical daily objects fully offline, which offers a tremendous advantage, particularly for rural users who have far less access to therapists and to the internet [19].

3.2 System Development Phases and User Interfaces: System Versions Without Arduino Sensors

Our system went through several design iterations aimed at ease of use for autistic children and their loved ones, particularly in rural areas. Two types of AR-based system were implemented, one with Arduino sensors and one without (targeting younger children). In this paper, we focus on the latter. Figure 1 presents sample user interfaces from these design iterations; in the earlier versions the system could recognize only one object per use (the first and middle images in Fig. 1).

Fig. 1. Three sample user interfaces from the system development iterations (in the rightmost image, the system is seen recognizing four objects in one scene).

To differentiate and highlight the items in a scene, a colored border wraps each object, and each border color is matched to a colored button; when the user presses a colored button, the Chinese word for the corresponding item is shown and spoken, as shown in Fig. 2. Notice that in Fig. 2 the user targeted a photo in a browser, which shows that our system can support learning anywhere and at any time for these children.
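The sketch below illustrates this color-to-button mapping logic. The color palette, the small bilingual label table and the `speak` callback are hypothetical placeholders rather than our actual implementation, which relies on the mobile platform's text-to-speech facilities.

```python
# An illustrative sketch of the color-coding logic: each recognized object
# is wrapped in a colored border whose color matches one of the buttons,
# and tapping a button shows and speaks the word in English and Chinese.
# The palette, label table and `speak` callback are hypothetical.
PALETTE = ["red", "green", "blue", "yellow"]  # one color per object, max four

BILINGUAL_LABELS = {  # English label -> Chinese word (small excerpt)
    "bicycle": "自行车",
    "cup": "杯子",
    "elephant": "大象",
}

def assign_colors(detections):
    """Pair each detection (english_label, bounding_box) with a border color."""
    return {color: det for color, det in zip(PALETTE, detections)}

def on_button_pressed(color, color_map, speak):
    """Handle a tap on a colored button: show and speak both words."""
    english, _box = color_map[color]
    chinese = BILINGUAL_LABELS.get(english, english)
    speak(english, lang="en")  # e.g. the platform text-to-speech engine
    speak(chinese, lang="zh")
    return english, chinese
```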

Fig. 2. A sample of word recognition in which both the Chinese and English words appear and all the bicycles are highlighted.

Notice that, as shown in Fig. 2, the button design has been simplified by removing text labels and using plain colored buttons, to accommodate younger autistic children with limited vocabulary.

4 Observations from Pilot Studies and Discussions

Since there was no prior work to draw on in designing such an AR-based word-learning mobile application, we conducted two small pilot studies to obtain feedback from TD children and their parents, from children with autism and their loved ones, and from special education teachers. The main observations and discussions are presented in the following sections.

4.1 Pilot Study One and Main Observations

The first small-scale usability test was conducted with TD children and adults at a public show on the university campus. A further goal of the test was to assess the accuracy of the offline object recognition algorithm and its portability. Figure 3 shows the moment when a young girl, with her mother at her side, was shown the application; Fig. 4 shows a group of adults being shown the offline object recognition.

Fig. 3. A young girl being shown by the researcher how the application recognizes an object, with her mother behind her.

Fig. 4. A group of adult attendees being shown by the researcher how the application recognizes an object.

The simple, lightweight mobile application received very good reviews from both adults and children; parents and young audience members in particular remarked that such AR-based applications are very rare in China, and children showed high interest in trying it over several rounds. Parents were particularly satisfied with the application's audio module, which could facilitate teaching and learning at home and largely relieve them of repetitive teaching.

Surprisingly, the accuracy of the application was satisfying, although a few items supplied by audience members were wrongly recognized. Another reason accuracy did not appear to be a big issue in this pilot study may be the limited time each child spent playing with the application.

We hypothesize that the objects used in teaching and learning with young children are typical ones that the algorithm can mostly recognize accurately. However, as children grow up, their vocabularies expand and their environments become more sophisticated, and the algorithm's performance will decline.

4.2 Pilot Study Two and Discussions

General Description and Goals.

We conducted a second pilot test at a private special education center in Hangzhou, one of China's biggest cities. The main goals were twofold and the same as those of study one.

Study Participants and Environment.

Five children in two age groups tried the application: one group consisted of children under five years old, and the other of children between six and eight years old. The test objects included a set of toy animals that we brought (see Figs. 5 and 6) and objects available in the center. Besides the children, accompanying parents and teachers also tried the application and were interviewed by us.

Fig. 5. The toy animals brought to the center for the children to try.

Fig. 6. The application accurately recognizing the elephant toy.

General Observation with Children with Autism and Discussions.

Overall, the group of younger children had difficulty using the application correctly; they did show excitement and surprise when the application pronounced an object's name, but they quickly lost interest after hearing the voice several times. By comparison, the application was very well received among the older children, who not only showed high interest in learning with it but also used it without any training or difficulty. Figure 7 shows one testing moment with a child and the toy-animal set, while Fig. 8 shows testing on multiple cups.

Fig. 7. The application being tested by a child and his parent at the private education center.

Fig. 8. A boy testing the application on multiple cups with the researcher at the private education center.

Feed-backs from Teachers.

Very satisfying feedback was received from the teachers regarding the application's usefulness and usability. In particular, they were pleased by one advantage of a technology-based application that significantly reduces their effort: its repeatability. Its ease of use and its audio features also attracted their attention, in that children with special needs could learn pronunciations from it as well. However, the performance of the offline algorithm was not satisfactory; for example, the application could not recognize some books because of their unusual and varied covers, shapes and colors. The teachers said it would make an excellent learning companion if the performance could be significantly improved.

5 Concluding Remarks and Future Works

We developed a lightweight AR-based word-learning application that lets children with autism learn words at any time and in any place, particularly outside the classrooms where many AR-based applications targeting TD children work effectively.

Overall, the feasibility and usability results from our two pilot studies align with those of previous work, particularly [8, 13], in that the application greatly attracts children's attention, which might promote learning at their own pace outside the classroom. Feedback from both special education teachers and parents highlighted the importance of learning while playing and of learning at any time and in any place, not only for children with autism but also for those with other special needs.

However, the accuracy of the offline object recognition model significantly compromised the acceptability and general applicability of our application; how to balance accuracy against a lightweight footprint is therefore crucial. Our current plan is to integrate a reinforcement learning module that takes user inputs, such as the manual corrections mentioned in Section 3.1, so as to further train the existing model.
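As an illustration of one possible direction, the sketch below logs user corrections as labeled examples that could later be used to fine-tune the recognizer or to derive reward signals; the file layout and field names are hypothetical, not a committed design.

```python
# A rough sketch of the feedback loop we are considering: user corrections
# are logged on the device as labeled examples that could later be used to
# fine-tune the recognizer (or as reward signals in a reinforcement
# learning setup). The file layout and field names are hypothetical.
import json
import time

CORRECTIONS_LOG = "corrections.jsonl"

def log_correction(image_path, predicted_label, corrected_label):
    """Append one user-supplied correction to an on-device log."""
    record = {
        "timestamp": time.time(),
        "image": image_path,
        "predicted": predicted_label,
        "corrected": corrected_label,
    }
    with open(CORRECTIONS_LOG, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```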

Despite this, to the best of our knowledge our application is among the first of its kind and offers valuable insights into the design of such mobile learning applications, particularly those that facilitate learning at any time and in any place.