American sign language recognition and training method with recurrent neural network

https://doi.org/10.1016/j.eswa.2020.114403

Highlights

  • An American Sign Language recognition model was developed using Leap Motion.

  • An LSTM-RNN with kNN method was proposed for recognising the 26 alphabets.

  • The 3D motion of hand gestures was tracked and 30 relevant features were extracted.

  • A recognition rate of 99.44% accuracy was obtained for the 26 alphabets.

Abstract

Though American sign language (ASL) has gained recognition in American society, few ASL applications have been developed for educational purposes, and those designed with real-time sign recognition systems are also lacking. The Leap Motion controller facilitates real-time and accurate recognition of ASL signs, offering an opportunity to design a learning application with an embedded real-time sign recognition system that seeks to improve the effectiveness of ASL learning. This project proposes an ASL learning application prototype: a whack-a-mole game with a real-time sign recognition system embedded. Since the ASL alphabet contains both static and dynamic signs (J, Z), a Long Short-Term Memory Recurrent Neural Network (LSTM-RNN) with the k-Nearest-Neighbour (kNN) method is adopted as the classification method, as it is suited to handling sequences of input. Characteristics such as sphere radius, angles between fingers and distances between finger positions are extracted as input for the classification model. The model is trained with 2600 samples, 100 for each alphabet. The experimental results reveal that recognition of the 26 ASL alphabets yields an average accuracy of 99.44%, and 91.82% under 5-fold cross-validation, using the Leap Motion controller.
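To illustrate the kind of features named above, the minimal sketch below computes angles between adjacent fingers and distances between fingertip positions from a single frame of Leap Motion-style data. The array layout and function name are our assumptions, not the authors' code, and the sphere radius is reported directly by the device rather than derived here.

```python
import numpy as np

def frame_features(fingertips, palm_centre):
    """Angles between adjacent fingers and fingertip distances for one frame.

    fingertips  : (5, 3) array of fingertip positions in mm (assumed layout)
    palm_centre : (3,) array, palm centre position in mm
    """
    # Unit direction vectors from the palm centre to each fingertip.
    directions = fingertips - palm_centre
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)

    # Angles between adjacent fingers (4 values for 5 fingers).
    cosines = np.sum(directions[:-1] * directions[1:], axis=1)
    angles = np.arccos(np.clip(cosines, -1.0, 1.0))

    # Euclidean distances between adjacent fingertips.
    distances = np.linalg.norm(fingertips[:-1] - fingertips[1:], axis=1)

    return np.concatenate([angles, distances])
```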

Introduction

Sign languages are natural languages that have developed through contact among the hearing impaired, rather than being invented by any system (Napier & Leeson, 2016). They differ from spoken languages in primarily two ways. First, sign languages are natural and mature languages “articulated in visual-spatial modality”, unlike spoken ones, which are presented in “oral-aural modality”. Second, Napier and Leeson (2016) pointed out that sign languages employ the two hands, facial muscles, the body and head, and sometimes also involve vocalisation. They are neither universal nor mutually intelligible (Beal-Alvarez, 2014). In other words, a sign language developed in one region is not applicable in other regions, and its distinct varieties require dedicated methods and techniques of acquisition. Currently, 141 types of sign languages exist worldwide (Liddell & Johnson, 1989).

American sign language (ASL) is the foremost language used by the deaf in the United States and the English-speaking regions of Canada (Napier, Leigh, & Nann, 2007). Though increasing recognition of ASL has boosted confidence among the hearing impaired, the limited resources available have created social and cultural issues within hearing impaired communities, despite the amount of linguistic research carried out in the field (Marschark & Spencer, 2010). In the United States, hearing impaired and hard-of-hearing students can choose between attending residential schools (catering only to students who are hearing impaired or hard-of-hearing) or public schools. As the integration of the hearing impaired with peers without hearing impairment is emphasised, an increasing number of hearing impaired students are enrolling in public schools. However, in most cases they are placed in environments without adequate teaching support (Marschark & Spencer, 2010).

To create an inclusive environment for hearing and hearing impaired students in public schools, promoting ASL among the hearing public would be effective. With the implementation of ASL in schools, hearing teachers and students can communicate through both linguistic and non-linguistic means, which helps create an interactive environment for hearing impaired and hard-of-hearing students and thus enhances the effectiveness of academic learning. Furthermore, the promotion of ASL helps achieve the inclusion of the hearing impaired in society by boosting learning motivation with educational applications. Being a feasible and economical solution, the Leap Motion controller is commonly used as a device for sign recognition systems (Arsalan et al., 2020, Elboushaki et al., 2020). However, there exists a research gap in the adoption of the Leap Motion controller for sign education. A predominant share of the research only examines the viability of different sign recognition models with the Leap Motion controller and does not extend the models into educational applications that aid sign language learning and promote sign languages. Only Parreño, Celi, Quevedo, Rivas, and Andaluz (2017) have proposed a didactic game prototype, for Ecuadorian signs. Therefore, there is a paucity of research focusing on the development of educational applications for ASL with the Leap Motion controller and investigating the effectiveness of such applications in improving sign learning.

This research seeks to design an ASL learning application based on game-based learning and to develop a real-time sign recognition system with the Leap Motion controller for use in the application. Development of the sign recognition system starts with identifying and extracting ASL sign features, followed by developing a suitable algorithm for the recognition system. After applying the algorithm and training the network architecture, the system gains the capacity to recognise and classify ASL signs into the 26 alphabets. Classification on the extracted features is performed by a long short-term memory recurrent neural network (LSTM-RNN) with the k-nearest neighbour (kNN) method. Finally, the system is integrated into the game environment of the ASL learning application. This application is expected to promote ASL among both the hearing impaired and the non-hearing impaired, motivating them to learn ASL through the entertainment and engagement provided by the game environment, and further helping the hearing impaired to better integrate into society. Furthermore, it encourages and promotes the use of ASL as a second language worth acquiring.
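As a concrete illustration of the classification step, the following minimal sketch defines an LSTM-RNN over per-frame feature sequences. The 200-frame window, 30 features, 26 classes and 2600 training samples follow the paper; the layer width and training settings are illustrative assumptions, not the authors' reported architecture.

```python
import tensorflow as tf

NUM_FRAMES, NUM_FEATURES, NUM_CLASSES = 200, 30, 26  # values from the paper

# Minimal LSTM-RNN classifier over Leap Motion feature sequences.
# Layer size and optimiser settings are illustrative assumptions.
model = tf.keras.Sequential([
    tf.keras.layers.LSTM(64, input_shape=(NUM_FRAMES, NUM_FEATURES)),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Training on the 2600 collected samples (100 per alphabet) would then be:
# model.fit(X_train, y_train, epochs=..., validation_split=0.2)
```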

The contributions of the research can be summarised as follows:

  • The proposed LSTM-RNN with kNN method can recognise the 26 alphabets with a recognition rate of 99.44% accuracy, and 91.82% under 5-fold cross-validation, using the Leap Motion controller. The proposed method outperforms other well-known algorithms in the literature (a sketch of one way to combine the two classifiers follows this list).

  • The Leap Motion controller is a sensor based on two monochromatic IR cameras and three infrared LEDs that tracks the 3D motion of hand gestures, including the palm centre, fingertip positions, sphere radius and finger bone positions, for every 200 frames collected. Given that these data are available from the Leap Motion controller, we can further extract features for the classification of ASL, which is the application in our study.

  • The programming flow of the proposed model was designed as a learning-based program, in which a game module and a recognition module run in real-time. We aim at promoting ASL in a learning-based environment as our application.
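The snippets available here do not spell out how the kNN step interfaces with the LSTM-RNN. One common arrangement, sketched below purely as an assumption, uses the trained network as a sequence encoder and classifies its embeddings with k-nearest neighbours; the helper name and the value of k are hypothetical.

```python
import tensorflow as tf
from sklearn.neighbors import KNeighborsClassifier

def fit_knn_on_embeddings(model, X_train, y_train, k=5):
    """Fit a kNN classifier on the LSTM's sequence embeddings.

    model   : trained tf.keras model whose first layer is the LSTM
    X_train : (n_samples, 200, 30) feature sequences
    y_train : (n_samples,) integer labels for the 26 alphabets
    """
    # Re-use the network up to the LSTM layer as a fixed sequence encoder.
    encoder = tf.keras.Model(model.input, model.layers[0].output)
    embeddings = encoder.predict(X_train, verbose=0)
    knn = KNeighborsClassifier(n_neighbors=k).fit(embeddings, y_train)
    return encoder, knn

# Prediction: embed one sequence, then take a majority vote among neighbours.
# letter = knn.predict(encoder.predict(x[None, ...], verbose=0))[0]
```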

The rest of this article is organised as follows. Section 2 describes the literature review and Section 3 illustrates the proposed framework for the ASL learning application, including the game module and real-time recognition system. Section 4 presents the validation results and analyses the performance of the proposed recognition system. Section 5 summarises the research, including the conclusion, research contributions, limitations and future development.

Section snippets

Learning application

In terms of educational technology, knowledge acquisition in students can be improved through the fusion of academic activities with interactive, collaborative and immersive technologies (Souza, 2015). Notably, several studies have proposed new approaches that stimulate sign language mastery and knowledge acquisition by promoting motivation and excitement in pedagogical activities. Parreño et al. (2017) suggested that an intelligent sign learning game-based system is more effective in the

Methodology

The system conceptual framework is shown in Fig. 2 and consists of two running modules: the game module and the real-time sign recognition system. The proposed learning application is, fundamentally, a special Whack-A-Mole game. Rather than mouse-clicking, a question pertaining to ASL signs has to be answered accurately in order to strike the mole. Each mole randomly pops up from one of 7 holes holding a stick, on which 1 of the 26 English alphabets is randomly printed. In the meantime, the
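To make the game mechanics concrete, the minimal sketch below mimics one round of the described Whack-A-Mole interaction. Here `recognise_sign` is a hypothetical stand-in for the real-time recognition module, and the timeout value is an assumption.

```python
import random
import string

HOLES = 7
ALPHABET = string.ascii_uppercase  # the 26 ASL alphabets

def spawn_mole():
    """Pick a random hole and a random letter printed on the mole's stick."""
    return random.randrange(HOLES), random.choice(ALPHABET)

def play_round(recognise_sign, timeout_s=3.0):
    """One round: the mole is struck only if the sign recognised from the
    player's hand matches the letter on the stick."""
    hole, target = spawn_mole()
    print(f"Mole at hole {hole} shows '{target}' - sign it!")
    answer = recognise_sign(timeout_s)  # e.g. classify the latest Leap frames
    return answer == target            # True -> the mole is whacked
```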

Results and discussion

With cross-validation, the comprehensive performance of the model can be evaluated before it is deployed as the real-time sign recognition module of the game. In this section, 5-fold cross-validation was performed and the overall accuracy of the model is estimated to be 91.8%, averaged over the 5 trials. The result is shown in Table 6.
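The sketch below shows the standard 5-fold procedure behind such an estimate; `train_and_eval` is a hypothetical callback standing in for the paper's model training, and the stratified split is an assumption on our part.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

def five_fold_accuracy(X, y, train_and_eval):
    """Estimate overall accuracy by 5-fold cross-validation.

    train_and_eval(X_tr, y_tr, X_te, y_te) -> accuracy on the held-out fold
    """
    skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    scores = [train_and_eval(X[tr], y[tr], X[te], y[te])
              for tr, te in skf.split(X, y)]
    return float(np.mean(scores))  # the paper reports ~91.8% on average
```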

Meanwhile, 26-class confusion matrices for the 5 trials were generated and further transformed into matrices of TP, TN, FP and FN. Accuracy, sensitivity and
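The transformation from a multi-class confusion matrix into per-class TP, TN, FP and FN counts, and from those into accuracy and sensitivity, follows the standard definitions sketched below; the function names are ours, not the paper's.

```python
import numpy as np

def per_class_counts(cm):
    """Per-class TP, FP, FN, TN from a 26x26 confusion matrix
    (rows = true class, columns = predicted class)."""
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp   # predicted as the class, but wrong
    fn = cm.sum(axis=1) - tp   # instances of the class that were missed
    tn = cm.sum() - tp - fp - fn
    return tp, fp, fn, tn

def accuracy_sensitivity(cm):
    tp, fp, fn, tn = per_class_counts(cm)
    accuracy = (tp + tn) / cm.sum()   # per-class accuracy
    sensitivity = tp / (tp + fn)      # recall / true positive rate
    return accuracy, sensitivity
```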

Concluding remarks

Sign recognition in real-life applications is challenging due to the requirements of accuracy, robustness and efficiency. This project explored the viability of a real-time sign recognition system embedded in an ASL learning application. The proposed system involves the classification of the 26 ASL alphabets, with 30 selected features for the training of the model. An RNN model is selected since the dynamic signs J and Z require the processing of input sequences. The overall accuracy of the model in

CRediT authorship contribution statement

C.K.M. Lee: Conceptualization, Validation, Resources, Supervision, Funding acquisition. Kam K.H. Ng: Conceptualization, Methodology, Resources, Writing - original draft. Chun-Hsien Chen: Conceptualization, Validation, Supervision. H.C.W. Lau: Conceptualization, Validation, Resources. S.Y. Chung: Data curation, Formal analysis, Writing - original draft. Tiffany Tsoi: Data curation, Formal analysis, Writing - original draft.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

The authors would like to express their gratitude and appreciation to the anonymous reviewers, the editor-in-chief and editors of the journal for providing valuable comments for the continuing improvement of this article. The research was supported by the Hong Kong Polytechnic University, Hong Kong, Nanyang Technological University, Singapore, and The University of Western Sydney, Australia.

References (53)

  • R. Rastgoo et al. Hand sign language recognition using multi-view hand skeleton. Expert Systems with Applications (2020)
  • W. Tao et al. American Sign Language alphabet recognition using Convolutional Neural Networks with multiview augmentation and inference fusion. Engineering Applications of Artificial Intelligence (2018)
  • J.M. Valente et al. SVR-FFS: A novel forward feature selection approach for high-frequency time series forecasting using support vector regression. Expert Systems with Applications (2020)
  • B. Zhong et al. Deep learning-based extraction of construction procedural constraints from construction regulations. Advanced Engineering Informatics (2020)
  • W. Aly et al. User-independent American Sign Language alphabet recognition based on depth image and PCANet features. IEEE Access (2019)
  • D. Avola et al. Exploiting recurrent neural networks and leap motion controller for the recognition of sign language and semaphoric hand gestures. IEEE Transactions on Multimedia (2019)
  • Beal-Alvarez, J. S. (2014). Deaf students’ receptive and expressive American Sign Language skills: Comparisons and...
  • Bheda, V., & Radpour, D. (2017). Using deep convolutional networks for gesture recognition in American sign language....
  • Boughorbel, S., Jarray, F., & El-Anbari, M. (2017). Optimal classifier for imbalanced data using Matthews Correlation...
  • Chong, T.-W., & Lee, B.-G. (2018). American sign language recognition using leap motion controller with machine...
  • C.-H. Chuan et al.
  • Ciaramello, F. M., & Hemami, S. S. (2011). A Computational Intelligibility Model for Assessment and Compression of...
  • Du, Y., Liu, S., Feng, L., Chen, M., & Wu, J. (2017). Hand Gesture Recognition with Leap Motion. arXiv preprint...
  • R. Fluss et al. Estimation of the Youden index and its associated cutoff point. Biometrical Journal (2005)
  • E. Fujiwara et al. Flexible optical fiber bending transducer for application in glove-based sensors. IEEE Sensors Journal (2014)