American sign language recognition and training method with recurrent neural network
Introduction
Sign languages are natural languages that have developed through contact among the hearing impaired rather than being invented by any system (Napier & Leeson, 2016). They differ from spoken languages in two main ways. First, sign languages are natural, mature languages that are “articulated in visual-spatial modality”, unlike spoken languages, which are presented in “oral-aural modality”. Second, Napier and Leeson (2016) pointed out that sign languages employ the two hands, the facial muscles, the body and the head, and sometimes also involve vocalisation. Sign languages are neither universal nor mutually intelligible (Beal-Alvarez, 2014). In other words, a sign language developed in one region is not applicable in other regions and contains regional varieties that require special methods/techniques of acquisition. Currently, 141 sign languages exist worldwide (Liddell & Johnson, 1989).
American sign language (ASL) is the foremost language used by the deaf in the United States and the English-speaking regions of Canada (Napier, Leigh, & Nann, 2007). Though increasing recognition of ASL has boosted confidence among the hearing impaired, the limited resources available have created social and cultural issues in hearing impaired communities, despite the amount of linguistic research carried out in the field (Marschark & Spencer, 2010). In the United States, hearing impaired and hard-of-hearing students can choose between attending residential schools (catering only to students who are hearing impaired or hard-of-hearing) or public schools. As the integration of the hearing impaired with peers without hearing impairment is emphasised, an increasing number of hearing impaired students are enrolling in public schools. However, in most cases they are placed in environments without adequate teaching support (Marschark & Spencer, 2010).
To create an inclusive environment for hearing and hearing impaired students in public schools, promoting ASL among the hearing public would be effective. With the implementation of ASL in schools, hearing teachers and students can communicate through both linguistic and non-linguistic means, helping to create an interactive environment for hearing impaired and hard-of-hearing students and thus enhancing the effectiveness of academic learning. Furthermore, the promotion of ASL helps achieve the inclusion of the hearing impaired in society by boosting learning motivation with educational applications. Being a feasible and economical solution, the leap motion controller is commonly used as a device for sign recognition systems (Arsalan et al., 2020, Elboushaki et al., 2020). However, there exists a research gap on the adoption of the leap motion controller for sign education purposes. A predominant section of the research only examines the viability of different sign recognition models with the leap motion controller and does not extend the model into an educational application that aids sign language learning and promotes sign languages. Only Parreño, Celi, Quevedo, Rivas, and Andaluz (2017) have proposed a didactic game prototype, for Ecuadorian signs. Therefore, there is a paucity of research focusing on the development of educational applications for ASL with the leap motion controller and investigating the effectiveness of such applications in improving sign learning.
This research seeks to design a game-based ASL learning application and develop a real-time sign recognition system with the leap motion controller for use in the application. Development of the sign recognition system starts with identifying and extracting ASL sign features, and subsequently devising a suitable algorithm for the recognition system. After applying the algorithm and training the network architecture, the system gains the capacity to recognise and classify ASL signs into the 26 alphabet letters. The classification using the extracted features was processed by a long short-term memory recurrent neural network (LSTM-RNN) with the k-nearest neighbour (kNN) method. Finally, the system is integrated into the game environment of the ASL learning application. This application is expected to promote ASL among both the hearing impaired and the non-hearing impaired, motivating them to learn ASL through the entertainment and engagement provided by the game environment and further helping the hearing impaired to better integrate into society. Furthermore, it encourages and promotes the use of ASL as a second language worth acquiring.
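The pairing of an LSTM-RNN with kNN can be read as: the recurrent network turns a variable-length sign sequence into a fixed-length embedding, and kNN then classifies that embedding by majority vote among labelled neighbours. The sketch below illustrates only the kNN stage on synthetic stand-in embeddings (the class centres, embedding size and neighbour count are our assumptions, not the paper's trained network):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
n_classes = 26          # one class per alphabet letter, A-Z
emb_dim = 32            # hypothetical size of the LSTM's output embedding

# Stand-in for LSTM-RNN output: one embedding vector per sign sample,
# clustered around a per-class centre (real embeddings come from the network).
centres = rng.normal(size=(n_classes, emb_dim))
X = np.repeat(centres, 20, axis=0) + 0.1 * rng.normal(size=(n_classes * 20, emb_dim))
y = np.repeat(np.arange(n_classes), 20)

# kNN classifies a new embedding by majority vote among its nearest neighbours.
knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)
query = centres[7] + 0.1 * rng.normal(size=emb_dim)   # a noisy sample of class 7 ("H")
predicted = knn.predict(query[None, :])[0]
print(predicted)
```

The design choice here is that kNN needs no retraining when new labelled samples are added; only the neighbour set grows.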
The contributions of the research can be summarised as follows:
- The proposed LSTM-RNN with kNN method recognises the 26 alphabet letters with a recognition rate of 99.44% and an accuracy of 91.82% in 5-fold cross-validation using the leap motion controller. The proposed method outperforms other well-known algorithms in the literature.
- The leap motion controller is a sensor based on monochromatic IR cameras and three infrared LEDs that tracks the 3D motion of hand gestures, including the palm centre, fingertip positions, sphere radius and finger bone positions, for every 200 frames collected. Given that those data are available from the leap motion controller, we could further extract the features for the classification of ASL, which is an application in our study.
- The programming flow of the proposed model was designed as a learning-based program. A game module and a recognition module run in real time. We aim to promote ASL in a learning-based environment as our application.
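The per-frame measurements above (palm centre, fingertip positions, and so on, over 200 frames per sign) can be assembled into a fixed-length array for training. A minimal sketch, where the exact feature layout is our assumption (the 18 values per frame below cover only palm centre and fingertips; the paper's full pipeline uses 30 features):

```python
import numpy as np

FRAMES = 200        # frames collected per sign, as stated above

def frame_features(palm_centre, fingertips):
    """Flatten one frame's palm centre (3,) and five fingertip
    positions (5, 3) into a single 18-value vector; a real pipeline
    would append bone positions etc. to reach the full feature set."""
    return np.concatenate([palm_centre.ravel(), fingertips.ravel()])

# Synthetic stand-in for a tracked sign: random coordinates per frame.
rng = np.random.default_rng(1)
seq = np.stack([
    frame_features(rng.normal(size=3), rng.normal(size=(5, 3)))
    for _ in range(FRAMES)
])
print(seq.shape)   # one training sample: (frames, features per frame)
```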
The rest of this article is organised as follows. Section 2 describes the literature review and Section 3 illustrates the proposed framework for the ASL learning application, including the game module and real-time recognition system. Section 4 presents the validation results and analyses the performance of the proposed recognition system. Section 5 summarises the research, including the conclusion, research contributions, limitations and future development.
Section snippets
Learning application
In terms of educational technology, knowledge acquisition in students can be improved through the fusion of academic activities with interactive, collaborative and immersive technologies (Souza, 2015). Notably, several studies have proposed new approaches that stimulate sign language mastering and knowledge acquisition by promoting motivation and excitement in pedagogical activities. Parreño et al. (2017) suggested that an intelligent sign learning game-based system is more effective in the
Methodology
The system conceptual framework is shown in Fig. 2 and consists of two running modules: the game module and the real-time sign recognition system. The proposed learning application is, fundamentally, a special Whack-A-Mole game. Rather than mouse-clicking, a question pertaining to ASL signs has to be answered accurately in order to strike the mole. Each mole comes up randomly from one of 7 holes holding a stick, on which 1 of the 26 English alphabet letters is randomly printed. In the meantime, the
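The round logic of such a Whack-A-Mole variant is simple to state: pick a random hole and letter, then strike only when the recognised sign matches the displayed letter. A minimal sketch (function names are ours, not the paper's implementation):

```python
import random
import string

HOLES = 7                           # holes the mole can pop out of
LETTERS = string.ascii_uppercase    # the 26 alphabet letters

def new_round(rng=random):
    """A mole pops out of a random hole holding a random letter."""
    return rng.randrange(HOLES), rng.choice(LETTERS)

def strike(target_letter, recognised_letter):
    """The mole is struck only when the sign recognised by the
    real-time module matches the letter on the mole's stick."""
    return recognised_letter == target_letter

hole, letter = new_round()
print(hole, letter, strike(letter, letter))
```

In the full application the `recognised_letter` argument would come from the real-time recognition module rather than being supplied directly.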
Results and discussion
With cross-validation, the comprehensive performance of the model can be evaluated before it is deployed as the real-time sign recognition module of the game. In this study, 5-fold cross-validation was performed and the overall accuracy of the model is estimated to be 91.8%, averaging the 5 trials. The result is shown in Table 6.
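The averaging scheme is the standard one: train on 4 folds, test on the held-out fold, repeat 5 times, and report the mean accuracy. A minimal sketch on a synthetic stand-in dataset (the classifier and data generator here are placeholders, not the paper's LSTM-RNN pipeline):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic 26-class dataset standing in for the leap-motion features.
X, y = make_classification(n_samples=520, n_features=30, n_informative=20,
                           n_classes=26, n_clusters_per_class=1,
                           random_state=0)

# 5-fold CV: each fold serves once as the test set.
scores = cross_val_score(KNeighborsClassifier(), X, y, cv=5)
print(scores.mean())   # overall accuracy = average of the 5 trial accuracies
```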
Meanwhile, 26-class confusion matrices for the 5 trials were generated and were further transformed into matrices of TP, TN, FP and FN. Accuracy, sensitivity and
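The transformation described here treats each class one-vs-rest: from a K-class confusion matrix, the diagonal gives TP per class, and the row/column sums give FN and FP. A minimal sketch with a toy 3-class matrix (the paper uses 26 classes):

```python
import numpy as np

def per_class_counts(cm):
    """From a KxK confusion matrix (rows = true, cols = predicted),
    derive TP, TN, FP, FN for each of the K classes (one-vs-rest)."""
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp    # predicted as the class but actually other
    fn = cm.sum(axis=1) - tp    # actually the class but predicted as other
    tn = cm.sum() - tp - fp - fn
    return tp, tn, fp, fn

cm = np.array([[8, 2, 0],
               [1, 9, 0],
               [0, 1, 9]])      # toy example; rows are true labels
tp, tn, fp, fn = per_class_counts(cm)
accuracy = (tp + tn) / cm.sum()     # per-class accuracy
sensitivity = tp / (tp + fn)        # per-class recall
print(tp.tolist(), sensitivity.round(2).tolist())
```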
Concluding remarks
Sign recognition in real-life applications is challenging due to the requirements of accuracy, robustness and efficiency. This project explored the viability of a real-time sign recognition system embedded in an ASL learning application. The proposed system classifies the 26 ASL alphabet letters, with 30 selected features used for the training of the model. The RNN model is selected since the dynamic signs J and Z require the processing of sequences of input. The overall accuracy of the model in
CRediT authorship contribution statement
C.K.M. Lee: Conceptualization, Validation, Resources, Supervision, Funding acquisition. Kam K.H. Ng: Conceptualization, Methodology, Resources, Writing - original draft. Chun-Hsien Chen: Conceptualization, Validation, Supervision. H.C.W. Lau: Conceptualization, Validation, Resources. S.Y. Chung: Data curation, Formal analysis, Writing - original draft. Tiffany Tsoi: Data curation, Formal analysis, Writing - original draft.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgements
The authors would like to express their gratitude and appreciation to the anonymous reviewers, the editor-in-chief and editors of the journal for providing valuable comments for the continuing improvement of this article. The research was supported by the Hong Kong Polytechnic University, Hong Kong and Nanyang Technological University, Singapore and The University of Western Sydney, Australia.
References (53)
- Comparing of deep neural networks and extreme learning machines based on growing and pruning approach. Expert Systems with Applications (2020).
- OR-Skip-Net: Outer residual skip network for skin segmentation in non-ideal situations. Expert Systems with Applications (2020).
- Deep neural network based framework for complex correlations in engineering metrics. Advanced Engineering Informatics (2020).
- A fuzzy extension of the Rand index and other related indexes for clustering and classification assessment. Pattern Recognition Letters (2007).
- MultiD-CNN: A multi-dimensional feature learning approach based on deep convolutional networks for gesture recognition in RGB-D image sequences. Expert Systems with Applications (2020).
- Sensor data reconstruction using bidirectional recurrent neural network with application to bridge monitoring. Advanced Engineering Informatics (2019).
- Real-time anomaly detection framework using a support vector regression for the safety monitoring of commercial aircraft. Advanced Engineering Informatics (2020).
- A novel axle temperature forecasting method based on decomposition, reinforcement learning optimization and neural network. Advanced Engineering Informatics (2020).
- Linguistic properties based on American Sign Language isolated word recognition with artificial neural networks using a sensory glove and motion tracker. Neurocomputing (2007).
- American Sign Language word recognition with a sensory glove using artificial neural networks. Engineering Applications of Artificial Intelligence (2011).
- Hand sign language recognition using multi-view hand skeleton. Expert Systems with Applications.
- American Sign Language alphabet recognition using Convolutional Neural Networks with multiview augmentation and inference fusion. Engineering Applications of Artificial Intelligence.
- SVR-FFS: A novel forward feature selection approach for high-frequency time series forecasting using support vector regression. Expert Systems with Applications.
- Deep learning-based extraction of construction procedural constraints from construction regulations. Advanced Engineering Informatics.
- User-independent American Sign Language alphabet recognition based on depth image and PCANet features. IEEE Access.
- Exploiting recurrent neural networks and leap motion controller for the recognition of sign language and semaphoric hand gestures. IEEE Transactions on Multimedia.
- Estimation of the Youden index and its associated cutoff point. Biometrical Journal.
- Flexible optical fiber bending transducer for application in glove-based sensors. IEEE Sensors Journal.