American Sign Language word recognition with a sensory glove using artificial neural networks

https://doi.org/10.1016/j.engappai.2011.06.015

Abstract

An American Sign Language (ASL) recognition system is being developed using artificial neural networks (ANNs) to translate ASL words into English. The system uses a sensory glove called the Cyberglove™ and a Flock of Birds® 3-D motion tracker to extract the gesture features. The data regarding finger joint angles obtained from strain gauges in the sensory glove define the hand shape, while the data from the tracker describe the trajectory of hand movements. The data from these devices are processed by a velocity network with noise reduction and feature extraction and by a word recognition network. Some global and local features are extracted for each ASL word. A neural network is used as a classifier of this feature vector. Our goal is to recognize ASL signs continuously and in real time using these devices. We trained and tested the ANN model on 50 ASL words, with a different number of samples for each word. The test results show that our feature extraction method and neural networks can be used successfully for isolated word recognition. This system is flexible and open to future extension.

Highlights

► We developed an American Sign Language word recognition system based on artificial neural networks.
► We used the histograms of feature vectors to design a constant dimension model.
► Increasing training data will increase the recognition accuracy of the system.

Introduction

Sign language, which is a highly visual–spatial, linguistically complete and natural language, is the main mode of communication among deaf people. However, deaf people still experience serious problems communicating with people who hear normally, almost all of whom do not understand sign language systems such as American Sign Language (ASL). This communication barrier affects deaf people's lives and relationships negatively. Deaf people usually communicate with hearing people either through interpreters or text writing. Although interpreters can facilitate communication between deaf persons and hearing persons, they are often expensive, and their involvement leads to a loss of independence and privacy. While writing is used by many deaf people to communicate with hearing people, it is very inconvenient while walking, standing at a distance, or when more than two people are involved in a conversation.

Sign language is not universal. Different countries have different sign languages; for example, American Sign Language (ASL) and German Sign Language (GSL) have different alphabets and word sets. The signs in a sign language are created by complex body movements, i.e., using the right hand, the left hand, or both. When signs are created using both hands, the right hand is more active than the left hand. Signers also support their signs with their heads, eyes, and facial expressions.

Many researchers have been working on the recognition of various sign languages and gestures, but this research poses major difficulties due to the complexity of hand and body movements in sign language expression. Sign language recognition research can be categorized into three major classes: (i) computer-vision based, (ii) data-glove and motion-sensor based, and (iii) a combination of these two methods. Computer-vision based ASL recognition relies on image processing and feature extraction techniques for capturing and classifying body movements and handshapes when a deaf person makes an ASL sign. On the other hand, data-glove and motion-tracker based ASL recognition methods use a sensory glove and a motion tracker for detecting handshapes and body movements. The third method includes a combination of techniques from these two methods (Oz et al., 2004).

Acquiring data is more difficult with the vision-based method than with the data-glove and motion-sensor based methods. Data can be collected through a 3-D vision system with multiple cameras and a fast frame grabber, but such a system requires complicated image processing methods, which demand large amounts of data and slow down recognition. The main advantage of this approach is that the user does not need to wear any uncomfortable devices. Additionally, facial expressions can be incorporated. In the data-glove and motion-sensor based systems, the signer has to wear a glove and sensor devices that measure the physical features of the gesture, e.g., trajectory, angles, motion, and finger bending.

In the earliest linguistic description of ASL, Wilbur (1987) used a structural linguistic framework to analyze sign formation. The purpose was to develop a notational system for writing signs that contained symbols for each individual hand shape, location, and movement. Stokoe (1978) analyzed ASL formation and suggested additions to the three basic building blocks of hand shape, location, and movement. A fourth major parameter, orientation of the palm, was suggested by Battison (1978). These and similar ASL studies have provided a foundation for subsequent ASL recognition research.

In parallel with advancements in sensor and computer technology, some successful computer-vision based sign language recognition systems have been developed. The earliest sign language recognition research appeared in the literature at the beginning of the 1990s. Charayaphan and Marble (1992) developed an image-based processing system to understand ASL from hand motions. Takahashi and Kishino (1991) used a range classifier to recognize the 46 signs of the Japanese Kana manual alphabet with a VPL data glove; their study was based on simply encoding data ranges for joint angles and hand orientation.

Since 1990, artificial neural networks have been used widely for solving engineering and industrial problems, and sign language researchers have applied ANNs to their recognition problems as well. Kramer and Leifer developed an ASL finger spelling system using a Cyberglove, with a neural network for feature classification and sign recognition (Kramer and Leifer, 1990; Kramer, 1996). Murakami and Taguchi (1991) established a recurrent neural network method to recognize 110 distinct Japanese Sign Language signs. Waldron and Kim (1995) used a neural network method to recognize 14 ASL signs, using separate networks for hand shape and for hand orientation and position. Wysoski et al. (2002) developed an image-based ASL recognition system with a neural network for 26 static postures. Allen et al. (2003) developed an ASL finger spelling recognition system that could recognize 24 ASL letters with a neural network. Wang et al. (2004) designed an ASL gesture recognition system with a sensory glove using an ANN, a Hidden Markov Model (HMM), and a minimum-distance classifier. Oz and Leu (2007) designed an ASL recognition system based on linguistic properties with a sensory glove using an ANN.

The HMM, which has a well-founded mathematical basis and is an efficient doubly stochastic process, has been used widely in speech recognition, text recognition, and other engineering applications (Rabiner, 1989; Takiguchi et al., 2001). Many ASL researchers have achieved successful results using HMMs. Vogler and Metaxas (1997, 1999) used HMMs for continuous ASL recognition with video streaming. In 1997, they were able to recognize 53 signs with a completely unconstrained sentence structure. In 1999, they were able to recognize ASL sentences with 22 signs based on ASL phonemes. Grobel and Assan (1996) used HMMs to recognize isolated signs based on computer vision, with the signers wearing colored, normal gloves. Their accuracy was 91.3% for 262 signs.

There are also some Human Computer Interaction (HCI) studies based on human gestures. Lee and Xu (1996) used an HMM to recognize the ASL alphabet for a human–robot interface. Lee et al. (2000) developed a hand gesture recognition system for human–computer interaction. Stergiopoulou and Papamarkos (2009) used a neural network with a shape-fitting filter to recognize hand gestures.

In this study, we present an ASL word recognition system constructed to translate ASL signs into the corresponding English words using an ANN method. A reliable adaptive filtering system with a recurrent neural network is used to determine the duration of ASL signing; the accuracy of this step directly affects the quality of the extracted feature vectors. The histogram method is used to extract features from ASL signs. Based on these features, a word recognition neural network is used as a classifier to convert ASL signs into English words. The developed system is capable of recognizing all 50 ASL words used in the testing.
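The histogram method mentioned above can be illustrated with a short sketch. The following Python fragment is a minimal illustration, not the paper's implementation; the bin count, value range, and channel count are assumed for the example. It shows how variable-length sensor streams map to feature vectors of constant dimension, which is what lets a fixed-size neural network input layer accept signs of different durations.

```python
import numpy as np

def histogram_features(samples, n_bins=8, value_range=(0.0, 1.0)):
    """Turn a variable-length stream of sensor readings into a
    fixed-dimension feature vector by histogramming each channel.

    samples: array of shape (T, n_channels); T varies from sign to sign.
    Returns a vector of length n_channels * n_bins.
    """
    samples = np.asarray(samples, dtype=float)
    features = []
    for channel in samples.T:                     # one histogram per sensor channel
        hist, _ = np.histogram(channel, bins=n_bins, range=value_range)
        features.append(hist / len(channel))      # normalize by sample count
    return np.concatenate(features)

# Signs of different durations yield vectors of the same length
# (18 channels is an assumed stand-in for the glove's joint-angle sensors).
short_sign = np.random.rand(37, 18)
long_sign = np.random.rand(52, 18)
assert histogram_features(short_sign).shape == histogram_features(long_sign).shape
```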

Section snippets

System hardware and software

One of the primary means by which we physically connect to the world is through our hands. We perform most of our everyday tasks with them; yet to work with computers and computer applications we rely on devices such as a mouse, keyboard, and joystick, which capture only a small part of the hand's dexterity. Glove-based input devices could overcome this limitation (Sturman and Zeltzer, 1994). Commercial devices such as the VPL data glove and the Mattel power glove have led to an explosion of research and development

Determining the duration of ASL signing

Commonly used English words are represented by signs in ASL (Sternberg, 1994). The signs are expressed by hand movements over a period of time. The data collected from the Cyberglove™ and Flock of Birds® are input to the word recognition system. Each time a person signs an ASL word, the data gathered consist of hand position, wrist rotation, and finger bending. Let x_i, y_i, and z_i be the x, y, and z coordinates stored in the ith sampling cycle. A change in hand direction from the previous cycle
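The snippet above is cut off, but the underlying computation, detecting when a sign starts and stops from cycle-to-cycle changes in hand position, can be sketched as follows. Note that the paper uses a recurrent "velocity network" for this step; the sketch substitutes a plain speed threshold to show the data flow, and the sampling period and threshold values are illustrative assumptions.

```python
import numpy as np

def moving_mask(positions, dt=0.01, v_threshold=0.05):
    """Flag the sampling cycles in which the hand is moving.

    positions: array of shape (T, 3) holding (x_i, y_i, z_i) per cycle.
    dt: sampling period in seconds (illustrative value).
    v_threshold: speed below which the hand is treated as at rest.
    Returns a boolean array, True where a sign is in progress.
    """
    positions = np.asarray(positions, dtype=float)
    deltas = np.diff(positions, axis=0)            # displacement between cycles
    speed = np.linalg.norm(deltas, axis=1) / dt    # estimated hand speed
    moving = speed > v_threshold
    # Prepend one element so the mask aligns with the input cycles.
    return np.concatenate([[False], moving])
```

The first and last flagged cycles of a contiguous moving stretch then bound the duration of one ASL word.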

Data collection and feature extraction

The data collection and feature extraction operations consist of the following stages: first, the duration of the ASL word signing is identified; second, the data are collected and passed through a filter; and third, features are extracted from the data. As explained previously, the first filter, which is online, identifies the ASL word. The second filter, which is offline, reduces noise in the sensor data. Position and orientation data are filtered using the following filter: y(t) = (1/9)[u(t−2)
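The filter equation is truncated in this snippet; given the visible 1/9 normalization, one plausible reading is a symmetric weighted moving average over five sampling cycles with weights 1, 2, 3, 2, 1 (which sum to 9). The sketch below is written under that assumption and is not confirmed by the visible text.

```python
import numpy as np

def smooth(u, weights=(1, 2, 3, 2, 1)):
    """Offline noise-reduction filter: symmetric weighted moving average.

    Assumed form: y(t) = (1/9)[u(t-2) + 2u(t-1) + 3u(t) + 2u(t+1) + u(t+2)];
    only the leading (1/9)[u(t-2) term is visible in the source snippet.
    """
    w = np.asarray(weights, dtype=float)
    w /= w.sum()                                   # 1+2+3+2+1 = 9, hence the 1/9
    # mode="same" keeps the output aligned with the input samples.
    return np.convolve(np.asarray(u, dtype=float), w, mode="same")
```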

Artificial neural network model

A backpropagation algorithm is used for training the ANN model. The basic structure and formulation of backpropagation are summarized here. Training a neural network involves computing the weights so as to obtain an output response to each input within an error limit. An input vector and its target vector make up a training pair. The backpropagation algorithm includes the following steps (Lippman, 1987), sketched in code after the list:

1. Select the first training pair and apply the input vector to the net.
2. Calculate the net output.
3. Compare the
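Although the step list is truncated here, the update it describes is the standard backpropagation rule. The following sketch applies one training pair to a generic two-layer sigmoid network; the layer sizes, learning rate, and activation are textbook defaults, not necessarily the paper's choices.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_pair(x, target, w1, w2, lr=0.1):
    """One backpropagation update for a single (input, target) training pair."""
    # Steps 1-2: apply the input vector and calculate the net output.
    h = sigmoid(w1 @ x)                      # hidden-layer response
    y = sigmoid(w2 @ h)                      # output-layer response
    # Step 3: compare the output with the target vector.
    err = target - y
    # Backward pass: propagate the error and adjust the weights.
    delta_out = err * y * (1.0 - y)
    delta_hid = (w2.T @ delta_out) * h * (1.0 - h)
    w2 += lr * np.outer(delta_out, h)
    w1 += lr * np.outer(delta_hid, x)
    return float(np.sum(err ** 2))           # squared error for this pair
```

Training repeats these steps over all pairs until the total error falls within the desired limit.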

Design of the ASL word recognition system using ANN

The ASL signing space is the hemispheric region in front of the signer. When signs are created with both hands, the right hand is often more active than the left hand. Signers also support their signs with their heads, eyes, and facial expressions. In the present paper, we study right-hand words only.

Some of the ASL words used in the training set for the ANN are given in Table 2. Looking at their definitions, we see that each word begins with a hand shape in a start position and continues with

Test results

Two ASL word recognition tests were developed, one with single-user data and the other with multi-user data. In both tests, the ASL recognition system was trained with three, six, and twelve samples of data for each of the 50 words. At the testing stage, real-time data were used. In total, 300 ASL signs (6 × 50) in the training set were used for the test. Both the single-user model and the multi-user model were tested in two different ways, sequentially and randomly. The sequential test started with the word
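The sequential and random test modes described above amount to the same accuracy computation with a different presentation order. A minimal sketch, in which `model.predict` is a hypothetical classifier interface rather than the paper's API:

```python
import random

def accuracy(model, test_signs, order="sequential", seed=0):
    """Fraction of (features, word) test pairs recognized correctly.

    order="sequential" keeps the vocabulary order of the training set;
    order="random" shuffles the presentation order.
    """
    signs = list(test_signs)
    if order == "random":
        random.Random(seed).shuffle(signs)
    correct = sum(1 for features, word in signs
                  if model.predict(features) == word)
    return correct / len(signs)
```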

Conclusions

The development and evaluation of an ASL word recognition system were described in this paper. The data from a Cyberglove™ sensory glove and a Flock of Birds® 3-D motion tracker were processed by a velocity network with noise reduction and feature extraction and by a word recognition network for the purpose of ASL recognition. Some global and local features were extracted for every ASL word. Neural networks were used to classify these feature vectors. The system was trained and tested for single

Acknowledgments

This research is partially supported by a National Science Foundation award (DMI-0079404) and a Ford Foundation grant, as well as by the Intelligent Systems Center at the Missouri University of Science and Technology in the United States.

References (30)

  • Charayaphan, C., et al., 1992. Image processing system for interpreting motion in American Sign Language. J. Biomed. Eng.
  • Oz, C., et al., 2007. Linguistic properties based on American Sign Language recognition with artificial neural network using a sensory glove and motion tracker. Neurocomputing.
  • Abulafya, N., 1995. Neural Networks for System Identification and Control. M.S. Thesis, University of...
  • Allen, M.J., Asselin, P.K., Foulds, R., 2003. American Sign Language finger spelling recognition system. In:...
  • Battison, R., 1978. Lexical Borrowing in American Sign Language.
  • Grobel, K., Assan, M., 1996. Isolated sign language recognition using hidden Markov Models. In: Proceedings of the...
  • Kramer, J., Leifer, L.J., 1990. A ‘Talking Glove’ for Nonverbal Deaf Individual. Technical Report CDR TR 1990 0312,...
  • Kramer, J., 1996. The Talking-Glove: Hand-Gesture-to-Speech Using an Instrumented Glove and a Tree-Structured Neural...
  • Lee, C., Xu, Y., 1996. Online, interactive learning of gestures for human/robot interface. In: Proceedings of the IEEE...
  • Lee, L.K., et al., 2000. Recognition of hand gesture to human–computer interaction. IEEE.
  • Lippman, R.P., 1987. An introduction to computing with neural nets. IEEE ASSP.
  • Murakami, K., Taguchi, H., 1991. Gesture recognition using recurrent neural networks. In: CHI'91 Conference...
  • Narendra, K.S. Adaptive control using neural networks.
  • Oz, C., Sarawate, N.N., Leu, M.C., 2004. American Sign Language Word Recognition with a Sensory Glove Using Artificial...
  • Oz, C., et al., 2003. Vehicle License Plate recognition with Artificial Neural Network.