Abstract
Sign Language Recognition (SLR) narrows the communication gap between hearing-impaired people and those who need to communicate with them but do not understand sign language. This paper focuses on an end-to-end deep learning approach for recognizing sign gestures recorded with a 3D sensor (e.g., Microsoft Kinect). Typical machine-learning-based SLR systems require a feature extraction step before a model is applied, and these features must be chosen carefully because recognition performance relies heavily on them. Our proposed end-to-end approach removes this problem by eliminating the need for handcrafted features: deep learning models can work directly on raw data and learn higher-level representations (features) by themselves. To test this hypothesis, we trained two recent and promising deep learning models, the Gated Recurrent Unit (GRU) and the Bidirectional Long Short-Term Memory (BiLSTM), using only raw data. We compare the two models with each other and with the results of the base paper. The experiments show that the proposed method outperforms the existing work, with the GRU achieving 70.78% average accuracy with front-view training.
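To illustrate what working directly on raw data can look like, the following is a minimal sketch of a GRU forward pass over a sequence of raw skeleton frames, written in plain Python. It is not the authors' implementation. The class name `TinyGRUClassifier`, the 20-joint skeleton (60 values per frame), the hidden size, the number of classes, and the randomly initialized, untrained weights are all assumptions made for illustration. The update rule uses one common convention, h_t = (1 - z) * h_{t-1} + z * h_candidate; some formulations swap the roles of z and 1 - z.

```python
import math
import random


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def init_matrix(rows, cols, rng, scale=0.1):
    """Small random weights; a trained model would learn these instead."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class TinyGRUClassifier:
    """Hypothetical single-layer GRU followed by a softmax output layer."""

    def __init__(self, input_size, hidden_size, num_classes, seed=0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        # One (W, U, b) triple each for the update gate z, reset gate r,
        # and candidate state h.
        self.params = {
            gate: (
                init_matrix(hidden_size, input_size, rng),
                init_matrix(hidden_size, hidden_size, rng),
                [0.0] * hidden_size,
            )
            for gate in ("z", "r", "h")
        }
        self.W_out = init_matrix(num_classes, hidden_size, rng)

    def _affine(self, gate, x, h):
        W, U, b = self.params[gate]
        return [wx + uh + bi for wx, uh, bi in zip(matvec(W, x), matvec(U, h), b)]

    def step(self, x, h):
        """Advance the hidden state by one raw skeleton frame."""
        z = [sigmoid(a) for a in self._affine("z", x, h)]
        r = [sigmoid(a) for a in self._affine("r", x, h)]
        reset_h = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(a) for a in self._affine("h", x, reset_h)]
        return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

    def predict_proba(self, frames):
        """Run the whole sequence, then classify from the final hidden state."""
        h = [0.0] * self.hidden_size
        for frame in frames:
            h = self.step(frame, h)
        logits = matvec(self.W_out, h)
        peak = max(logits)
        exps = [math.exp(v - peak) for v in logits]
        total = sum(exps)
        return [e / total for e in exps]


if __name__ == "__main__":
    # Assumed layout: 20 joints x (x, y, z) = 60 raw values per frame.
    data_rng = random.Random(42)
    gesture = [[data_rng.uniform(-1.0, 1.0) for _ in range(60)] for _ in range(30)]
    model = TinyGRUClassifier(input_size=60, hidden_size=16, num_classes=5)
    print(model.predict_proba(gesture))
```

The point of the sketch is that the frames enter the recurrent model as plain joint coordinates, with no handcrafted features computed beforehand. A BiLSTM variant would run a second recurrent pass over the reversed sequence and combine the two final states before the output layer.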
Copyright information
© 2021 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Rakesh, S., Javed, S., Saini, R., Liwicki, M. (2021). Sign Gesture Recognition from Raw Skeleton Information in 3D Using Deep Learning. In: Singh, S.K., Roy, P., Raman, B., Nagabhushan, P. (eds) Computer Vision and Image Processing. CVIP 2020. Communications in Computer and Information Science, vol 1377. Springer, Singapore. https://doi.org/10.1007/978-981-16-1092-9_16
DOI: https://doi.org/10.1007/978-981-16-1092-9_16
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-1091-2
Online ISBN: 978-981-16-1092-9
eBook Packages: Computer Science, Computer Science (R0)