Deep motion templates and extreme learning machine for sign language recognition

Imran, Javed; Raman, Balasubramanian

doi:10.1007/s00371-019-01725-3

Deep motion templates and extreme learning machine for sign language recognition

Original Article
Published: 25 July 2019

Volume 36, pages 1233–1246, (2020)
Cite this article

The Visual Computer Aims and scope Submit manuscript

Javed Imran¹ &
Balasubramanian Raman¹

1097 Accesses
38 Citations
Explore all metrics

Abstract

Sign language is a visual language used by persons with hearing and speech impairment to communicate through fingerspellings and body gestures. This paper proposes a framework for automatic sign language recognition without the need of hand segmentation. The proposed method first generates three different types of motion templates: motion history image, dynamic image and our proposed RGB motion image. These motion templates are used to fine-tune three ConvNets trained on ImageNet dataset. Fine-tuning avoids learning all the parameters from scratch, leading to faster network convergence even with a small number of training samples. For combining the output of three ConvNets, we propose a fusion technique based on Kernel-based extreme learning machine (KELM). The features extracted from the last fully connected layer of trained ConvNets are used to train three KELMs, and the final class label is predicted by averaging their scores. The proposed approach is validated on a number of publicly available sign language as well as human action recognition datasets, and state-of-the-art results are achieved. Finally, an Indian sign language dataset is also collected using a thermal camera. The experimental results obtained show that our ConvNet-based deep features along with proposed KELM-based fusion are robust for any type of human motion recognition.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Automated recognition of Myanmar sign language using deep learning module

Article 02 January 2024

Understanding vision-based continuous sign language recognition

Article 17 May 2020

Sign Language Recognition with Multi-modal Features

Notes

This dataset can be downloaded from our website: https://www.iitr.ac.in/mvlab/research.html.

References

Ahad, M.A.R., Tan, J.K., Kim, H., Ishikawa, S.: Motion history image: its variants and applications. Mach. Vis. Appl. 23(2), 255–281 (2012)
Article Google Scholar
Baraldi, L., Paci, F., Serra, G., Benini, L., Cucchiara, R.: Gesture recognition in ego-centric videos using dense trajectories and hand segmentation. In: IEEE CVPR (2014)
Bauer, B., Kraiss, K.F.: Video-based sign recognition using self-organizing subunits. In: IEEE ICPR, pp. 434–437 (2002)
Bi, L., Feng, D., Kim, J.: Dual-path adversarial learning for fully convolutional network (fcn)-based medical image segmentation. Vis. Comput. 34, 1–10 (2018)
Article Google Scholar
Bilen, H., Fernando, B., Gavves, E., Vedaldi, A., Gould, S.: Dynamic image networks for action recognition. In: IEEE CVPR, pp. 3034–3042 (2016)
Bobick, A.F., Davis, J.W.: The recognition of human movement using temporal templates. IEEE Trans. Pattern Anal. Mach. Intell. 23(3), 257–267 (2001)
Article Google Scholar
Carreira, J., Zisserman, A.: Quo vadis, action recognition? A new model and the kinetics dataset. In: IEEE CVPR, pp. 4724–4733 (2017)
Chen, C., Liu, K., Kehtarnavaz, N.: Real-time human action recognition based on depth motion maps. J. Real-Time Image Process. 23, 1–9 (2013)
Google Scholar
Chen, C., Jafari, R., Kehtarnavaz, N.: Action recognition from depth sequences using depth motion maps-based local binary patterns. In: WACV, IEEE, pp. 1092–1099 (2015)
Choi, H., Park, H.: A hierarchical structure for gesture recognition using RGB-D sensor. In: ACM HAI, pp. 265–268 (2014)
Cirujeda, P., Binefa, X.: 4DCov: A nested covariance descriptor of spatio-temporal features for gesture recognition in depth sequences. In: IEEE 3DV, vol. 1, pp 657–664 (2014)
Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: Imagenet: a large-scale hierarchical image database. In: IEEE CVPR, pp. 248–255 (2009)
Donahue, J., Anne Hendricks, L., Guadarrama, S., Rohrbach, M., Venugopalan, S., Saenko, K., Darrell, T.: Long-term recurrent convolutional networks for visual recognition and description. In: IEEE CVPR, pp. 2625–2634 (2015)
Dreuw, P., Deselaers, T., Keysers, D., Ney, H.: Modeling image variability in appearance-based gesture recognition. In: ECCVW, pp. 7–18 (2006)
Duan, J., Zhou, S., Wan, J., Guo, X., Li, S.Z.: Multi-modality fusion based on consensus-voting and 3D convolution for isolated gesture recognition (2016). arXiv preprint arXiv:1611.06689
Escalera, S., Baró, X., Gonzalez, J., Bautista, M.A., Madadi, M., Reyes, M., Ponce-López, V., Escalante, H.J., Shotton, J., Guyon, I.: Chalearn looking at people challenge 2014: dataset and results. In: ECCVW, pp. 459–473 (2014)
Feichtenhofer, C., Pinz, A., Zisserman: Convolutional two-stream network fusion for video action recognition. In: IEEE CVPR (2016)
Gao, Z., Zhang, H., Liu, A.A., Xue, Yb, Xu, Gp: Human action recognition using pyramid histograms of oriented gradients and collaborative multi-task learning. KSII Trans. Internet Inf. Syst. (TIIS) 8(2), 483–503 (2014)
Article Google Scholar
Gao, Z., Zhang, H., Xu, G., Xue, Y.: Multi-perspective and multi-modality joint representation and recognition model for 3D action recognition. Neurocomputing 151, 554–564 (2015)
Article Google Scholar
Gao, Z., Zhang, H., Liu, A.A., Xu, G., Xue, Y.: Human action recognition on depth dataset. Neural Comput. Appl. 27(7), 2047–2054 (2016)
Article Google Scholar
Geng, L., Ma, X., Wang, H., Gu, J., Li, Y.: Chinese sign language recognition with 3D hand motion trajectories and depth images. In: IEEE WCICA, pp. 1457–1461 (2014)
Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE CVPR, pp. 580–587 (2014)
Gogić, I., Manhart, M., Pandžić, I.S., Ahlberg, J.: Fast facial expression recognition using local binary features and shallow neural networks. Vis. Comput. 1–16 (2018). https://doi.org/10.1007/s00371-018-1585-8
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: IEEE CVPR, pp. 770–778 (2016)
Heilbron, F.C., Escorcia, V., Ghanem, B., Niebles, J.C.: Activitynet: a large-scale video benchmark for human activity understanding. In: IEEE CVPR, pp. 961–970 (2015)
Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: Mobilenets: efficient convolutional neural networks for mobile vision applications. (2017). arXiv preprint arXiv:1704.04861
Huang, G., Liu, Z., Weinberger, K.Q., van der Maaten, L.: Densely connected convolutional networks. In: IEEE ICPR, vol. 1, p. 3 (2017)
Huang, G.B., Zhu, Q.Y., Siew, C.K.: Extreme learning machine: theory and applications. Neurocomputing 70(1–3), 489–501 (2006)
Article Google Scholar
Huang, G.B., Zhou, H., Ding, X., Zhang, R.: Extreme learning machine for regression and multiclass classification. IEEE Trans. Syst. Man Cybern. B Cybern. 42(2), 513–529 (2012)
Article Google Scholar
Imran, J., Kumar, P.: Human action recognition using RGB-D sensor and deep convolutional neural networks. In: IEEE ICACCI, pp. 144–148 (2016)
Jiang, T., Zhang, Z., Yang, Y.: Modeling coverage with semantic embedding for image caption generation. Vis. Comput. (2018). https://doi.org/10.1007/s00371-018-1565-z
Article Google Scholar
Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: NIPS, pp. 1097–1105 (2012)
Li, X., Huang, H., Zhao, H., Wang, Y., Hu, M.: Learning a convolutional neural network for propagation-based stereo image segmentation. Vis. Comput. (2018). https://doi.org/10.1007/s00371-018-1582-y
Article Google Scholar
Lin, Y.C., Hu, M.C., Cheng, W.H., Hsieh, Y.H., Chen, H.M.: Human action recognition and retrieval using sole depth information. In: ACMMM, pp. 1053–1056 (2012)
Liu, H., Tian, L., Liu, M., Tang, H.: SDM-BSM: a fusing depth scheme for human action recognition. In: IEEE ICIP, pp. 4674–4678 (2015)
Liu, L., Shao, L.: Learning discriminative representations from RGB-D video data. IJCAI 1, 1493–1500 (2013)
Google Scholar
Liu, M., Liu, H.: Depth context: a new descriptor for human activity recognition by using sole depth sequences. Neurocomputing 175, 747–758 (2016)
Article Google Scholar
Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: IEEE CVPR, pp. 3431–3440 (2015)
Ma, C., Wang, A., Chen, G., Xu, C.: Hand joints-based gesture recognition for noisy dataset using nested interval unscented Kalman filter with LSTM network. Vis. Comput. 34(6–8), 1053–1063 (2018)
Article Google Scholar
Nishida, N., Nakayama, H.: Multimodal gesture recognition using multi-stream recurrent neural network. In: Springer PSIVT, pp. 682–694 (2015)
Pigou, L., Dieleman, S., Kindermans, P.J., Schrauwen, B.: Sign language recognition using convolutional neural networks. In: ECCVW, pp. 572–578 (2014)
Ronchetti, F., Quiroga, F., Estrebou, C.A., Lanzarini, L.C., Rosete, A.: LSA64: an Argentinian sign language dataset. In: CACIC (2016)
Shahroudy, A., Liu, J., Ng, T.T., Wang, G.: Ntu rgb+ d: A large scale dataset for 3D human activity analysis. In: IEEE CVPR, pp. 1010–1019 (2016)
Simonyan, K., Zisserman, A.: Two-stream convolutional networks for action recognition in videos. In: NIPS, pp. 568–576 (2014a)
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-image recognition. (2014b). arXiv preprint arXiv:1409.1556
Smola, A.J., Schölkopf, B.: A tutorial on support vector regression. Stat. Comput. 14(3), 199–222 (2004)
Article MathSciNet Google Scholar
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: IEEE CVPR, pp. 1–9 (2015)
Tanwar, V.K., Buckchash, H., Raman, B., Bhargava, R.: Dense motion analysis of German finger spellings. Multimed. Tools Appl. 78, 1–26 (2018)
Google Scholar
Tran, D., Bourdev, L., Fergus, R., Torresani, L., Paluri, M.: Learning spatiotemporal features with 3D convolutional networks. In: IEEE CVPR, pp. 4489–4497 (2015)
Tung, P.T., Ngoc, L.Q.: Elliptical density shape model for hand gesture recognition. In: ACM SoICT, pp. 186–191 (2014)
Varol, G., Laptev, I., Schmid, C.: Long-term temporal convolutions for action recognition. IEEE Trans. Pattern Anal. Mach. Intell. 40, 1510–1517 (2017)
Article Google Scholar
Wan, J., Zhao, Y., Zhou, S., Guyon, I., Escalera, S., Li, S.Z.: Chalearn looking at people RGB-D isolated and continuous datasets for gesture recognition. In: IEEE CVPRW, pp. 56–64 (2016)
Wang, H., Kläser, A., Schmid, C., Liu, C.L.: Action recognition by dense trajectories. In: IEEE CVPR, pp. 3169–3176 (2011)
Wang, P., Li, W., Gao, Z., Tang, C., Zhang, J., Ogunbona, P.: Convnets-based action recognition from depth maps through virtual cameras and pseudocoloring. In: ACMMM, pp. 1119–1122 (2015)
Wang, P., Li, W., Gao, Z., Zhang, J., Tang, C., Ogunbona, P.O.: Action recognition from depth maps using deep convolutional neural networks. IEEE Trans. Hum.–Mach. Syst. 46(4), 498–509 (2016a)
Article Google Scholar
Wang, P., Li, W., Liu, S., Gao, Z., Tang, C., Ogunbona, P.: Large-scale isolated gesture recognition using convolutional neural networks. In: IEEE ICPR, pp. 7–12 (2016b)
Yu, Z., Liu, Q., Liu, G.: Deeper cascaded peak-piloted network for weak expression recognition. Vis. Comput. 34, 1–9 (2017)
Google Scholar
Zheng, J., Feng, Z., Xu, C., Hu, J., Ge, W.: Fusing shape and spatio-temporal features for depth-based dynamic hand gesture recognition. Multimed. Tools Appl. 76(20), 20525–20544 (2017)
Article Google Scholar
Zhou, F., Hu, Y., Shen, X.: MSANet: multimodal self-augmentation and adversarial network for RGB-D object recognition. Vis. Comput. 1–12 (2018)
Zhu, G., Zhang, L., Shen, P., Song, J.: Multimodal gesture recognition using 3-D convolution and convolutional LSTM. IEEE Access 5, 4517–4524 (2017)
Article Google Scholar

Download references

Funding

This project was funded by SMILE Project, IIT Roorkee.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Indian Institute of Technology Roorkee, Roorkee, India
Javed Imran & Balasubramanian Raman

Authors

Javed Imran
View author publications
You can also search for this author in PubMed Google Scholar
Balasubramanian Raman
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Javed Imran.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Imran, J., Raman, B. Deep motion templates and extreme learning machine for sign language recognition. Vis Comput 36, 1233–1246 (2020). https://doi.org/10.1007/s00371-019-01725-3

Download citation

Published: 25 July 2019
Issue Date: June 2020
DOI: https://doi.org/10.1007/s00371-019-01725-3

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Deep motion templates and extreme learning machine for sign language recognition

Abstract

Access this article

Similar content being viewed by others

Automated recognition of Myanmar sign language using deep learning module

Understanding vision-based continuous sign language recognition

Sign Language Recognition with Multi-modal Features

Notes

References

Funding

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflict of interest

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Deep motion templates and extreme learning machine for sign language recognition

Abstract

Access this article

Similar content being viewed by others

Automated recognition of Myanmar sign language using deep learning module

Understanding vision-based continuous sign language recognition

Sign Language Recognition with Multi-modal Features

Notes

References

Funding

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflict of interest

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation