ABSTRACT
We present SL-ReDu, a recently commenced project that aims to exploit progress in deep learning to advance the state of the art in video-based automatic recognition of Greek Sign Language (GSL), focusing on the use case of teaching GSL as a second language. We first briefly overview the project goals, focal areas, and timeline. We then present our initial deep-learning-based approach to GSL recognition, which employs efficient visual tracking of the signer's hands, convolutional neural networks for feature extraction, and attention-based encoder-decoder sequence modeling for sign prediction. Finally, we report experimental results for small-vocabulary, isolated GSL recognition on the single-signer "Polytropon" corpus. To our knowledge, this work constitutes the first application of deep-learning techniques to GSL.
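As a rough illustration of the attention step in an attention-based encoder-decoder, the sketch below computes attention weights over a sequence of per-frame feature vectors and forms a context vector for one decoding step. It is not the authors' model: the dot-product scoring, the toy two-dimensional features, and the names `softmax`, `attend`, `frames`, and `state` are all assumptions made for brevity, since the abstract does not specify these details.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state, encoder_outputs):
    """Return (weights, context) for a single decoding step.

    Scoring is a plain dot product (an assumption; the source does not
    state which scoring function is used).
    """
    # One score per encoded frame: similarity to the current decoder state.
    scores = [
        sum(d * h for d, h in zip(decoder_state, frame))
        for frame in encoder_outputs
    ]
    weights = softmax(scores)
    # Context vector: attention-weighted sum of the encoded frames.
    dim = len(encoder_outputs[0])
    context = [
        sum(w * frame[i] for w, frame in zip(weights, encoder_outputs))
        for i in range(dim)
    ]
    return weights, context


# Hypothetical per-frame features, standing in for CNN outputs
# computed on tracked hand regions.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
state = [0.0, 2.0]
weights, context = attend(state, frames)
```

In a full recognizer the decoder would repeat this step for every output token, feeding the context vector into its recurrent or feed-forward update before predicting the next sign.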
Index Terms
- SL-ReDu: Greek sign language recognition for educational applications. Project description and early results