research-article

Towards Real-Time Sign Language Recognition and Translation on Edge Devices

Authors:
Shiwei Gan, Nanjing University, Nanjing, China (ORCID: 0000-0003-3360-4321)
Yafeng Yin, Nanjing University, Nanjing, China (ORCID: 0000-0002-9497-6244)
Zhiwei Jiang, Nanjing University, Nanjing, China (ORCID: 0000-0001-5243-4992)
Lei Xie, Nanjing University, Nanjing, China (ORCID: 0000-0002-2994-6743)
Sanglu Lu, Nanjing University, Nanjing, China (ORCID: 0000-0003-1467-4519)

MM '23: Proceedings of the 31st ACM International Conference on Multimedia, October 2023, Pages 4502–4512
https://doi.org/10.1145/3581783.3611820

Published: 27 October 2023

ABSTRACT

To provide instant communication for hearing-impaired people, real-time sign language processing must be available anytime and anywhere. In this paper, we therefore propose a Region-aware Temporal Graph based neural Network (RTG-Net) that aims to achieve real-time Sign Language Recognition (SLR) and Sign Language Translation (SLT) on edge devices. To reduce computation overhead, we first construct a shallow graph convolution network, which reduces model size by decreasing model depth. In addition, we apply structural re-parameterization to fuse the convolutional layers, batch normalization layers and all branches, which simplifies the model by reducing its width. To maintain high performance in sign language processing, we extract key regions from each frame based on skeleton keypoints and design a region-aware temporal graph that combines the key regions with the full frame for feature representation. We also design a multi-stage training strategy for RTG-Net that optimizes keypoint selection, SLR and SLT step by step. Experimental results demonstrate that RTG-Net achieves performance comparable to existing methods on SLR and SLT while greatly reducing computation overhead and achieving real-time sign language processing on edge devices. Our code is available at https://github.com/SignLanguageCode/realtimeSLRT.
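The structural re-parameterization step can be illustrated with a minimal NumPy sketch. It assumes a RepVGG-style block (the technique introduced by Ding et al., 2021) applied to a 1D temporal convolution with three parallel branches: a 3-tap conv+BN, a 1-tap conv+BN and a BN-only identity branch. The branch layout, the 1D setting and all function names here are assumptions for illustration, not RTG-Net's published architecture. At inference time each BN is folded into its convolution, the 1-tap kernel is zero-padded to 3 taps, the identity is written as a 3-tap kernel, and everything is summed into one convolution.

```python
import numpy as np


def conv1d(x, w, b):
    """'Same'-padded 1D cross-correlation.

    x: (C_in, T), w: (C_out, C_in, K) with odd K, b: (C_out,) -> (C_out, T)
    """
    k = w.shape[2]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))
    steps = x.shape[1]
    # Each output column is the kernel contracted with a length-k window.
    y = np.stack(
        [np.tensordot(w, xp[:, t:t + k], axes=([1, 2], [0, 1])) for t in range(steps)],
        axis=1,
    )
    return y + b[:, None]


def batch_norm(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-mode batch normalization over the channel axis (axis 0)."""
    return gamma[:, None] * (y - mean[:, None]) / np.sqrt(var[:, None] + eps) + beta[:, None]


def fuse_conv_bn(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold BN into the preceding conv so that BN(conv(x)) == conv'(x)."""
    scale = gamma / np.sqrt(var + eps)
    return w * scale[:, None, None], beta + (b - mean) * scale


def pad_kernel(w, k):
    """Zero-pad a (C_out, C_in, K0) kernel to k taps, keeping it centred."""
    p = (k - w.shape[2]) // 2
    return np.pad(w, ((0, 0), (0, 0), (p, p)))


def identity_kernel(c, k):
    """A k-tap kernel whose convolution returns its input unchanged."""
    w = np.zeros((c, c, k))
    w[np.arange(c), np.arange(c), k // 2] = 1.0
    return w


def random_bn(rng, c):
    """Random BN parameters: gamma, beta, running mean, running var (> 0)."""
    return rng.normal(size=c), rng.normal(size=c), rng.normal(size=c), rng.random(c) + 0.5


def check_equivalence(c=4, steps=10, seed=0):
    """Return (multi-branch output, fused single-conv output) for random weights."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(c, steps))
    w3, b3 = rng.normal(size=(c, c, 3)), rng.normal(size=c)
    w1, b1 = rng.normal(size=(c, c, 1)), rng.normal(size=c)
    bn3, bn1, bn0 = random_bn(rng, c), random_bn(rng, c), random_bn(rng, c)

    # Training-time block: three parallel branches summed together.
    multi = (batch_norm(conv1d(x, w3, b3), *bn3)
             + batch_norm(conv1d(x, w1, b1), *bn1)
             + batch_norm(x, *bn0))

    # Inference-time block: fold each BN, align kernel sizes, then sum.
    k3, c3 = fuse_conv_bn(w3, b3, *bn3)
    k1, c1 = fuse_conv_bn(pad_kernel(w1, 3), b1, *bn1)
    k0, c0 = fuse_conv_bn(identity_kernel(c, 3), np.zeros(c), *bn0)
    fused = conv1d(x, k3 + k1 + k0, c3 + c1 + c0)
    return multi, fused
```

Calling `multi, fused = check_equivalence()` should give outputs that agree up to floating-point rounding. Because every branch is linear before the shared sum, the collapse is exact, and any activation applied after the sum is unaffected: the block keeps its multi-branch form during training but runs as a single, narrower convolution at inference.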
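Keypoint-based key-region extraction can be sketched as below, assuming 2D pixel keypoints from an off-the-shelf pose estimator and a hypothetical mapping from region names (for example, hands) to keypoint indices. The box rule (min/max of the selected keypoints plus a margin, clamped to the frame), the nearest-neighbour resize and the function names are illustrative assumptions; the abstract does not specify how RTG-Net crops or sizes its regions, and the real joint indices depend on the pose estimator's layout.

```python
import numpy as np


def region_box(keypoints, indices, margin, height, width):
    """Axis-aligned box around the selected keypoints, expanded and clamped.

    keypoints: (J, 2) array of (x, y) pixel coordinates.
    Returns (x0, y0, x1, y1) with x1 and y1 exclusive.
    """
    pts = keypoints[np.asarray(indices)]
    x0, y0 = np.floor(pts.min(axis=0) - margin).astype(int)
    x1, y1 = np.ceil(pts.max(axis=0) + margin).astype(int)
    return max(int(x0), 0), max(int(y0), 0), min(int(x1), width), min(int(y1), height)


def resize_nearest(patch, size):
    """Nearest-neighbour resize of an (H, W, ...) patch to size=(out_h, out_w)."""
    out_h, out_w = size
    in_h, in_w = patch.shape[:2]
    rows = np.arange(out_h) * in_h // out_h
    cols = np.arange(out_w) * in_w // out_w
    return patch[rows[:, None], cols[None, :]]


def crop_key_regions(frame, keypoints, regions, margin=8, size=(32, 32)):
    """Crop one fixed-size patch per named region from an (H, W, C) frame."""
    height, width = frame.shape[:2]
    crops = {}
    for name, indices in regions.items():
        x0, y0, x1, y1 = region_box(keypoints, indices, margin, height, width)
        patch = frame[y0:y1, x0:x1]
        if patch.shape[0] == 0 or patch.shape[1] == 0:
            # Keypoints fell outside the frame: fall back to a blank patch.
            crops[name] = np.zeros(size + frame.shape[2:], dtype=frame.dtype)
        else:
            crops[name] = resize_nearest(patch, size)
    return crops


if __name__ == "__main__":
    frame = np.zeros((240, 320, 3), dtype=np.uint8)
    keypoints = np.array([[50.0, 60.0], [70.0, 90.0], [200.0, 100.0], [230.0, 140.0]])
    regions = {"left_hand": [0, 1], "right_hand": [2, 3]}  # hypothetical indices
    for name, crop in crop_key_regions(frame, keypoints, regions).items():
        print(name, crop.shape)
```

Resizing every region to the same size lets the crops of one frame be batched together with a downscaled full frame before feature extraction.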
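A region-aware temporal graph with a shallow graph convolution network can be sketched under an assumed topology: at each time step, one full-frame node is linked to every key-region node, and each node is linked to the same node in the next frame. The layers use the symmetric normalization of Kipf and Welling's GCN, and two layers stand in for a "shallow" network. The node layout, edge rules, per-frame pooling and function names are hypothetical; the abstract states only that key regions and the full frame are combined in a temporal graph.

```python
import numpy as np


def build_region_temporal_graph(num_frames, num_regions):
    """Adjacency matrix over num_frames * (num_regions + 1) nodes.

    Node t * (R + 1) is the full frame at time t; the next R nodes are its key regions.
    """
    per_frame = num_regions + 1
    n = num_frames * per_frame
    adj = np.zeros((n, n))
    for t in range(num_frames):
        base = t * per_frame
        for r in range(1, per_frame):
            # Spatial edge: key region <-> full frame at the same time step.
            adj[base, base + r] = adj[base + r, base] = 1.0
        if t + 1 < num_frames:
            for k in range(per_frame):
                # Temporal edge: the same node in consecutive frames.
                nxt = base + per_frame + k
                adj[base + k, nxt] = adj[nxt, base + k] = 1.0
    return adj


def normalize_adjacency(adj):
    """D^{-1/2} (A + I) D^{-1/2}, the GCN propagation matrix."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(h, a_norm, weight):
    """One graph convolution with ReLU: relu(A_norm @ H @ W)."""
    return np.maximum(a_norm @ h @ weight, 0.0)


def shallow_gcn(features, num_frames, num_regions, weights):
    """Apply len(weights) GCN layers, then average the nodes of each frame.

    features: (num_frames * (num_regions + 1), F_in) node features in t-major order.
    Returns (num_frames, F_out) per-frame features.
    """
    a_norm = normalize_adjacency(build_region_temporal_graph(num_frames, num_regions))
    h = features
    for w in weights:
        h = gcn_layer(h, a_norm, w)
    return h.reshape(num_frames, num_regions + 1, -1).mean(axis=1)
```

The per-frame output sequence is the kind of representation a recognition or translation head could consume. Keeping the stack to a few layers cuts computation and also avoids the over-smoothing that deep GCNs are known to suffer.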


Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
  2. Machine learning
    1. Machine learning approaches
      1. Neural networks
2. Human-centered computing

Index terms have been assigned to the content through auto-classification.


Published in
MM '23: Proceedings of the 31st ACM International Conference on Multimedia
October 2023
9913 pages
ISBN: 9798400701085
DOI: 10.1145/3581783
General Chairs:
Abdulmotaleb El Saddik, University of Ottawa, Canada & MBZUAI, UAE
Tao Mei, HiDream.ai, China
Rita Cucchiara, University of Modena and Reggio Emilia, Italy
Program Chairs:
Marco Bertini, University of Florence, Italy
Diana Patricia Tobon Vallejo, Universidad de Medellin, Colombia
Pradeep K. Atrey, University at Albany, State University of New York, USA
M. Shamim Hossain, King Saud University, KSA
Copyright © 2023 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
edge device
real-time
sign language recognition
sign language translation
Acceptance Rates
Overall Acceptance Rate: 995 of 4,171 submissions, 24%
