Multi-level Temporal Relation Graph for Continuous Sign Language Recognition

  • Conference paper
Pattern Recognition and Computer Vision (PRCV 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13536)

Abstract

Temporal relation modeling is one of the key points in describing gesture changes for continuous sign language recognition. However, sign language contains many similar gestures, so focusing only on global information can exacerbate the recognition ambiguity caused by various gesture combinations. To alleviate this problem, we seek a balance between the global and the local information in gesture changes by constructing a multi-level temporal relation graph (MLTRG). Specifically, the multi-level temporal relation graph of a video sequence is established over different time spans, with the corresponding visual features as graph nodes. Feature fusion and propagation over the graph are then performed by a graph convolutional network (GCN). In this way we can reason over, and balance, the global and local temporal information of gesture changes in continuous sign language videos. We evaluate our method on the large-scale public datasets RWTH-PHOENIX-Weather-2014 and 2014T; the results demonstrate its advantages and effectiveness.
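The construction described in the abstract can be sketched as follows. This is a hypothetical illustration, not the authors' code: node layout, edge rules (same-level temporal neighbors plus cross-level overlap links), mean-pooled node features, and the single-layer propagation rule H' = ReLU(D⁻¹AHW) are all assumptions chosen to mirror the idea of a multi-level temporal relation graph processed by a GCN.

```python
import numpy as np

def build_multilevel_graph(num_frames, spans=(1, 2, 4)):
    """Build graph nodes and adjacency for a multi-level temporal graph.

    Each level covers the sequence with windows of a different time span;
    a node is (level, start_frame, span). Edges connect temporal neighbors
    within a level and nodes on different levels whose windows overlap,
    so local (fine-span) and global (coarse-span) information can mix.
    """
    nodes = []
    for level, span in enumerate(spans):
        for start in range(0, num_frames, span):
            nodes.append((level, start, span))
    n = len(nodes)
    A = np.eye(n)  # self-loops
    for i, (li, si, wi) in enumerate(nodes):
        for j, (lj, sj, wj) in enumerate(nodes):
            if i == j:
                continue
            same_level_neighbor = (li == lj and abs(si - sj) == wi)
            cross_level_overlap = (li != lj and si < sj + wj and sj < si + wi)
            if same_level_neighbor or cross_level_overlap:
                A[i, j] = 1.0
    return nodes, A

def node_features(frames, nodes):
    """Mean-pool per-frame visual features over each node's time window."""
    return np.stack([frames[s:s + w].mean(axis=0) for _, s, w in nodes])

def gcn_layer(A, H, W):
    """One GCN propagation step: row-normalize A, mix features, apply ReLU."""
    deg = A.sum(axis=1, keepdims=True)
    return np.maximum((A / deg) @ H @ W, 0.0)

# Usage: 8 frames of 16-d features, three temporal levels (spans 1, 2, 4)
rng = np.random.default_rng(0)
frames = rng.standard_normal((8, 16))
nodes, A = build_multilevel_graph(8)      # 8 + 4 + 2 = 14 nodes
H = node_features(frames, nodes)          # (14, 16)
out = gcn_layer(A, H, rng.standard_normal((16, 32)))  # (14, 32)
```

In practice the per-frame features would come from a visual backbone and the GCN weights would be learned end to end; the sketch only shows how a fixed span hierarchy yields a graph whose propagation blends local and global temporal context.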

This work was supported in part by the National Natural Science Foundation of China under Grant 61906135, Grant 62020106004 and Grant 92048301.



Author information

Correspondence to Wanli Xue.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Guo, J., Xue, W., Guo, L., Yuan, T., Chen, S. (2022). Multi-level Temporal Relation Graph for Continuous Sign Language Recognition. In: Yu, S., et al. (eds.) Pattern Recognition and Computer Vision. PRCV 2022. Lecture Notes in Computer Science, vol 13536. Springer, Cham. https://doi.org/10.1007/978-3-031-18913-5_32


  • DOI: https://doi.org/10.1007/978-3-031-18913-5_32

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-18912-8

  • Online ISBN: 978-3-031-18913-5

  • eBook Packages: Computer Science, Computer Science (R0)
