Multi-level Temporal Relation Graph for Continuous Sign Language Recognition

  • Conference paper
Pattern Recognition and Computer Vision (PRCV 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13536)

Abstract

Temporal relation modeling is one of the key points in describing gesture changes for continuous sign language recognition. However, sign language contains many similar gestures, so focusing only on global information can exacerbate the recognition ambiguity caused by various gesture combinations. To alleviate this problem, we seek a balance between the global and the local information in gesture changes by constructing a multi-level temporal relation graph (MLTRG). Specifically, the multi-level temporal relation graph of a video sequence is established over different time spans, with the corresponding visual features as graph nodes. Feature fusion and propagation over the graph are then performed by a graph convolutional network (GCN). In this way we can reason over, and balance, the global and local temporal information of gesture changes in continuous sign language videos. We evaluate our method on the large-scale public datasets RWTH-PHOENIX-Weather-2014 and 2014T; the results demonstrate its advantages and effectiveness.
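The construction described in the abstract can be sketched as follows. This is a hypothetical illustration, not the authors' code: node layout, edge rules (same-level temporal neighbors plus cross-level overlap links), mean-pooled node features, and the single-layer propagation rule H' = ReLU(D⁻¹AHW) are all assumptions chosen to mirror the idea of a multi-level temporal relation graph processed by a GCN.

```python
import numpy as np

def build_multilevel_graph(num_frames, spans=(1, 2, 4)):
    """Build graph nodes and adjacency for a multi-level temporal graph.

    Each level covers the sequence with windows of a different time span;
    a node is (level, start_frame, span). Edges connect temporal neighbors
    within a level and nodes on different levels whose windows overlap,
    so local (fine-span) and global (coarse-span) information can mix.
    """
    nodes = []
    for level, span in enumerate(spans):
        for start in range(0, num_frames, span):
            nodes.append((level, start, span))
    n = len(nodes)
    A = np.eye(n)  # self-loops
    for i, (li, si, wi) in enumerate(nodes):
        for j, (lj, sj, wj) in enumerate(nodes):
            if i == j:
                continue
            same_level_neighbor = (li == lj and abs(si - sj) == wi)
            cross_level_overlap = (li != lj and si < sj + wj and sj < si + wi)
            if same_level_neighbor or cross_level_overlap:
                A[i, j] = 1.0
    return nodes, A

def node_features(frames, nodes):
    """Mean-pool per-frame visual features over each node's time window."""
    return np.stack([frames[s:s + w].mean(axis=0) for _, s, w in nodes])

def gcn_layer(A, H, W):
    """One GCN propagation step: row-normalize A, mix features, apply ReLU."""
    deg = A.sum(axis=1, keepdims=True)
    return np.maximum((A / deg) @ H @ W, 0.0)

# Usage: 8 frames of 16-d features, three temporal levels (spans 1, 2, 4)
rng = np.random.default_rng(0)
frames = rng.standard_normal((8, 16))
nodes, A = build_multilevel_graph(8)      # 8 + 4 + 2 = 14 nodes
H = node_features(frames, nodes)          # (14, 16)
out = gcn_layer(A, H, rng.standard_normal((16, 32)))  # (14, 32)
```

In practice the per-frame features would come from a visual backbone and the GCN weights would be learned end to end; the sketch only shows how a fixed span hierarchy yields a graph whose propagation blends local and global temporal context.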

This work was supported in part by the National Natural Science Foundation of China under Grant 61906135, Grant 62020106004 and Grant 92048301.



Author information

Correspondence to Wanli Xue.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Guo, J., Xue, W., Guo, L., Yuan, T., Chen, S. (2022). Multi-level Temporal Relation Graph for Continuous Sign Language Recognition. In: Yu, S., et al. (eds.) Pattern Recognition and Computer Vision. PRCV 2022. Lecture Notes in Computer Science, vol 13536. Springer, Cham. https://doi.org/10.1007/978-3-031-18913-5_32


  • DOI: https://doi.org/10.1007/978-3-031-18913-5_32

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-18912-8

  • Online ISBN: 978-3-031-18913-5

  • eBook Packages: Computer Science, Computer Science (R0)
