TSPNet-HF: A Hand/Face TSPNet Method for Sign Language Translation

  • Conference paper
  • First Online:
Advances in Artificial Intelligence – IBERAMIA 2022 (IBERAMIA 2022)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13788)

Abstract

Sign language is the language the Deaf community has adopted to communicate. However, most hearing people cannot communicate in sign language, which creates barriers between the two groups. Aiming to reduce such barriers, Sign Language Translation (SLT) translates sign language video sequences into spoken-language sentences. A recent SLT model, TSPNet, exploited the temporal and contextual semantic structure of sign videos to learn more discriminative features. Although TSPNet achieved promising results on the RWTH-PHOENIX-Weather 2014T (PHOENIX14T) dataset, it considers only hand signs and ignores facial and body information. Facial expressions modulate the intensity of signs and play a relevant role in translation. This work proposes TSPNet-HandFace (TSPNet-HF), which incorporates both hand and facial features into the SLT process. The proposal introduces two novel components: facial feature extraction and hand/face feature aggregation, which combines the two feature streams. It was assessed on PHOENIX14T using the BLEU and ROUGE metrics and compared against TSPNet and CNN2dRNN. The results show that TSPNet-HF outperformed the competing methods on all translation metrics, indicating that including facial features positively impacts the SLT process.
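To make the hand/face aggregation component concrete, the minimal PyTorch sketch below fuses per-clip hand and face features by concatenation followed by a linear projection. This is only one plausible aggregation scheme: the abstract does not specify the mechanism, and the class name HandFaceAggregation, the feature dimensions, and the fusion choice are all illustrative assumptions rather than the paper's actual design.

```python
import torch
import torch.nn as nn

class HandFaceAggregation(nn.Module):
    """Hypothetical hand/face aggregation: concatenate, then project."""

    def __init__(self, hand_dim=1024, face_dim=512, out_dim=1024):
        super().__init__()
        # Assumed fusion: concatenation + linear projection; the paper's
        # actual aggregation mechanism may differ.
        self.project = nn.Linear(hand_dim + face_dim, out_dim)

    def forward(self, hand_feats, face_feats):
        # hand_feats: (batch, clips, hand_dim); face_feats: (batch, clips, face_dim)
        fused = torch.cat([hand_feats, face_feats], dim=-1)
        return self.project(fused)

# Usage: fuse features for a video segmented into 16 clips.
agg = HandFaceAggregation()
hand = torch.randn(1, 16, 1024)   # e.g., clip-level hand features
face = torch.randn(1, 16, 512)    # e.g., facial-landmark features
print(agg(hand, face).shape)      # torch.Size([1, 16, 1024])
```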


Notes

  1. https://google.github.io/mediapipe/.

  2. https://pytorch.org/.
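In the same hedged spirit, the sketch below shows how facial landmarks could be extracted per video frame with MediaPipe Face Mesh (footnote 1). This page does not describe the paper's actual extraction pipeline, so the function extract_face_landmarks and all parameter settings are assumptions for illustration.

```python
import cv2
import mediapipe as mp

# Face Mesh yields 468 normalized 3D facial landmarks per detected face.
face_mesh = mp.solutions.face_mesh.FaceMesh(static_image_mode=False,
                                            max_num_faces=1)

def extract_face_landmarks(frame_bgr):
    """Return a flat [x, y, z, ...] landmark vector for one frame, or None."""
    # MediaPipe expects RGB input; OpenCV reads frames as BGR.
    results = face_mesh.process(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
    if not results.multi_face_landmarks:
        return None
    landmarks = results.multi_face_landmarks[0].landmark
    return [c for lm in landmarks for c in (lm.x, lm.y, lm.z)]
```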


Acknowledgment

Lenovo partially funded this research as part of its R&D investment under Brazil's Informatics Law. The authors want to acknowledge the support of Lenovo R&D and CESAR Labs.

Author information

Corresponding author

Correspondence to Péricles B. C. Miranda.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Miranda, P.B.C. et al. (2022). TSPNet-HF: A Hand/Face TSPNet Method for Sign Language Translation. In: Bicharra Garcia, A.C., Ferro, M., Rodríguez Ribón, J.C. (eds) Advances in Artificial Intelligence – IBERAMIA 2022. IBERAMIA 2022. Lecture Notes in Computer Science, vol 13788. Springer, Cham. https://doi.org/10.1007/978-3-031-22419-5_26

  • DOI: https://doi.org/10.1007/978-3-031-22419-5_26

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-22418-8

  • Online ISBN: 978-3-031-22419-5

  • eBook Packages: Computer Science, Computer Science (R0)
