Abstract
Sign language is the primary means of communication within Deaf communities. However, most hearing people cannot communicate in sign language, which creates barriers between the two groups. Aiming to reduce such barriers, Sign Language Translation (SLT) converts sign video sequences into spoken-language sentences. A recent SLT model, TSPNet, exploited the temporal and contextual semantic structure of sign videos to learn more discriminative features. Although TSPNet achieved promising results on the RWTH-PHOENIX-Weather 2014T (PHOENIX14T) dataset, the model considers only hand signs, ignoring facial and body information. Facial expressions modulate the meaning and intensity of signs and therefore play a relevant role in translation. The current work proposes TSPNet-HandFace (TSPNet-HF), which incorporates both hand and facial features into the SLT process. The proposal has two novel components: facial feature extraction and hand/face feature aggregation, which combines the two feature streams. It was assessed on PHOENIX14T using BLEU and ROUGE, and compared against TSPNet and CNN2dRNN. The results showed that TSPNet-HF outperformed the competing methods on all translation metrics, indicating that the inclusion of facial features positively impacts the SLT process.
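The abstract does not detail how the hand/face feature aggregation works. As a minimal sketch of what such a fusion step might look like, assuming per-clip hand and face embeddings and a simple learned linear projection (all names, dimensions, and the concatenation-then-project scheme are illustrative assumptions, not the paper's exact method):

```python
import numpy as np

rng = np.random.default_rng(0)

def aggregate_hand_face(hand_feats: np.ndarray, face_feats: np.ndarray,
                        w: np.ndarray) -> np.ndarray:
    """Fuse per-clip hand and face embeddings into one feature sequence.

    hand_feats: (T, Dh) hand-stream features for T video clips
    face_feats: (T, Df) face-stream features aligned to the same clips
    w:          (Dh + Df, Dout) projection mixing the two streams
    """
    # Concatenate the two streams clip-by-clip, then project them
    # into a shared feature space for the downstream translator.
    fused = np.concatenate([hand_feats, face_feats], axis=1)  # (T, Dh + Df)
    return fused @ w                                          # (T, Dout)

# Toy dimensions: 4 clips, 832-d hand features (I3D-like), 128-d face
# features, projected to a 512-d shared space.
hand = rng.standard_normal((4, 832))
face = rng.standard_normal((4, 128))
w = rng.standard_normal((832 + 128, 512)) / np.sqrt(960.0)

out = aggregate_hand_face(hand, face, w)
print(out.shape)
```

The fused sequence would then replace the hand-only features as input to the translation decoder; in practice the projection would be learned jointly with the rest of the network rather than drawn at random.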
Acknowledgment
Lenovo partially funded this research as part of its R&D investment under Brazil's Informatics Law. The authors wish to acknowledge the support of Lenovo R&D and CESAR Labs.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Miranda, P.B.C. et al. (2022). TSPNet-HF: A Hand/Face TSPNet Method for Sign Language Translation. In: Bicharra Garcia, A.C., Ferro, M., Rodríguez Ribón, J.C. (eds) Advances in Artificial Intelligence – IBERAMIA 2022. IBERAMIA 2022. Lecture Notes in Computer Science(), vol 13788. Springer, Cham. https://doi.org/10.1007/978-3-031-22419-5_26
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-22418-8
Online ISBN: 978-3-031-22419-5
eBook Packages: Computer Science (R0)