Abstract
Sign language is the primary means of communication within Deaf communities. However, most hearing people cannot communicate in sign language, which creates barriers between the two groups. Aiming to reduce such barriers, Sign Language Translation (SLT) converts sign video sequences into spoken-language sentences. A recent SLT model, TSPNet, exploited the temporal and contextual semantic structure of sign videos to learn more discriminative features. Although TSPNet achieved promising results on the RWTH-PHOENIX-Weather 2014T (PHOENIX14T) dataset, the model considers only hand signs, ignoring facial and body information. Facial expressions modulate the meaning and intensity of signs and therefore play a relevant role in translation. The current work proposes TSPNet-HandFace (TSPNet-HF), which incorporates both hand and facial features into the SLT process. The proposal has two novel components: facial feature extraction and hand/face feature aggregation, which combines the two feature streams. It was assessed on PHOENIX14T using BLEU and ROUGE, and compared against TSPNet and CNN2dRNN. The results showed that TSPNet-HF outperformed the competing methods on all translation metrics, indicating that the inclusion of facial features positively impacts the SLT process.
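The abstract does not detail how the hand/face feature aggregation works. As a minimal sketch of what such a fusion step might look like, assuming per-clip hand and face embeddings and a simple learned linear projection (all names, dimensions, and the concatenation-then-project scheme are illustrative assumptions, not the paper's exact method):

```python
import numpy as np

rng = np.random.default_rng(0)

def aggregate_hand_face(hand_feats: np.ndarray, face_feats: np.ndarray,
                        w: np.ndarray) -> np.ndarray:
    """Fuse per-clip hand and face embeddings into one feature sequence.

    hand_feats: (T, Dh) hand-stream features for T video clips
    face_feats: (T, Df) face-stream features aligned to the same clips
    w:          (Dh + Df, Dout) projection mixing the two streams
    """
    # Concatenate the two streams clip-by-clip, then project them
    # into a shared feature space for the downstream translator.
    fused = np.concatenate([hand_feats, face_feats], axis=1)  # (T, Dh + Df)
    return fused @ w                                          # (T, Dout)

# Toy dimensions: 4 clips, 832-d hand features (I3D-like), 128-d face
# features, projected to a 512-d shared space.
hand = rng.standard_normal((4, 832))
face = rng.standard_normal((4, 128))
w = rng.standard_normal((832 + 128, 512)) / np.sqrt(960.0)

out = aggregate_hand_face(hand, face, w)
print(out.shape)
```

The fused sequence would then replace the hand-only features as input to the translation decoder; in practice the projection would be learned jointly with the rest of the network rather than drawn at random.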
Acknowledgment
Lenovo partially funded this research as part of its R&D investment under Brazil's Informatics Law. The authors wish to acknowledge the support of Lenovo R&D and CESAR Labs.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Miranda, P.B.C. et al. (2022). TSPNet-HF: A Hand/Face TSPNet Method for Sign Language Translation. In: Bicharra Garcia, A.C., Ferro, M., Rodríguez Ribón, J.C. (eds) Advances in Artificial Intelligence – IBERAMIA 2022. IBERAMIA 2022. Lecture Notes in Computer Science(), vol 13788. Springer, Cham. https://doi.org/10.1007/978-3-031-22419-5_26
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-22418-8
Online ISBN: 978-3-031-22419-5
eBook Packages: Computer Science (R0)