Abstract
Sign languages are the main mechanism of communication and interaction in the Deaf community. These languages are highly variable, with divergences in gloss representation, sign configuration, and regional variants, among other aspects, arising from cultural and geographic differences. Current methods for automatic, continuous sign translation rely on deep-learning models that encode the visual representation of signs. Despite significant progress, the convergence of such models requires huge amounts of data to exploit sign representation, resulting in very complex models. This is due not only to the high variability of signing but also to the limited exploration of many language components that support communication. For instance, gesture motion and grammatical structure are fundamental components of communication that can help resolve visual and geometric misinterpretations of signs during video analysis. This work introduces a new Colombian sign language translation dataset (CoL-SLTD) that focuses on motion and structural information and could be a significant resource for determining the contribution of several language components. Additionally, a deep encoder-decoder strategy is introduced to support automatic translation, including attention modules that capture short-term, long-term, and structural kinematic dependencies and their relationships with sign recognition. Evaluation on CoL-SLTD demonstrates the relevance of motion representation, allowing compact deep architectures to perform translation. The proposed strategy also shows promising results, achieving BLEU-4 scores of 35.81 and 4.65 on the signer-independent and unseen-sentence tasks, respectively.
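The attention modules mentioned above weight encoder time steps when producing each translated word. As a minimal, hedged sketch of that core operation (scaled dot-product attention in its generic form, not the authors' exact modules), the weighting and context computation can be written as:

```python
import math

def attention_weights(query, keys):
    """Softmax over scaled dot-product scores: one weight per
    encoder time step (i.e., per input video frame state)."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    m = max(scores)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Context vector: weighted sum of encoder states."""
    w = attention_weights(query, keys)
    return [sum(wi * v[j] for wi, v in zip(w, values))
            for j in range(len(values[0]))]
```

The weights always sum to one, so the decoder receives a convex combination of encoder states; capturing short-term versus long-term dependencies amounts to restricting or widening the set of keys the query attends over.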
Notes
1. For training, the Adam optimizer was selected with a learning rate of 0.0001 and a decay of 0.1 every 10 epochs. Batches of one sample and a dropout of 0.2 in the dense and recurrent layers were used. The convolutional weight decay was set to 0.0005, and gradient clipping with a threshold of 5 was also applied.
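The two schedule-like pieces of this configuration can be made concrete. Below is an illustrative sketch (function names are our own, not the authors' code) of the stepped learning-rate decay and the gradient clipping described in the note:

```python
import math

def stepped_lr(epoch, base_lr=1e-4, decay=0.1, step=10):
    """Learning rate after applying a factor-0.1 decay every 10 epochs,
    as in the note: lr = base_lr * decay ** (epoch // step)."""
    return base_lr * decay ** (epoch // step)

def clip_gradient(grad, threshold=5.0):
    """Rescale the gradient vector so its L2 norm does not exceed the
    clipping threshold (5 in the note); leave it unchanged otherwise."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= threshold:
        return grad
    scale = threshold / norm
    return [g * scale for g in grad]
```

For example, epochs 0-9 train at 1e-4, epochs 10-19 at 1e-5, and so on, while any gradient whose norm exceeds 5 is scaled back onto the norm-5 sphere.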
Acknowledgments
This work was partially funded by the Universidad Industrial de Santander. The authors acknowledge the Vicerrectoría de Investigación y Extensión (VIE) of the Universidad Industrial de Santander for supporting this research through the project "Reconocimiento continuo de expresiones cortas del lenguaje de señas" (SIVIE code 1293). We also gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan V GPU used for this research.
Copyright information
© 2021 Springer Nature Switzerland AG
Cite this paper
Rodríguez, J. et al. (2021). Understanding Motion in Sign Language: A New Structured Translation Dataset. In: Ishikawa, H., Liu, C.L., Pajdla, T., Shi, J. (eds) Computer Vision – ACCV 2020. Lecture Notes in Computer Science, vol 12627. Springer, Cham. https://doi.org/10.1007/978-3-030-69544-6_40
DOI: https://doi.org/10.1007/978-3-030-69544-6_40
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-69543-9
Online ISBN: 978-3-030-69544-6