Cross-Sign Language Transfer Learning Using Domain Adaptation with Multi-scale Temporal Alignment

Artiaga, Keren; Li, Yang; Kuruoglu, Ercan Engin; Chan, Wai Kin (Victor)

doi:10.1007/s11042-023-16703-0

1230: Sentient Multimedia Systems and Visual Intelligence
Published: 15 September 2023

Multimedia Tools and Applications, volume 83, pages 37025–37051 (2024)

Keren Artiaga¹ (ORCID: orcid.org/0000-0002-1233-5700), Yang Li¹, Ercan Engin Kuruoglu¹ & Wai Kin (Victor) Chan¹


Abstract

Sign language is a vital means of communication for individuals with hearing impairments, yet recognition resources for the more than 100 distinct sign languages are severely lacking. In response, we present our work on sign language recognition using transfer learning and the domain adaptation method TA3N, which uses the Temporal Relational Network (TRN) module to align multi-scale temporal relations. Our findings show that domain adaptation outperforms neural network-based transfer learning, particularly in improving recognition of American Sign Language (ASL). Our research also identifies the effectiveness of aligning shorter-term temporal features between source and target domains. In addition to RGB, we conducted experiments using the optical flow modality for the sign language samples and found that RGB outperforms optical flow in the majority of cases. Our work aims to improve accessibility and communication for individuals who rely on sign language as their primary mode of communication.
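To make the phrase "multi-scale temporal relations" more concrete, the following minimal sketch enumerates the ordered frame-index tuples that a TRN-style module could relate at each scale, from 2-frame up to N-frame relations. It illustrates the general idea only and is not taken from the TA3N code; the function name `multiscale_frame_tuples` and the choice of N = 5 are assumptions made for this example. In a real model, each tuple's frame features would be fed to a learned relation function, and the tuples are usually subsampled rather than enumerated exhaustively.

```python
from itertools import combinations


def multiscale_frame_tuples(num_segments, min_scale=2):
    """Map each scale n to all ordered n-frame index tuples.

    Hypothetical helper for illustration: scales run from min_scale
    up to num_segments, and indices in each tuple are in temporal order.
    """
    if num_segments < min_scale:
        raise ValueError("num_segments must be at least min_scale")
    return {
        scale: list(combinations(range(num_segments), scale))
        for scale in range(min_scale, num_segments + 1)
    }


# Assumed example value: a video represented by 5 sampled frames.
relations = multiscale_frame_tuples(num_segments=5)
for scale, tuples in relations.items():
    print(f"{scale}-frame relations: {len(tuples)} tuples, first = {tuples[0]}")
```

Under this view, the smaller scales (2-frame and 3-frame tuples) correspond to the shorter-term temporal features mentioned in the abstract.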

Availability of data and materials

All data generated or analyzed during this study are included in these published articles [2, 4, 26, 27, 28] (and their supplementary information files). The subsets we used are detailed in Section 4.1. For additional guidance on extracting the subsets from their originating datasets, please contact the authors.

Code Availability

The code used for domain adaptation is based on TA3N [24]. Our modifications include setting the batch size to 20, setting the mode of learning to supervised learning, and setting num_segments to the value N used by the N-multiscale TRN. The code for converting videos into RGB and Optical Flow frames is available from this repository: https://doi.org/10.6084/m9.figshare.20223444. For additional guidance, please contact the authors.
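The settings described above can be summarised in a small sketch. This is not the TA3N configuration interface: the class `AdaptationSettings`, its field names other than num_segments, the string "supervised", and the default N = 5 are all hypothetical placeholders. The helper `segment_center_indices` additionally assumes that num_segments is the number of frames sampled per video, one from the middle of each equal-length segment, as is common in segment-based video pipelines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdaptationSettings:
    """Hypothetical container for the modifications listed in the text."""
    batch_size: int = 20                # stated in the text
    learning_mode: str = "supervised"   # stated in the text; string is a placeholder
    num_segments: int = 5               # N of the N-multiscale TRN; 5 is an assumed value


def segment_center_indices(num_frames, num_segments):
    """Pick the middle frame index of each of num_segments equal segments."""
    if num_segments < 1 or num_frames < num_segments:
        raise ValueError("need at least one frame per segment")
    segment_length = num_frames / num_segments
    return [int(segment_length * i + segment_length / 2)
            for i in range(num_segments)]


settings = AdaptationSettings()
# Assumed example: a 100-frame sign video reduced to N sampled frames.
indices = segment_center_indices(num_frames=100, num_segments=settings.num_segments)
print(settings)
print(indices)
```

Changing num_segments changes both how many frames are sampled and, in a TRN-style module, the largest relation scale that can be formed from them.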

References

1. Farnebäck G (2003) Two-Frame Motion Estimation Based on Polynomial Expansion. SCIA 363–370
2. Ronchetti F, Quiroga F, Estrebou C, Lanzarini L, Rosete A (2016) LSA64: A Dataset of Argentinian Sign Language. XXII Congreso Argentino de Ciencias de la Computación (CACIC). 794–803
3. Wang H, Chai X, Hong X, Zhao G, Chen X (2016) Isolated Sign Language Recognition with Grassmann Covariance Matrices. ACM Transactions on Accessible Computing 8(4):1–21. https://doi.org/10.1145/2897735
4. Li D, Rodriguez C, Yu X, Li H (2020) Word-level Deep Sign Language Recognition from Video: A New Large-scale Dataset and Methods Comparison. The IEEE Winter Conference on Applications of Computer Vision. 1459–1469
5. Farhadi A, Forsyth D, White R (2007) Transfer Learning in Sign language. IEEE Conference on Computer Vision and Pattern Recognition 2007:1–8. https://doi.org/10.1109/cvpr.2007.383346
6. Mocialov B, Turner G, Hastie HF (2020) Transfer Learning for British Sign Language Modelling. CoRR abs/2006.02144. https://arxiv.org/abs/2006.02144
7. Morocho-Cayamcela ME, Lim W (2019) Fine-tuning a pre-trained Convolutional Neural Network Model to translate American Sign Language in Real-time. 2019 International Conference on Computing, Networking and Communications (ICNC), 100–104
8. Nishat ZK, Shopon M (2020) Unsupervised Pretraining and Transfer Learning-Based Bangla Sign Language Recognition. Proceedings of International Joint Conference on Computational Intelligence Algorithms for Intelligent Systems 529–540. https://doi.org/10.1007/978-981-15-3607-6_42
9. Rathi D (2018) Optimization of Transfer Learning for Sign Language Recognition Targeting Mobile Platform. Int J Recent Innov Trends Comput Commun 6(4):198–203
10. Bird JJ, Ekárt A, Faria DR (2020) British Sign Language Recognition via Late Fusion of Computer Vision and Leap Motion with Transfer Learning to American Sign Language. Sensors 20:5151
11. Simonyan K, Zisserman A (2015) Very Deep Convolutional Networks for Large-Scale Image Recognition. CoRR abs/1409.1556
12. Li D, Opazo CR, Yu X, Li H (2020) Word-level Deep Sign Language Recognition from Video: A New Large-scale Dataset and Methods Comparison. 2020 IEEE Winter Conference on Applications of Computer Vision (WACV). https://doi.org/10.1109/wacv45572.2020.9093512
13. He K, Zhang X, Ren S, Sun J (2016) Deep Residual Learning for Image Recognition. IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2016:770–778
14. Kocmi T (2020) Exploring Benefits of Transfer Learning in Neural Machine Translation. ArXiv abs/2001.01622
15. Kocmi T, Bojar O (2018) Trivial Transfer Learning for Low-Resource Neural Machine Translation. WMT
16. Wang H, Stefan A, Athitsos V (2009) A Similarity Measure for Vision-Based Sign Recognition. HCI
17. Krishnan R, Sarkar S (2013) Similarity Measure between Two Gestures Using Triplets. IEEE Conference on Computer Vision and Pattern Recognition Workshops. 2013:506–513
18. Howard AG, Zhu M, Chen B, Kalenichenko D, Wang W, Weyand T, Andreetto M, Adam H (2017) MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. CoRR abs/1704.04861
19. Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z (2016) Rethinking the Inception Architecture for Computer Vision. IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2016:2818–2826
20. Zoph B, Vasudevan V, Shlens J, Le QV (2018) Learning Transferable Architectures for Scalable Image Recognition. IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2018:8697–8710
21. Bragg D, Koller O, Bellard M, Berke L, Boudreault P, Braffort A, Caselli NK, Huenerfauth M, Kacorri H, Verhoef T, Vogler C, Morris M (2019) Sign Language Recognition, Generation, and Translation: An Interdisciplinary Perspective. The 21st International ACM SIGACCESS Conference on Computers and Accessibility
22. Sevilla-Lara L, Liao Y, Güney F, Jampani V, Geiger A, Black MJ (2018) On the Integration of Optical Flow and Action Recognition. GCPR 281–297
23. Virk JS, Bathula DR (2021) Domain-Specific, Semi-Supervised Transfer Learning for Medical Imaging. 8th ACM IKDD CODS and 26th COMAD
24. Chen MH, Kira Z, Al-Regib G, Yoo J, Chen R, Zheng J (2019) Temporal Attentive Alignment for Large-Scale Video Domain Adaptation. IEEE/CVF International Conference on Computer Vision (ICCV) 2019:6320–6329
25. Yosinski J, Clune J, Bengio Y, Lipson H (2014) How transferable are features in deep neural networks? arXiv:1411.1792 [cs.LG]
26. Zhang J, Zhou W, Xie C, Pu J, Li H (2016) Chinese sign language recognition with adaptive HMM. IEEE International Conference on Multimedia and Expo (ICME) 2016:1–6. https://doi.org/10.1109/ICME.2016.7552950
27. Pu J, Zhou W, Li H (2016) Sign Language Recognition with Multi-modal Features. In: PCM 252–261
28. Huang J, Zhou W, Zhang Q, Li H, Li W (2018) Video-Based Sign Language Recognition without Temporal Segmentation. Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence and Thirtieth Innovative Applications of Artificial Intelligence Conference and Eighth AAAI Symposium on Educational Advances in Artificial Intelligence. New Orleans, Louisiana, USA. AAAI’18/IAAI’18/EAAI’18, 2257–2264
29. Kumar A, Thankachan K, Dominic MM (2016) Sign language recognition. 2016 3rd International Conference on Recent Advances in Information Technology (RAIT), 422–428
30. Sultani W, Saleemi I (2014) Human Action Recognition across Datasets by Foreground-Weighted Histogram Decomposition. IEEE Conference on Computer Vision and Pattern Recognition 2014:764–771
31. Xu T, Zhu F, Wong EK, Fang Y (2016) Dual many-to-one-encoder-based transfer learning for cross-dataset human action recognition. Image Vis Comput 55:127–137
32. Jamal A, Namboodiri VP, Deodhare D, Venkatesh KS (2018) Deep Domain Adaptation in Action Space. BMVC
33. Sahoo A, Shah R, Panda R, Saenko K, Das A (2021) Contrast and Mix: Temporal Contrastive Video Domain Adaptation with Background Mixing. In: Ranzato M, Beygelzimer A, Dauphin Y, Liang PS, Vaughan J (eds.) Advances in Neural Information Processing Systems 34:23386–23400
34. Soomro K, Zamir AR, Shah M (2012) UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild
35. Kuehne H, Jhuang H, Garrote E, Poggio T, Serre T (2011) HMDB: A large video database for human motion recognition. International Conference on Computer Vision 2011:2556–2563. https://doi.org/10.1109/ICCV.2011.6126543
36. Zhou B, Andonian A, Oliva A, Torralba A (2018) Temporal Relational Reasoning in Videos. European Conference on Computer Vision, 831–846
37. Wang Y, Yao Q, Kwok JT, Ni LM (2020) Generalizing from a Few Examples. ACM Computing Surveys (CSUR) 53:1–34
38. Halvardsson G, Peterson J, Soto-Valero C, Baudry B (2021) Interpretation of Swedish Sign Language using Convolutional Neural Networks and Transfer Learning. SN Computer Science 207. https://doi.org/10.1007/s42979-021-00612-w
39. Rahman MM, Mdrafi R, Gurbuz AC, Malaia E, Crawford C, Griffin D, Gurbuz SZ (2021) Word-level Sign Language Recognition Using Linguistic Adaptation of 77 GHz FMCW Radar Data. 2021 IEEE Radar Conference (RadarConf21), 1–6. https://doi.org/10.1109/RadarConf2147009.2021.9455190
40. Abner N, Geraci C, Yu S, Lettieri J, Mertz J, Salgat A (2020) Getting the Upper Hand on Sign Language Families: Historical Analysis and Annotation Methods. FEAST. Formal and Experimental Advances in Sign language Theory. 3:17–29
41. Vázquez-Enríquez M, Alba-Castro JL, Docío-Fernández L, Rodríguez-Banga E (2021) Isolated Sign Language Recognition with Multi-Scale Spatial-Temporal Graph Convolutional Networks. IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). 2021:3457–3466. https://doi.org/10.1109/CVPRW53098.2021.00385
42. Zakariah M, Alotaibi YA, Koundal D, Guo Y, Elahi MM (2022) Sign Language Recognition for Arabic Alphabets Using Transfer Learning Technique. Computational Intelligence and Neuroscience, 2022
43. Shania S, Naufal MF, Prasetyo VR, Azmi MSB (2022) Translator of Indonesian Sign Language Video using Convolutional Neural Network with Transfer Learning. Indones J Inf Syst
44. Abdullayeva GG, Alishzade NO (2022) Transfer learning for Azerbaijani Sign Language Recognition. Informatics and Control Problems
45. Thakar S, Shah S, Shah B, Nimkar AV (2022) Sign Language to Text Conversion in Real Time using Transfer Learning. 2022 IEEE 3rd Global Conference for Advancement in Technology (GCAT) 1–5
46. Das S, Imtiaz MS, Neom N, Siddique N, Wang H (2022) A hybrid approach for Bangla sign language recognition using deep transfer learning model with random forest classifier. Expert Syst Appl 213:118914
47. Jiang X, Hu B, Satapathy SC, Wang S, Zhang Y (2020) Fingerspelling Identification for Chinese Sign Language via AlexNet-Based Transfer Learning and Adam Optimizer. Sci Program 2020:3291426
48. Sharma CM, Tomar K, Mishra RK, Chariar VM (2021) Indian Sign Language Recognition Using Fine-tuned Deep Transfer Learning Model. SSRN Electron J
49. Suharjito, Thiracitta N, Gunawan H (2021) SIBI Sign Language Recognition Using Convolutional Neural Network Combined with Transfer Learning and non-trainable Parameters. Procedia Comput Sci 179:72–80

Funding

This research was funded by the Shenzhen Science and Technology Innovation Commission (JCYJ20210324135011030), Science and Technology Innovation Committee of Shenzhen-Platform and Carrier (International Science and Technology Information Center), High-end Foreign Expert Talent Introduction Plan (G2021032022L), Guangdong Pearl River Plan (2019QN01X890), and National Natural Science Foundation of China (Grant No. 71971127).

Author information

Authors and Affiliations

Tsinghua-Berkeley Shenzhen Institute, Tsinghua Shenzhen International Graduate School, Tsinghua University, The University Town, Nanshan District, Shenzhen, 518055, People’s Republic of China
Keren Artiaga, Yang Li, Ercan Engin Kuruoglu & Wai Kin (Victor) Chan

Contributions

All authors contributed to the study’s conception and design. Material preparation, data collection, and analysis were performed by Keren Artiaga, Yang Li, Ercan Engin Kuruoglu, and Wai Kin (Victor) Chan. The first draft of the manuscript was written by Keren Artiaga and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Keren Artiaga.

Ethics declarations

Conflict of interest/Competing interests

The authors have no relevant financial or non-financial interests to disclose.

Ethics approval

Not applicable

Consent to participate

Not applicable

Consent for publication

Not applicable

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Artiaga, K., Li, Y., Kuruoglu, E.E. et al. Cross-Sign Language Transfer Learning Using Domain Adaptation with Multi-scale Temporal Alignment. Multimed Tools Appl 83, 37025–37051 (2024). https://doi.org/10.1007/s11042-023-16703-0

Received: 08 July 2022
Revised: 01 July 2023
Accepted: 27 August 2023
Published: 15 September 2023
Issue Date: April 2024
DOI: https://doi.org/10.1007/s11042-023-16703-0
