Abstract
Sign language serves as a vital means of communication for individuals with hearing impairments, yet recognition resources for the more than 100 distinct sign languages are severely lacking. In response, we present our work on sign language recognition using transfer learning and the domain adaptation method TA3N, which uses the Temporal Relational Network (TRN) module to align multi-scale temporal relations. Our findings show that domain adaptation outperforms neural network-based transfer learning, particularly in improving recognition of American Sign Language (ASL). Our research also shows that aligning shorter-term temporal features between the source and target domains is effective. In addition to RGB, we ran experiments on the Optical Flow representation of the sign language samples and found that RGB outperforms Optical Flow in the majority of cases. Our work aims to improve accessibility and communication for individuals who rely on sign language as their primary mode of communication.













Availability of data and materials
All data generated or analyzed during this study are included in the published articles [2, 4, 26, 27, 28] (and their supplementary information files). The subsets we used are detailed in Section 4.1. For additional guidance on extracting the subsets from their originating datasets, please contact the authors.
Code Availability
The code used for domain adaptation is based on TA3N [24]. Our modifications set the batch size to 20, the learning mode to supervised learning, and num_segments to the value of N in the N-multiscale TRN. The code for converting videos into RGB and Optical Flow frames is available from this repository: https://doi.org/10.6084/m9.figshare.20223444. For additional guidance, please contact the authors.
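As a rough illustration of the settings listed above, the following sketch collects them in a small configuration object and enumerates the ordered frame tuples that a multi-scale temporal relation module could consider for N segments. The names RunConfig and multiscale_relations are hypothetical and are not taken from the TA3N code base, and the value num_segments = 5 is only a placeholder, since the text does not fix N here.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class RunConfig:
    """Hypothetical container for the three settings named in the text."""
    batch_size: int = 20               # stated in the text
    learning_mode: str = "supervised"  # stated in the text
    num_segments: int = 5              # N; 5 is a placeholder, not a reported value


def multiscale_relations(num_segments: int) -> dict:
    """Map each scale k = 2..N to all ordered k-frame index tuples.

    combinations() yields indices in increasing order, so every tuple
    preserves the temporal order of the sampled frames.
    """
    return {
        k: list(combinations(range(num_segments), k))
        for k in range(2, num_segments + 1)
    }


cfg = RunConfig()
relations = multiscale_relations(cfg.num_segments)
for k, tuples in relations.items():
    print(f"{k}-frame relations: {len(tuples)}")
```

The sketch lists every possible tuple per scale for clarity; practical implementations typically sample only a few tuples per scale to keep the cost manageable.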
References
1. Farnebäck G (2003) Two-Frame Motion Estimation Based on Polynomial Expansion. SCIA 363–370
2. Ronchetti F, Quiroga F, Estrebou C, Lanzarini L, Rosete A (2016) LSA64: A Dataset of Argentinian Sign Language. XXII Congreso Argentino de Ciencias de la Computación (CACIC). 794–803
3. Wang H, Chai X, Hong X, Zhao G, Chen X (2016) Isolated Sign Language Recognition with Grassmann Covariance Matrices. ACM Transactions on Accessible Computing 8(4):1–21. https://doi.org/10.1145/2897735
4. Li D, Rodriguez C, Yu X, Li H (2020) Word-level Deep Sign Language Recognition from Video: A New Large-scale Dataset and Methods Comparison. The IEEE Winter Conference on Applications of Computer Vision. 1459–1469
5. Farhadi A, Forsyth D, White R (2007) Transfer Learning in Sign language. IEEE Conference on Computer Vision and Pattern Recognition 2007:1–8. https://doi.org/10.1109/cvpr.2007.383346
6. Mocialov B, Turner G, Hastie HF (2020) Transfer Learning for British Sign Language Modelling. CoRR abs/2006.02144. https://arxiv.org/abs/2006.02144
7. Morocho-Cayamcela ME, Lim W (2019) Fine-tuning a pre-trained Convolutional Neural Network Model to translate American Sign Language in Real-time. 2019 International Conference on Computing, Networking and Communications (ICNC), 100–104
8. Nishat ZK, Shopon M (2020) Unsupervised Pretraining and Transfer Learning-Based Bangla Sign Language Recognition. Proceedings of International Joint Conference on Computational Intelligence Algorithms for Intelligent Systems 529–540. https://doi.org/10.1007/978-981-15-3607-6_42
9. Rathi D (2018) Optimization of Transfer Learning for Sign Language Recognition Targeting Mobile Platform. Int J Recent Innov Trends Comput Commun 6(4):198–203
10. Bird JJ, Ekárt A, Faria DR (2020) British Sign Language Recognition via Late Fusion of Computer Vision and Leap Motion with Transfer Learning to American Sign Language. Sensors 20:5151
11. Simonyan K, Zisserman A (2015) Very Deep Convolutional Networks for Large-Scale Image Recognition. CoRR abs/1409.1556
12. Li D, Opazo CR, Yu X, Li H (2020) Word-level Deep Sign Language Recognition from Video: A New Large-scale Dataset and Methods Comparison. 2020 IEEE Winter Conference on Applications of Computer Vision (WACV). https://doi.org/10.1109/wacv45572.2020.9093512
13. He K, Zhang X, Ren S, Sun J (2016) Deep Residual Learning for Image Recognition. IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2016:770–778
14. Kocmi T (2020) Exploring Benefits of Transfer Learning in Neural Machine Translation. ArXiv abs/2001.01622
15. Kocmi T, Bojar O (2018) Trivial Transfer Learning for Low-Resource Neural Machine Translation. WMT
16. Wang H, Stefan A, Athitsos V (2009) A Similarity Measure for Vision-Based Sign Recognition. HCI
17. Krishnan R, Sarkar S (2013) Similarity Measure between Two Gestures Using Triplets. IEEE Conference on Computer Vision and Pattern Recognition Workshops. 2013:506–513
18. Howard AG, Zhu M, Chen B, Kalenichenko D, Wang W, Weyand T, Andreetto M, Adam H (2017) MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. CoRR abs/1704.04861
19. Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z (2016) Rethinking the Inception Architecture for Computer Vision. IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2016:2818–2826
20. Zoph B, Vasudevan V, Shlens J, Le QV (2018) Learning Transferable Architectures for Scalable Image Recognition. IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2018:8697–8710
21. Bragg D, Koller O, Bellard M, Berke L, Boudreault P, Braffort A, Caselli NK, Huenerfauth M, Kacorri H, Verhoef T, Vogler C, Morris M (2019) Sign Language Recognition, Generation, and Translation: An Interdisciplinary Perspective. The 21st International ACM SIGACCESS Conference on Computers and Accessibility
22. Sevilla-Lara L, Liao Y, Güney F, Jampani V, Geiger A, Black MJ (2018) On the Integration of Optical Flow and Action Recognition. GCPR 281–297
23. Virk JS, Bathula DR (2021) Domain-Specific, Semi-Supervised Transfer Learning for Medical Imaging. 8th ACM IKDD CODS and 26th COMAD
24. Chen MH, Kira Z, Al-Regib G, Yoo J, Chen R, Zheng J (2019) Temporal Attentive Alignment for Large-Scale Video Domain Adaptation. IEEE/CVF International Conference on Computer Vision (ICCV) 2019:6320–6329
25. Yosinski J, Clune J, Bengio Y, Lipson H (2014) How transferable are features in deep neural networks? arXiv:1411.1792 [cs.LG]
26. Zhang J, Zhou W, Xie C, Pu J, Li H (2016) Chinese sign language recognition with adaptive HMM. IEEE International Conference on Multimedia and Expo (ICME) 2016:1–6. https://doi.org/10.1109/ICME.2016.7552950
27. Pu J, Zhou W, Li H (2016) Sign Language Recognition with Multi-modal Features. In: PCM 252–261
28. Huang J, Zhou W, Zhang Q, Li H, Li W (2018) Video-Based Sign Language Recognition without Temporal Segmentation. Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence and Thirtieth Innovative Applications of Artificial Intelligence Conference and Eighth AAAI Symposium on Educational Advances in Artificial Intelligence. New Orleans, Louisiana, USA AAAI’18/IAAI’18/EAAI’18, 2257–2264
29. Kumar A, Thankachan K, Dominic MM (2016) Sign language recognition. 2016 3rd International Conference on Recent Advances in Information Technology (RAIT), 422–428
30. Sultani W, Saleemi I (2014) Human Action Recognition across Datasets by Foreground-Weighted Histogram Decomposition. IEEE Conference on Computer Vision and Pattern Recognition 2014:764–771
31. Xu T, Zhu F, Wong EK, Fang Y (2016) Dual many-to-one-encoder-based transfer learning for cross-dataset human action recognition. Image Vis Comput 55:127–137
32. Jamal A, Namboodiri VP, Deodhare D, Venkatesh KS (2018) Deep Domain Adaptation in Action Space. BMVC
33. Sahoo A, Shah R, Panda R, Saenko K, Das A (2021) Contrast and Mix: Temporal Contrastive Video Domain Adaptation with Background Mixing. In: Ranzato M, Beygelzimer A, Dauphin Y, Liang PS, Vaughan J (eds.) Advances in Neural Information Processing Systems 34:23386–23400
34. Soomro K, Zamir AR, Shah M (2012) UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild
35. Kuehne H, Jhuang H, Garrote E, Poggio T, Serre T (2011) HMDB: A large video database for human motion recognition. International Conference on Computer Vision 2011:2556–2563. https://doi.org/10.1109/ICCV.2011.6126543
36. Zhou B, Andonian A, Oliva A, Torralba A (2018) Temporal Relational Reasoning in Videos. European Conference on Computer Vision, 831–846
37. Wang Y, Quanming Y, Tin-Yau Kwok J, Ni LM (2020) Generalizing from a Few Examples. ACM Computing Surveys (CSUR) 53:1–34
38. Halvardsson G, Peterson J, Soto-Valero C, Baudry B (2021) Interpretation of Swedish Sign Language using Convolutional Neural Networks and Transfer Learning. SN Computer Science 207. https://doi.org/10.1007/s42979-021-00612-w
39. Rahman MM, Mdrafi R, Gurbuz AC, Malaia E, Crawford C, Griffin D, Gurbuz SZ (2021) Word-level Sign Language Recognition Using Linguistic Adaptation of 77 GHz FMCW Radar Data. 2021 IEEE Radar Conference (RadarConf21), 1–6. https://doi.org/10.1109/RadarConf2147009.2021.9455190
40. Abner N, Geraci C, Yu S, Lettieri J, Mertz J, Salgat A (2020) Getting the Upper Hand on Sign Language Families: Historical Analysis and Annotation Methods. FEAST. Formal and Experimental Advances in Sign language Theory. 3:17–29
41. Vázquez-Enríquez M, Alba-Castro JL, Docío-Fernández L, Rodríguez-Banga E (2021) Isolated Sign Language Recognition with Multi-Scale Spatial-Temporal Graph Convolutional Networks. IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). 2021:3457–3466. https://doi.org/10.1109/CVPRW53098.2021.00385
42. Zakariah M, Alotaibi YA, Koundal D, Guo Y, Elahi MM (2022) Sign Language Recognition for Arabic Alphabets Using Transfer Learning Technique. Computational Intelligence and Neuroscience, 2022
43. Shania S, Naufal MF, Prasetyo VR, Azmi MSB (2022) Translator of Indonesian Sign Language Video using Convolutional Neural Network with Transfer Learning. Indones J Inf Syst
44. Abdullayeva GG, Alishzade NO (2022) Transfer learning for Azerbaijani Sign Language Recognition. Informatics and Control Problems
45. Thakar S, Shah S, Shah B, Nimkar AV (2022) Sign Language to Text Conversion in Real Time using Transfer Learning. 2022 IEEE 3rd Global Conference for Advancement in Technology (GCAT) 1–5
46. Das S, Imtiaz MS, Neom N, Siddique N, Wang H (2022) A hybrid approach for Bangla sign language recognition using deep transfer learning model with random forest classifier. Expert Syst Appl 213:118914
47. Jiang X, Hu B, Satapathy SC, Wang S, Zhang Y (2020) Fingerspelling Identification for Chinese Sign Language via AlexNet-Based Transfer Learning and Adam Optimizer. Sci Program 2020:3291426
48. Sharma CM, Tomar K, Mishra RK, Chariar VM (2021) Indian Sign Language Recognition Using Fine-tuned Deep Transfer Learning Model. SSRN Electron J
49. Suharjito, Thiracitta N, Gunawan H (2021) SIBI Sign Language Recognition Using Convolutional Neural Network Combined with Transfer Learning and non-trainable Parameters. Procedia Comput Sci 179:72–80
Funding
This research was funded by the Shenzhen Science and Technology Innovation Commission (JCYJ20210324135011030), Science and Technology Innovation Committee of Shenzhen-Platform and Carrier (International Science and Technology Information Center), High-end Foreign Expert Talent Introduction Plan (G2021032022L), Guangdong Pearl River Plan (2019QN01X890), and National Natural Science Foundation of China (Grant No. 71971127).
Author information
Contributions
All authors contributed to the study’s conception and design. Material preparation, data collection, and analysis were performed by Keren Artiaga, Yang Li, Ercan Engin Kuruoglu, and Wai Kin (Victor) Chan. The first draft of the manuscript was written by Keren Artiaga and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Conflict of interest/Competing interests
The authors have no relevant financial or non-financial interests to disclose.
Ethics approval
Not applicable
Consent to participate
Not applicable
Consent for publication
Not applicable
About this article
Cite this article
Artiaga, K., Li, Y., Kuruoglu, E.E. et al. Cross-Sign Language Transfer Learning Using Domain Adaptation with Multi-scale Temporal Alignment. Multimed Tools Appl 83, 37025–37051 (2024). https://doi.org/10.1007/s11042-023-16703-0
DOI: https://doi.org/10.1007/s11042-023-16703-0