Isolated Arabic Sign Language Recognition Using a Transformer-based Model and Landmark Keypoints

Published: 15 January 2024

Abstract

Pose-based approaches to sign language recognition yield lightweight, fast models that can be adopted in real-time applications. This article presents a framework for isolated Arabic sign language recognition using hand and face keypoints. We employed the MediaPipe pose estimator to extract the keypoints of sign gestures from the video stream. Using the extracted keypoints, three models were proposed for sign language recognition: Long Short-Term Memory (LSTM), Temporal Convolutional Network (TCN), and Transformer-based models. Moreover, we investigated the importance of non-manual features for sign language recognition systems, and the results showed that combining hand and face keypoints boosted the recognition accuracy by around 4% compared with using hand keypoints alone. The proposed models were evaluated on Arabic and Argentinian sign languages. On the KArSL-100 dataset, the proposed pose-based Transformer achieved the highest accuracy, 99.74% in signer-dependent mode and 68.2% in signer-independent mode. On the LSA64 dataset, the Transformer obtained accuracies of 98.25% and 91.09% in signer-dependent and signer-independent modes, respectively. Consequently, the pose-based Transformer outperformed the state-of-the-art techniques on both datasets using keypoints from the signer’s hands and face.
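The gap between the signer-dependent and signer-independent results comes from how the data is split. The following minimal Python sketch illustrates the two evaluation modes on a toy dataset. The abstract does not state the exact protocol, so the leave-one-signer-out scheme, the function names, and the sample fields (`signer`, `sign`, `rep`) are all illustrative assumptions rather than the authors' procedure.

```python
import random


def signer_dependent_split(samples, test_fraction=0.2, seed=0):
    """Random split: the same signer may appear in both train and test."""
    rng = random.Random(seed)
    shuffled = samples[:]          # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def signer_independent_split(samples, held_out_signer):
    """Assumed leave-one-signer-out split: the test signer is never trained on."""
    train = [s for s in samples if s["signer"] != held_out_signer]
    test = [s for s in samples if s["signer"] == held_out_signer]
    return train, test


# Hypothetical toy dataset: 3 signers x 4 signs x 2 repetitions = 24 samples.
samples = [
    {"signer": signer, "sign": sign, "rep": rep}
    for signer in (1, 2, 3)
    for sign in range(4)
    for rep in range(2)
]

dep_train, dep_test = signer_dependent_split(samples)
ind_train, ind_test = signer_independent_split(samples, held_out_signer=3)

print(len(dep_train), len(dep_test))   # sizes of the random split
print(sorted({s["signer"] for s in ind_train}),
      sorted({s["signer"] for s in ind_test}))
```

In the signer-independent split, the model never sees the test signer during training, so it has to generalize across differences in body proportions and signing style. That is consistent with the lower signer-independent accuracies reported above.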


Published In

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 23, Issue 1
January 2024
385 pages
EISSN:2375-4702
DOI:10.1145/3613498

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 15 January 2024
Online AM: 21 February 2023
Accepted: 17 February 2023
Revised: 25 December 2022
Received: 28 July 2022
Published in TALLIP Volume 23, Issue 1

Author Tags

  1. Sign language recognition
  2. Arabic sign language
  3. gesture recognition
  4. pose recognition
  5. TCN
  6. transformer

Qualifiers

  • Research-article

Funding Sources

  • Saudi Data and AI Authority (SDAIA) and King Fahd University of Petroleum and Minerals (KFUPM) under the SDAIA-KFUPM Joint Research Center for Artificial Intelligence
