research-article

Isolated Arabic Sign Language Recognition Using a Transformer-based Model and Landmark Keypoints

Published: 15 January 2024

Abstract

Pose-based approaches to sign language recognition yield lightweight, fast models that can be adopted in real-time applications. This article presents a framework for isolated Arabic sign language recognition using hand and face keypoints. We employed the MediaPipe pose estimator to extract the keypoints of sign gestures from the video stream. Using the extracted keypoints, we proposed three models for sign language recognition: Long Short-Term Memory (LSTM), Temporal Convolutional Network (TCN), and Transformer-based models. Moreover, we investigated the importance of non-manual features for sign language recognition systems: combining hand and face keypoints boosted recognition accuracy by around 4% compared with hand keypoints alone. The proposed models were evaluated on Arabic and Argentinian sign languages. On the KArSL-100 dataset, the proposed pose-based Transformer achieved the highest accuracy, 99.74% and 68.2% in signer-dependent and signer-independent modes, respectively. On the LSA64 dataset, the Transformer obtained accuracies of 98.25% and 91.09% in signer-dependent and signer-independent modes, respectively. Consequently, the pose-based Transformer outperformed state-of-the-art techniques on both datasets using keypoints from the signer's hands and face.
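The input pipeline described above (per-frame hand and face keypoints assembled into fixed-length sequences for a sequence classifier) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes MediaPipe's 21 landmarks per hand and 468 face-mesh landmarks, and the function names, feature layout, and padding length are illustrative choices.

```python
import numpy as np

NUM_HAND_KP = 21   # landmarks per hand in MediaPipe Hands
NUM_FACE_KP = 468  # landmarks in MediaPipe Face Mesh

def frame_features(left_hand, right_hand, face):
    """Concatenate the (x, y) keypoints of both hands and the face into one
    flat feature vector for a single video frame. A missed detection would be
    passed in as a zero array of the expected shape."""
    parts = [left_hand, right_hand, face]
    return np.concatenate([np.asarray(p, dtype=np.float64).reshape(-1)
                           for p in parts])

def build_sequence(frames, max_len=64):
    """Stack per-frame feature vectors and pad (with zeros) or truncate to a
    fixed length, as a Transformer, LSTM, or TCN classifier expects a
    fixed-size input tensor of shape (max_len, feature_dim)."""
    feats = np.stack([frame_features(*f) for f in frames])
    if len(feats) >= max_len:
        return feats[:max_len]
    pad = np.zeros((max_len - len(feats), feats.shape[1]))
    return np.concatenate([feats, pad], axis=0)
```

With (x, y) coordinates only, each frame yields (21 + 21 + 468) × 2 = 1020 features; a sequence classifier then consumes the resulting (max_len, 1020) array.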



• Published in

  ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 23, Issue 1 (January 2024), 385 pages
  ISSN: 2375-4699, EISSN: 2375-4702
  DOI: 10.1145/3613498

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 15 January 2024
          • Online AM: 21 February 2023
          • Accepted: 17 February 2023
          • Revised: 25 December 2022
          • Received: 28 July 2022

