Abstract
Pose-based approaches to sign language recognition yield lightweight, fast models that can be used in real-time applications. This article presents a framework for isolated Arabic sign language recognition using hand and face keypoints. We employed the MediaPipe pose estimator to extract the keypoints of sign gestures from the video stream. Using the extracted keypoints, we proposed three models for sign language recognition: Long Short-Term Memory (LSTM), Temporal Convolutional Network (TCN), and Transformer-based models. We also investigated the importance of non-manual features for sign language recognition systems; the results showed that combining hand and face keypoints boosted recognition accuracy by around 4% compared with using hand keypoints alone. The proposed models were evaluated on Arabic and Argentinian sign languages. On the KArSL-100 dataset, the proposed pose-based Transformer achieved the highest accuracies, 99.74% and 68.2% in signer-dependent and signer-independent modes, respectively. On the LSA64 dataset, the Transformer obtained accuracies of 98.25% and 91.09% in signer-dependent and signer-independent modes, respectively. Consequently, the pose-based Transformer, using keypoints from the signer’s hands and face, outperformed state-of-the-art techniques on both datasets.
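The two ideas the abstract relies on, fusing hand and face keypoints into one per-frame feature vector and evaluating in signer-independent mode, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `(x, y)` keypoint layout, and the sample dictionary with a `signer` field are all hypothetical, and the keypoint counts are placeholders rather than the numbers used in the article.

```python
from typing import Dict, List, Sequence, Tuple

Keypoint = Tuple[float, float]  # assumed (x, y) in normalised image coordinates


def frame_features(hand: Sequence[Keypoint], face: Sequence[Keypoint]) -> List[float]:
    """Flatten the hand and face keypoints of one frame into a single vector."""
    return [coord for point in list(hand) + list(face) for coord in point]


def signer_independent_split(samples: List[Dict], test_signer: str):
    """Hold out every sample of one signer for testing; train on the others."""
    train = [s for s in samples if s["signer"] != test_signer]
    test = [s for s in samples if s["signer"] == test_signer]
    return train, test


# Hypothetical toy data: one frame per sample, one hand point and two face points.
samples = [
    {"signer": "s1", "label": 0, "hand": [(0.1, 0.2)], "face": [(0.3, 0.4), (0.5, 0.6)]},
    {"signer": "s2", "label": 1, "hand": [(0.2, 0.2)], "face": [(0.3, 0.5), (0.5, 0.7)]},
    {"signer": "s3", "label": 0, "hand": [(0.1, 0.3)], "face": [(0.4, 0.4), (0.6, 0.6)]},
]

train, test = signer_independent_split(samples, test_signer="s3")
vector = frame_features(samples[0]["hand"], samples[0]["face"])
```

In signer-dependent mode the same signers appear in both the training and test sets, whereas the split above keeps the test signer entirely unseen during training, which makes it the harder of the two evaluation modes reported in the abstract.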
Index Terms
- Isolated Arabic Sign Language Recognition Using a Transformer-based Model and Landmark Keypoints