
Towards Accessible Sign Language Assessment and Learning

DOI: 10.1145/3536221.3556623
Published: 07 November 2022

ABSTRACT

Recently, a phonology-based sign language assessment approach was proposed that uses sign language productions acquired in 3D space with a Kinect sensor. To scale such an assessment system to realistic applications, the dependency on the Kinect, which is not accessible to the wider community, needs to be reduced in favor of solutions that can potentially work with web cameras. This paper takes a step in that direction by investigating sign language recognition and sign language assessment in 2D space, either by dropping the depth coordinate from the Kinect data or by estimating skeletons from video. Experimental studies on the Swiss German Sign Language corpus SMILE show that, while the loss of depth information leads to a considerable drop in sign language recognition performance, a high level of sign language assessment performance can still be obtained.
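The first of the two 2D reductions mentioned above is simple to illustrate. The following is a minimal sketch, not the authors' code; the array layout, shapes, and function name are assumptions chosen for illustration. It shows how the depth coordinate of 3D skeleton data can be discarded to obtain a 2D representation:

```python
import numpy as np

def drop_depth(keypoints_3d: np.ndarray) -> np.ndarray:
    """Reduce 3D skeleton sequences to 2D by discarding the depth (z) axis.

    keypoints_3d: assumed array of shape (num_frames, num_joints, 3),
    where the last axis holds (x, y, z) coordinates such as those
    captured by a Kinect sensor.
    Returns an array of shape (num_frames, num_joints, 2) with (x, y) only.
    """
    return keypoints_3d[..., :2]

# Hypothetical example: 100 frames of a 25-joint Kinect skeleton.
sequence_3d = np.random.rand(100, 25, 3)
sequence_2d = drop_depth(sequence_3d)
assert sequence_2d.shape == (100, 25, 2)
```

The alternative route, estimating skeletons directly from video, would replace the Kinect capture entirely with a 2D pose estimator such as OpenPose, which is what makes the approach usable with ordinary web cameras.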


Published in:
ICMI '22: Proceedings of the 2022 International Conference on Multimodal Interaction
November 2022, 830 pages
ISBN: 9781450393904
DOI: 10.1145/3536221
Copyright © 2022 ACM


Publisher: Association for Computing Machinery, New York, NY, United States
