Phonologically-Meaningful Subunits for Deep Learning-Based Sign Language Recognition

  • Conference paper
Computer Vision – ECCV 2020 Workshops (ECCV 2020)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12536)

Abstract

The large majority of sign language recognition systems based on deep learning adopt a word model approach. Here we present a system that works with subunits, rather than word models. We propose a pipelined approach to deep learning that uses a factorisation algorithm to derive hand motion features, embedded within a low-rank trajectory space. Recurrent neural networks are then trained on these embedded features for subunit recognition, followed by a second-stage neural network for sign recognition. Our evaluation shows that our proposed solution compares well in accuracy against the state of the art, providing added benefits of better interpretability and phonologically-meaningful subunits that can operate across different signers and sign languages.
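
As a rough illustration of what embedding hand motion in a low-rank trajectory space can look like, the sketch below projects a single hand-coordinate track onto the first few vectors of an orthonormal DCT basis. This is an assumption made purely for illustration: the abstract does not state which basis or factorisation algorithm the system uses, and the names `dct_basis`, `embed`, `reconstruct`, and the synthetic track are hypothetical.

```python
import math


def dct_basis(num_frames, rank):
    """Return the first `rank` orthonormal DCT-II basis vectors of length `num_frames`."""
    basis = []
    for k in range(rank):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / num_frames)
        basis.append([
            scale * math.cos(math.pi * (2 * t + 1) * k / (2 * num_frames))
            for t in range(num_frames)
        ])
    return basis


def embed(trajectory, basis):
    """Project a 1-D coordinate track onto the basis, giving one coefficient per basis vector."""
    return [sum(b_t * x_t for b_t, x_t in zip(b, trajectory)) for b in basis]


def reconstruct(coeffs, basis):
    """Map low-rank coefficients back to an approximate per-frame track."""
    num_frames = len(basis[0])
    return [sum(c * b[t] for c, b in zip(coeffs, basis)) for t in range(num_frames)]


# Synthetic, smooth x-coordinate of a hand over 32 frames (illustrative data only).
num_frames, rank = 32, 6
track_x = [100.0 + 40.0 * math.sin(math.pi * t / (num_frames - 1))
           for t in range(num_frames)]

basis = dct_basis(num_frames, rank)
features = embed(track_x, basis)       # 6 numbers summarise 32 frames
approx = reconstruct(features, basis)  # smooth approximation of the track
max_err = max(abs(a - x) for a, x in zip(approx, track_x))
print(len(features), round(max_err, 3))
```

A coefficient vector of this kind is a compact, fixed-length description of a motion segment, which is the sort of feature a recurrent network could consume. The features used in the paper itself may be derived differently.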

Notes

  1. The value for speech is taken from a website that tracks the current state of the art in speech recognition on a number of standard benchmark datasets: http://github.com/syhw/wer_are_we. The reported value for ASLR is obtained on one of the most challenging ‘real-life’ signing datasets currently available: http://www-i6.informatik.rwth-aachen.de/~koller/RWTH-PHOENIX/.

Author information

Corresponding author

Correspondence to Mark Borg.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Borg, M., Camilleri, K.P. (2020). Phonologically-Meaningful Subunits for Deep Learning-Based Sign Language Recognition. In: Bartoli, A., Fusiello, A. (eds) Computer Vision – ECCV 2020 Workshops. ECCV 2020. Lecture Notes in Computer Science, vol 12536. Springer, Cham. https://doi.org/10.1007/978-3-030-66096-3_15

  • DOI: https://doi.org/10.1007/978-3-030-66096-3_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-66095-6

  • Online ISBN: 978-3-030-66096-3

  • eBook Packages: Computer Science; Computer Science (R0)
