Dynamic Attention for Isolated Sign Language Recognition with Reinforcement Learning

  • Conference paper
  • First Online:
Image and Graphics (ICIG 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14356)

Abstract

With the rapid development of computer vision, sign language recognition (SLR) can help bridge the communication gap for deaf people. In this paper, we propose a novel deep reinforcement learning model for isolated SLR that imitates the dynamic attention of humans: it selectively attends to the keyframes of a video and excludes noise from redundant frames. We formulate the task as a Partially Observable Markov Decision Process (POMDP) to learn dynamic attention for SLR from the non-differentiable sequence of interactions. The proposed model adopts Inflated 3D ConvNets as the feature learner. Following the policy learned by the deep reinforcement learning method, the model “observes” a clip of the video, infers the position of the keyframes, and shifts its focus for the next observation. As a result, dynamic attention excludes interference from redundant frames and improves recognition performance. We validate the effectiveness of the proposed method and compare it with benchmark methods on the Chinese Sign Language dataset.
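To make the described pipeline concrete, the sketch below shows how such a dynamic-attention agent could be assembled in PyTorch: a recurrent policy repeatedly encodes a short clip, predicts where to look next, and is trained with REINFORCE on the non-differentiable glimpse sequence, while a cross-entropy head handles classification. This is a minimal illustration under stated assumptions, not the authors' implementation; the `ClipEncoder` placeholder, clip length, glimpse count, reward design, and all hyperparameters are hypothetical (the paper itself uses Inflated 3D ConvNets as the feature learner).

```python
# Minimal sketch of dynamic attention for isolated SLR trained with REINFORCE.
# Assumptions: PyTorch, a placeholder 3D CNN instead of I3D, clip length 16,
# a 1-D "where to look next" location in [0, 1], and accuracy as the reward.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ClipEncoder(nn.Module):
    """Stand-in feature learner (the paper uses Inflated 3D ConvNets)."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
        )
        self.fc = nn.Linear(64, feat_dim)

    def forward(self, clip):                          # clip: (B, 3, L, H, W)
        return self.fc(self.conv(clip).flatten(1))    # -> (B, feat_dim)


class DynamicAttentionAgent(nn.Module):
    """Recurrent agent: observe a clip, update the belief state, then emit a
    stochastic location for the next observation and a class prediction."""
    def __init__(self, feat_dim=256, hid=256, n_classes=500,
                 clip_len=16, loc_std=0.1):
        super().__init__()
        self.encoder = ClipEncoder(feat_dim)
        self.rnn = nn.GRUCell(feat_dim + 1, hid)      # +1 for the current location
        self.loc_head = nn.Linear(hid, 1)             # mean of the next location
        self.cls_head = nn.Linear(hid, n_classes)
        self.clip_len, self.loc_std = clip_len, loc_std

    def observe(self, video, loc, h):                 # video: (B, 3, T, H, W)
        T, L = video.size(2), self.clip_len
        start = (loc.squeeze(1) * (T - L)).long().clamp(0, T - L)
        clips = torch.stack([video[b, :, s:s + L]
                             for b, s in enumerate(start.tolist())])
        h = self.rnn(torch.cat([self.encoder(clips), loc], dim=1), h)
        dist = torch.distributions.Normal(torch.sigmoid(self.loc_head(h)),
                                          self.loc_std)
        next_loc = dist.sample().clamp(0.0, 1.0)      # non-differentiable action
        return next_loc, dist.log_prob(next_loc).sum(1), h

    def forward(self, video, n_glimpses=4):
        B = video.size(0)
        h = video.new_zeros(B, self.rnn.hidden_size)
        loc = video.new_full((B, 1), 0.5)             # start at the video centre
        log_probs = []
        for _ in range(n_glimpses):
            loc, logp, h = self.observe(video, loc, h)
            log_probs.append(logp)
        return self.cls_head(h), torch.stack(log_probs, dim=1)


def loss_fn(logits, log_probs, labels):
    """Cross-entropy for classification plus REINFORCE for the location policy,
    rewarded by whether the final prediction is correct (mean baseline)."""
    ce = F.cross_entropy(logits, labels)
    reward = (logits.argmax(1) == labels).float().unsqueeze(1)     # (B, 1)
    reinforce = -(log_probs * (reward - reward.mean())).mean()
    return ce + reinforce
```

Using the correctness of the final prediction as the reward (with a simple baseline) is one common way to train the glimpse policy when the frame-selection step cannot be backpropagated through; the paper's POMDP formulation addresses the same non-differentiability.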

Author information

Corresponding author

Correspondence to Yuchun Fang.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Lin, S., Fang, Y., Wang, L. (2023). Dynamic Attention for Isolated Sign Language Recognition with Reinforcement Learning. In: Lu, H., et al. Image and Graphics. ICIG 2023. Lecture Notes in Computer Science, vol 14356. Springer, Cham. https://doi.org/10.1007/978-3-031-46308-2_21

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-46308-2_21

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-46307-5

  • Online ISBN: 978-3-031-46308-2

  • eBook Packages: Computer Science, Computer Science (R0)
