Abstract
With the rapid development of computer vision, sign language recognition (SLR) can help bridge the communication gap for deaf people. In this paper, we propose a novel deep reinforcement learning model for isolated SLR that imitates the dynamic attention of humans: it selectively attends to the keyframes of a video and excludes noise from the redundant frames. Because the resulting sequence of interactions is non-differentiable, we formulate the learning of dynamic attention as a Partially Observable Markov Decision Process (POMDP). The proposed model adopts Inflated 3D ConvNets (I3D) as the feature learner. Following the policy learned by deep reinforcement learning, the model "observes" a clip of the video to infer the positions of keyframes and moves its focus for the next observation. As a result, dynamic attention excludes interference from redundant frames and improves recognition performance. We validate the effectiveness of the proposed method and compare it with benchmark methods on the Chinese Sign Language dataset.
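The observe-then-move loop described in the abstract can be sketched as a recurrent glimpse policy. In this minimal sketch, a linear encoder stands in for the I3D feature learner, a GRU cell maintains the belief state of the POMDP, a stochastic policy head emits the position of the next clip to observe, and a classifier predicts the sign after a fixed number of glimpses. All layer sizes, the Gaussian policy parameterisation, and the glimpse count are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn


class DynamicAttentionSLR(nn.Module):
    """Sketch of dynamic-attention keyframe selection for isolated SLR."""

    def __init__(self, feat_dim=64, hidden=128, num_classes=100):
        super().__init__()
        self.encoder = nn.Linear(feat_dim, hidden)   # stand-in for the I3D feature learner
        self.rnn = nn.GRUCell(hidden, hidden)        # belief state over past observations
        self.policy = nn.Linear(hidden, 2)           # mean / log-std of the next focus position
        self.classifier = nn.Linear(hidden, num_classes)

    def forward(self, clips, num_glimpses=4):
        # clips: (batch, num_clips, feat_dim) pre-extracted clip features
        b, n, _ = clips.shape
        h = clips.new_zeros(b, self.rnn.hidden_size)
        pos = torch.full((b,), 0.5)                  # start by observing the middle of the video
        log_probs = []
        for _ in range(num_glimpses):
            idx = (pos.clamp(0, 1) * (n - 1)).round().long()
            obs = clips[torch.arange(b), idx]        # "observe" one clip at the focused position
            h = self.rnn(torch.relu(self.encoder(obs)), h)
            mean, log_std = self.policy(h).unbind(-1)
            dist = torch.distributions.Normal(torch.sigmoid(mean), log_std.exp())
            pos = dist.sample()                      # move the focus via the stochastic policy
            log_probs.append(dist.log_prob(pos))
        # log-probs would feed a REINFORCE-style update with classification reward
        return self.classifier(h), torch.stack(log_probs, dim=1)


model = DynamicAttentionSLR()
clips = torch.randn(2, 16, 64)                       # 2 videos, 16 clips each
logits, log_probs = model(clips)
```

Because glimpse positions are sampled rather than computed differentiably, the policy head would be trained with a policy-gradient estimator such as REINFORCE, using the classification reward, while the classifier trains with an ordinary cross-entropy loss.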
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Lin, S., Fang, Y., Wang, L. (2023). Dynamic Attention for Isolated Sign Language Recognition with Reinforcement Learning. In: Lu, H., et al. Image and Graphics. ICIG 2023. Lecture Notes in Computer Science, vol 14356. Springer, Cham. https://doi.org/10.1007/978-3-031-46308-2_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-46307-5
Online ISBN: 978-3-031-46308-2
eBook Packages: Computer Science, Computer Science (R0)