Visual feature segmentation with reinforcement learning for continuous sign language recognition

Fang, Yuchun; Wang, Liangjun; Lin, Shiquan; Ni, Lan

doi:10.1007/s13735-023-00302-8


Regular Paper
Published: 18 November 2023

Volume 12, article number 39 (2023)

International Journal of Multimedia Information Retrieval



Abstract

Continuous sign language recognition (CSLR) takes as input a video containing continuous, unsegmented signing and outputs a prediction of the corresponding sign gloss sequence. We found that the visual features extracted from different signs in a sign language video differ noticeably. We therefore employ reinforcement learning (RL) to segment the video's visual features into multiple groups that aid model training. Compared with previous CSLR methods, our approach provides a finer-grained and more thoroughly supervised training process, which leads to more effective gradient backpropagation and improved model performance. We call this method “Visual Feature Segmentation with Reinforcement Learning (VFS-RL)”. First, we construct an end-to-end CSLR network. Next, we design an auxiliary multi-class recognition task that uses RL to group the video's visual features, improving the model's ability to extract semantic information from sign videos. Finally, we conduct experiments on two public CSLR datasets: ablation studies demonstrate the effectiveness of the proposed method, and comparison experiments show that it achieves results competitive with other methods.
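The abstract does not specify the segmentation policy or its reward, so the following is only a rough, hypothetical illustration of how RL can split a sequence of per-frame features into consecutive groups. A Bernoulli policy decides, for each pair of adjacent frames, whether to start a new group, and REINFORCE (Williams 1992, listed in the references) updates it with a moving-average baseline. The distance-based policy, the parameters `w` and `b`, the compactness reward in `reward`, and all hyperparameters are assumptions made for this sketch; in VFS-RL the grouping serves an auxiliary multi-class recognition task inside an end-to-end network, which this toy does not model.

```python
import math
import random

# Toy illustration only: the policy form, reward, and hyperparameters below are
# hypothetical and are not taken from the VFS-RL paper.


def sigmoid(x):
    """Numerically stable logistic function (avoids math.exp overflow)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cut_probabilities(features, w, b):
    """Policy: probability of starting a new group at frame t, for t = 1..T-1."""
    return [sigmoid(w * distance(features[t - 1], features[t]) + b)
            for t in range(1, len(features))]


def group_features(features, cuts):
    """Split frames into consecutive groups; cuts[t - 1] == 1 opens a group at frame t."""
    groups = [[features[0]]]
    for t, cut in enumerate(cuts, start=1):
        if cut:
            groups.append([])
        groups[-1].append(features[t])
    return groups


def reward(groups, group_penalty=0.5):
    """Hypothetical reward: compact groups score higher, every group costs a penalty."""
    spread = 0.0
    for g in groups:
        mean = [sum(col) / len(g) for col in zip(*g)]
        spread += sum(distance(f, mean) ** 2 for f in g)
    return -spread - group_penalty * len(groups)


def reinforce(features, steps=300, lr=0.01, seed=0):
    """REINFORCE with a moving-average baseline over the two policy parameters (w, b)."""
    rng = random.Random(seed)
    w, b, baseline = 0.0, 0.0, None
    dists = [distance(features[t - 1], features[t]) for t in range(1, len(features))]
    for _ in range(steps):
        probs = cut_probabilities(features, w, b)
        cuts = [1 if rng.random() < p else 0 for p in probs]
        r = reward(group_features(features, cuts))
        if baseline is None:
            baseline = r
        advantage = r - baseline
        baseline = 0.9 * baseline + 0.1 * r
        # For z = w * d + b, d/dz log Bernoulli(cut; sigmoid(z)) = cut - p.
        grad_w = sum((c - p) * d for c, p, d in zip(cuts, probs, dists))
        grad_b = sum(c - p for c, p in zip(cuts, probs))
        # Gradient ascent on the expected reward.
        w += lr * advantage * grad_w
        b += lr * advantage * grad_b
    return w, b


if __name__ == "__main__":
    # Two well-separated "signs": four frames at (0, 0), four at (5, 5).
    frames = [[0.0, 0.0]] * 4 + [[5.0, 5.0]] * 4
    w, b = reinforce(frames)
    greedy_cuts = [1 if p > 0.5 else 0 for p in cut_probabilities(frames, w, b)]
    print(w, b, greedy_cuts)
    print(group_features(frames, greedy_cuts))
```

With two well-separated clusters one would hope the greedy cuts fall only at the jump between them, but this toy run is not guaranteed to converge. In a realistic setting the reward would more plausibly be derived from the recognition task itself rather than from a hand-written compactness score.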


Data availability

The public datasets supporting the findings of this study are accessible through references [22] (Koller et al. 2015) and [18] (Huang et al. 2018).

References

Adaloglou N, Chatzis T, Papastratis I et al (2021) A comprehensive study on deep learning-based methods for sign language recognition. IEEE Trans Multimed. https://doi.org/10.1109/TMM.2021.3070438
Al-Ayyoub M, Nuseir A, Alsmearat K et al (2018) Deep learning for Arabic NLP: a survey. J Comput Sci 26:522–531
Cheng KL, Yang Z, Chen Q et al (2020) Fully convolutional networks for continuous sign language recognition. In: European conference on computer vision. Springer, pp 697–714
Cihan Camgoz N, Hadfield S, Koller O et al (2017) SubUNets: end-to-end hand shape and continuous sign language recognition. In: Proceedings of the IEEE international conference on computer vision, pp 3056–3065
Cui R, Liu H, Zhang C (2017) Recurrent convolutional neural networks for continuous sign language recognition by staged optimization. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7361–7369
Das S, Biswas SK, Purkayastha B (2023) A deep sign language recognition system for Indian sign language. Neural Comput Appl 35(2):1469–1481
Deng X, Yang S, Zhang Y et al (2017) Hand3D: hand pose estimation using 3D neural network. arXiv preprint arXiv:1704.02224
Dittmar T, Krull C, Horton G (2015) A new approach for touch gesture recognition: conversive hidden non-Markovian models. J Comput Sci 10:66–76
Farajzadeh N, Hashemzadeh M (2021) A deep neural network based framework for restoring the damaged Persian pottery via digital inpainting. J Comput Sci 56:101486
Forster J, Ney H (2015) Continuous sign language recognition: towards large vocabulary statistical recognition systems handling multiple signers. Comput Vis Image Underst 141:108–125
Freeman WT, Roth M (1995) Orientation histograms for hand gesture recognition. In: International workshop on automatic face and gesture recognition, Zurich, Switzerland, pp 296–301
Graves A, Mohamed Ar, Hinton G (2013) Speech recognition with deep recurrent neural networks. In: 2013 IEEE international conference on acoustics, speech and signal processing. IEEE, pp 6645–6649
Guo D, Wang S, Tian Q et al (2019) Dense temporal convolution network for sign language translation. In: IJCAI, pp 744–750
Guo J, Xue W, Guo L et al (2022) Multi-level temporal relation graph for continuous sign language recognition. In: Chinese conference on pattern recognition and computer vision (PRCV). Springer, pp 408–419
Gupta B, Shukla P, Mittal A (2016) K-nearest correlated neighbor classification for Indian sign language gesture recognition using feature fusion. In: 2016 International conference on computer communication and informatics (ICCCI). IEEE, pp 1–5
He K, Zhang X, Ren S et al (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
Hosseini A, Hashemzadeh M, Farajzadeh N (2022) UFS-Net: a unified flame and smoke detection method for early detection of fire in video surveillance applications using CNNs. J Comput Sci 61:101638
Huang J, Zhou W, Zhang Q et al (2018) Video-based sign language recognition without temporal segmentation. In: Proceedings of the AAAI conference on artificial intelligence
Huang S, Ye Z (2021) Boundary-adaptive encoder with attention method for Chinese sign language recognition. IEEE Access 9:70948–70960
Ibrahim NB, Selim MM, Zayed HH (2018) An automatic Arabic sign language recognition system (ArSLRS). J King Saud Univ Comput Inf Sci 30(4):470–477
Kingma DP, Ba J (2015) Adam: a method for stochastic optimization. In: International conference on learning representations (ICLR), San Diego
Koller O, Forster J, Ney H (2015) Continuous sign language recognition: towards large vocabulary statistical recognition systems handling multiple signers. Comput Vis Image Underst 141:108–125
Koller O, Zargaran O, Ney H et al (2016) Deep sign: hybrid CNN-HMM for continuous sign language recognition. In: Proceedings of the British machine vision conference 2016
Koller O, Camgoz NC, Ney H et al (2019) Weakly supervised learning with multi-stream CNN-LSTM-HMMs to discover sequential parallelism in sign language videos. IEEE Trans Pattern Anal Mach Intell 42(9):2306–2320
Li H, Wang W (2020) Reinterpreting CTC training as iterative fitting. Pattern Recognit 105:107392
Li R, Meng L (2022) Sign language recognition and translation network based on multi-view data. Appl Intell 52(13):14624–14638
Liu H, Jin S, Zhang C (2018) Connectionist temporal classification with maximum entropy regularization. In: Advances in neural information processing systems, vol 31
Niu Z, Mak B (2020) Stochastic fine-grained labeling of multi-state sign glosses for continuous sign language recognition. In: European conference on computer vision. Springer, pp 172–186
Pu J, Zhou W, Li H (2018) Dilated convolutional network with iterative optimization for continuous sign language recognition. In: IJCAI, p 7
Pu J, Zhou W, Li H (2019) Iterative alignment network for continuous sign language recognition. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 4165–4174
Rao GA, Kishore P (2018) Selfie video based continuous Indian sign language recognition system. Ain Shams Eng J 9(4):1929–1939
Shi B, Del Rio AM, Keane J et al (2018) American sign language fingerspelling recognition in the wild. In: 2018 IEEE spoken language technology workshop (SLT). IEEE, pp 145–152
Wahid MF, Tafreshi R, Al-Sowaidi M et al (2018) Subject-independent hand gesture recognition using normalization and machine learning algorithms. J Comput Sci 27:69–76
Wang F, Du Y, Wang G et al (2022a) (2+1)D-SLR: an efficient network for video sign language recognition. Neural Comput Appl 34(3):2413–2423
Wang F, Li C, Liu Cw et al (2022b) An approach based on 1D fully convolutional network for continuous sign language recognition and labeling. Neural Comput Appl 34(20):17921–17935
Wei C, Zhao J, Zhou W et al (2020) Semantic boundary detection with reinforcement learning for continuous sign language recognition. IEEE Trans Circuits Syst Video Technol 31(3):1138–1149
Williams RJ (1992) Simple statistical gradient-following algorithms for connectionist reinforcement learning. Mach Learn 8(3–4):229–256
Xie P, Zhao M, Hu X (2021) PiSLTRc: position-informed sign language transformer with content-aware convolution. IEEE Trans Multimed 24:3908–3919
Yang Z, Shi Z, Shen X et al (2019) SF-Net: structured feature network for continuous sign language recognition. arXiv preprint arXiv:1908.01341
Zhang J, Zhou W, Xie C et al (2016) Chinese sign language recognition with adaptive HMM. In: 2016 IEEE international conference on multimedia and expo (ICME). IEEE, pp 1–6
Zhang Z, Pu J, Zhuang L et al (2019) Continuous sign language recognition via reinforcement learning. In: 2019 IEEE international conference on image processing (ICIP). IEEE, pp 285–289
Zhou H, Zhou W, Li H (2019) Dynamic pseudo label decoding for continuous sign language recognition. In: 2019 IEEE international conference on multimedia and expo (ICME). IEEE, pp 1282–1287
Zhou H, Zhou W, Zhou Y et al (2020) Spatial-temporal multi-cue network for continuous sign language recognition. In: Proceedings of the AAAI conference on artificial intelligence, pp 13009–13016


Acknowledgements

We thank the High Performance Computing Center of Shanghai University and the Shanghai Engineering Research Center of Intelligent Computing System (No. 19DZ2252600) for providing computing resources.

Funding

This work is supported by the Humanities and Social Science Research Program of the Ministry of Education of China under Grant 17YJA40038, the Natural Science Foundation of Shanghai under Grant No. 19ZR1419200, and the National Natural Science Foundation of China under Grant Nos. 61976132, 61991411, and U1811461.

Author information

Authors and Affiliations

School of Computer Engineering and Science, Shanghai University, Shanghai, 200444, China
Yuchun Fang, Liangjun Wang & Shiquan Lin
College of Liberal Arts, Shanghai University, Shanghai, 200444, China
Lan Ni


Contributions

YF and LW wrote the main manuscript text, and SL and LN prepared the figures and datasets. All authors reviewed the manuscript.

Corresponding author

Correspondence to Lan Ni.

Ethics declarations

Conflict of interest

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Fang, Y., Wang, L., Lin, S. et al. Visual feature segmentation with reinforcement learning for continuous sign language recognition. Int J Multimed Info Retr 12, 39 (2023). https://doi.org/10.1007/s13735-023-00302-8


Received: 14 February 2023
Revised: 26 September 2023
Accepted: 06 October 2023
Published: 18 November 2023
DOI: https://doi.org/10.1007/s13735-023-00302-8
