Abstract
Sign language is a primary means of communication for people with speech impairments, and automatic sign language recognition helps them communicate with hearing people without barriers. For portability, a wearable device integrating surface electromyography (sEMG) sensors and inertial measurement units (IMUs) is used to collect 1D, 14-channel sign language data. However, such 1D data are not human-readable: accurately extracting effective sign language segments for word-level and continuous recognition normally requires synchronized video and considerable manual labor. In this paper, we propose SignD-Net, an approach based on a 1D fully convolutional network (FCN) for labeling and recognizing 1D time-series sign language data. SignD-Net casts sign language labeling as an object detection problem and, following YOLO, assigns a bounding box to each predicted sign. With the optimal 1D-CNN model selected through experiments, continuous sign language labeling and recognition can be realized. To cope with limited data, the model is pre-trained on word-level sign language data and simulated sentence-level data, and fine-tuned at the end of training on real, manually labeled sign language data. Experiments on sign language test data show that SignD-Net achieves a mean average precision (mAP) of 99.18% on the labeling task and a sentence-level accuracy of up to 98.74% on the recognition task.
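The abstract describes a YOLO-style detector applied along the time axis, where each predicted sign receives a 1D bounding interval instead of a 2D box. The paper's implementation is not reproduced here; the sketch below is only a minimal illustration of the 1D analogue of bounding-box overlap (IoU) and the greedy non-maximum suppression used to deduplicate detections. The function names `iou_1d` and `nms_1d` and the example intervals are our own, not from the paper.

```python
import numpy as np

def iou_1d(a, b):
    """Intersection-over-union of two 1D intervals (start, end) on the time axis."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

def nms_1d(intervals, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression over 1D detections.

    Repeatedly keeps the highest-scoring remaining interval and discards
    any lower-scoring interval that overlaps it beyond iou_thresh.
    """
    order = list(np.argsort(scores)[::-1])  # indices sorted by descending score
    keep = []
    while order:
        i = order.pop(0)
        keep.append(int(i))
        order = [j for j in order if iou_1d(intervals[i], intervals[j]) < iou_thresh]
    return keep

# Example: three candidate sign segments along the time axis (sample indices)
dets = [(10, 50), (12, 48), (60, 100)]
scores = np.array([0.9, 0.8, 0.7])
print(nms_1d(dets, scores))  # → [0, 2]: the second segment overlaps the first and is suppressed
```

In a 1D detector of this kind, mAP on the labeling task would then be computed by matching kept intervals to ground-truth sign segments under an interval-IoU threshold, mirroring the 2D object-detection protocol.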
Availability of data and materials
The data are temporarily not available at the authors' request.
Code availability
The code is temporarily not available at the authors' request.
Funding
This work was supported in part by the National Natural Science Foundation of China under Grants 61973065 and 52075531, the Fundamental Research Funds for the Central Universities of China under Grant N2104008, the Special Fund of the Central Government Guiding Local Science and Technology Development under Grant 2021JH6/10500129, and the Innovative Talents Support Program of Liaoning Provincial Universities under Grant LR2020047.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethics approval
This article does not contain any studies with human participants performed by any of the authors.
Consent to participate
Informed consent was obtained from all individual participants included in the study.
Consent to publication
The authors consent to the publication of this article.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Wang, F., Li, C., Liu, Cw. et al. An approach based on 1D fully convolutional network for continuous sign language recognition and labeling. Neural Comput & Applic 34, 17921–17935 (2022). https://doi.org/10.1007/s00521-022-07415-x