
An approach based on 1D fully convolutional network for continuous sign language recognition and labeling

  • Original Article
  • Published in Neural Computing and Applications

Abstract

Sign language is the most important means of communication for people with speech impairments, and automatic sign language recognition lets them communicate with hearing people without barriers. For portability, a device integrating surface electromyography (sEMG) sensors and inertial measurement units (IMUs) is used to collect 1D 14-channel sign language data. However, such 1D data are not human-readable: accurately segmenting and labeling effective sign language for word-level and continuous recognition normally requires synchronized video and considerable manual effort. In this paper, we propose an approach based on a 1D fully convolutional network (FCN), called SignD-Net, for labeling and recognizing 1D time-series sign language data. SignD-Net casts sign language labeling as an object detection problem and, following YOLO, assigns a bounding box to each predicted object. With the optimal 1D-CNN model selected through experiments, continuous sign language labeling and recognition can be realized. Because data are limited, the model is pre-trained on word-level sign language data and simulated sentence-level data, and is trained at the end on real, manually labeled sign language data. Experiments on sign language test data show that SignD-Net performs excellently, achieving a mean average precision (mAP) of 99.18% on the labeling task and a sentence-level accuracy of up to 98.74% on the recognition task.
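To illustrate the YOLO-style formulation on 1D signals described above, the sketch below shows a minimal fully convolutional 1D detector in PyTorch. The paper does not publish code, so the layer sizes, grid length, class count, and the (center, width, objectness) head layout are illustrative assumptions, not SignD-Net's actual architecture.

```python
# Minimal PyTorch sketch of a YOLO-style 1D detector for 14-channel
# sEMG/IMU time series. All hyperparameters here are illustrative
# assumptions; this is not the SignD-Net architecture from the paper.
import torch
import torch.nn as nn


class Sign1DDetector(nn.Module):
    def __init__(self, in_channels=14, num_classes=30, grid=32):
        super().__init__()
        # Fully convolutional backbone: stacked Conv1d blocks that
        # downsample the time axis toward a fixed-length prediction grid.
        self.backbone = nn.Sequential(
            nn.Conv1d(in_channels, 32, kernel_size=7, stride=2, padding=3),
            nn.BatchNorm1d(32), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=5, stride=2, padding=2),
            nn.BatchNorm1d(64), nn.ReLU(),
            nn.Conv1d(64, 128, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm1d(128), nn.ReLU(),
        )
        # Per-cell prediction: (center offset, width, objectness) plus
        # class scores, analogous to a YOLO head but over temporal cells.
        self.head = nn.Conv1d(128, 3 + num_classes, kernel_size=1)
        self.grid = grid

    def forward(self, x):
        # x: (batch, 14, T) raw sensor window
        f = self.backbone(x)
        f = nn.functional.adaptive_avg_pool1d(f, self.grid)  # fixed grid length
        p = self.head(f)            # (batch, 3 + num_classes, grid)
        return p.permute(0, 2, 1)   # (batch, grid, 3 + num_classes)


if __name__ == "__main__":
    model = Sign1DDetector()
    x = torch.randn(2, 14, 1024)    # two windows of 1024 samples
    print(model(x).shape)           # torch.Size([2, 32, 33])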


Availability of data and materials

The data are temporarily not available at the authors' request.

Code availability

The code is temporarily not available at the authors' request.


Funding

This work was supported in part by the National Natural Science Foundation of China under Grants 61973065 and 52075531, by the Fundamental Research Funds for the Central Universities of China under Grant N2104008, by the Central Government Guides the Local Science and Technology Development Special Fund under Grant 2021JH6/10500129, and by the Innovative Talents Support Program of Liaoning Provincial Universities under Grant LR2020047.

Author information


Corresponding author

Correspondence to Fei Wang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethics approval

This article does not contain any studies with human participants performed by any of the authors.

Consent to participate

Informed consent was obtained from all individual participants included in the study.

Consent to publication

The authors declare that they consent to publication.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Wang, F., Li, C., Liu, Cw. et al. An approach based on 1D fully convolutional network for continuous sign language recognition and labeling. Neural Comput & Applic 34, 17921–17935 (2022). https://doi.org/10.1007/s00521-022-07415-x
