Abstract
Sign language is a primary means of communication for people with speech impairments, and automatic sign language recognition helps them communicate with hearing people without barriers. For portability, a wearable device integrating surface electromyography (sEMG) sensors and inertial measurement units (IMUs) is used to collect 1D, 14-channel sign language data. However, such 1D data are not human-readable: accurately extracting effective sign language segments for word-level and continuous recognition normally requires synchronized video and considerable manual labor. In this paper, we propose SignD-Net, an approach based on a 1D fully convolutional network (FCN) for labeling and recognizing 1D time-series sign language data. SignD-Net casts sign language labeling as an object detection problem and, following YOLO, assigns a bounding box to each predicted sign. With the optimal 1D-CNN model selected through experiments, continuous sign language labeling and recognition can be realized. To cope with limited data, the model is pre-trained on word-level sign language data and simulated sentence-level data, and fine-tuned at the end of training on real, manually labeled sign language data. Experiments on sign language test data show that SignD-Net achieves a mean average precision (mAP) of 99.18% on the labeling task and a sentence-level accuracy of up to 98.74% on the recognition task.
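The abstract describes a YOLO-style detector applied along the time axis, where each predicted sign receives a 1D bounding interval instead of a 2D box. The paper's implementation is not reproduced here; the sketch below is only a minimal illustration of the 1D analogue of bounding-box overlap (IoU) and the greedy non-maximum suppression used to deduplicate detections. The function names `iou_1d` and `nms_1d` and the example intervals are our own, not from the paper.

```python
import numpy as np

def iou_1d(a, b):
    """Intersection-over-union of two 1D intervals (start, end) on the time axis."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

def nms_1d(intervals, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression over 1D detections.

    Repeatedly keeps the highest-scoring remaining interval and discards
    any lower-scoring interval that overlaps it beyond iou_thresh.
    """
    order = list(np.argsort(scores)[::-1])  # indices sorted by descending score
    keep = []
    while order:
        i = order.pop(0)
        keep.append(int(i))
        order = [j for j in order if iou_1d(intervals[i], intervals[j]) < iou_thresh]
    return keep

# Example: three candidate sign segments along the time axis (sample indices)
dets = [(10, 50), (12, 48), (60, 100)]
scores = np.array([0.9, 0.8, 0.7])
print(nms_1d(dets, scores))  # → [0, 2]: the second segment overlaps the first and is suppressed
```

In a 1D detector of this kind, mAP on the labeling task would then be computed by matching kept intervals to ground-truth sign segments under an interval-IoU threshold, mirroring the 2D object-detection protocol.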
Availability of data and materials
The data are temporarily not available at the authors' request.
Code availability
The code is temporarily not available at the authors' request.
Funding
This work was supported in part by the National Natural Science Foundation of China under Grants 61973065 and 52075531, the Fundamental Research Funds for the Central Universities of China under Grant N2104008, the Special Fund of the Central Government Guiding Local Science and Technology Development under Grant 2021JH6/10500129, and the Innovative Talents Support Program of Liaoning Provincial Universities under Grant LR2020047.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethics approval
This article does not contain any studies with human participants performed by any of the authors.
Consent to participate
Informed consent was obtained from all individual participants included in the study.
Consent to publication
The authors consent to the publication of this article.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Wang, F., Li, C., Liu, Cw. et al. An approach based on 1D fully convolutional network for continuous sign language recognition and labeling. Neural Comput & Applic 34, 17921–17935 (2022). https://doi.org/10.1007/s00521-022-07415-x