Simsiam Network Based Self-supervised Model for Sign Language Recognition

Kothadiya, Deep R.; Bhatt, Chintan M.; Rida, Imad

doi:10.1007/978-3-031-46338-9_1

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1941)

Included in the following conference series:

International Conference on Intelligent Systems and Pattern Recognition

Abstract

Building accurate and robust deep learning models requires large amounts of labeled data, yet in many domains properly labeled data is very difficult to obtain. Sign language recognition is one such challenging area of computer vision: a deep learning model that recognizes sign gestures in real time needs a huge amount of labeled data. The authors propose a self-supervised learning approach to address this problem. The proposed architecture uses a SimSiam encoder network with a ResNet50 v1 backbone to learn the similarity between two different images of the same class. The computed cosine similarity is passed to an MLP head for classification. The study uses Indian and American Sign Language datasets for simulation. The proposed methodology achieves an accuracy of 74.59%. The authors also demonstrate the impact of other self-supervised deep learning models on sign language recognition.
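
As an illustration of the similarity objective mentioned in the abstract, the following is a minimal sketch of the symmetric negative cosine similarity loss used in the standard SimSiam formulation. It is not the authors' implementation: the function names and toy vectors are hypothetical, the ResNet50 backbone, predictor, and MLP head are replaced by hand-written vectors, and the stop-gradient step is omitted because it only affects backpropagation.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def simsiam_loss(p1, z1, p2, z2):
    """Symmetric negative cosine similarity between two views.

    p1, p2: predictor outputs for view 1 and view 2 (assumed given).
    z1, z2: encoder projections for view 1 and view 2; in real training
            these are treated as constants (stop-gradient).
    """
    return -0.5 * (cosine_similarity(p1, z2) + cosine_similarity(p2, z1))


# Toy vectors standing in for network outputs (hypothetical values).
z1 = [1.0, 0.0]
z2 = [1.0, 0.0]
p1 = [2.0, 0.0]  # same direction as z2, so similarity is 1
p2 = [0.0, 3.0]  # orthogonal to z1, so similarity is 0

loss = simsiam_loss(p1, z1, p2, z2)
print(loss)
```

The loss lies between -1 (both view pairs perfectly aligned) and 1 (both pairs opposed), so lower values indicate that the two views of the same class are mapped to more similar representations.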

Author information

Authors and Affiliations

U & P U. Patel Department of Computer Engineering, Chandubhai S. Patel Institute of Technology (CSPIT), Faculty of Technology (FTE), Charotar University of Science and Technology (CHARUSAT), Changa, India
Deep R. Kothadiya
Department of Computer Science and Engineering, School of Engineering and Technology, Pandit Deendayal Energy University, Gandhinagar, Gujarat, India
Chintan M. Bhatt
Laboratoire Biomécanique et Bioingénierie UMR 7338, Centre de Recherches de Royallieu, Université de Technologie de Compiègne, Compiègne, France
Imad Rida

Corresponding author

Correspondence to Chintan M. Bhatt.

Editor information

Editors and Affiliations

Larbi Tebessi University, Tebessa, Algeria
Akram Bennour
Sharjah University, Sharjah, United Arab Emirates
Ahmed Bouridane
University of Toulouse, Toulouse, France
Lotfi Chaari

About this paper

Cite this paper

Kothadiya, D.R., Bhatt, C.M., Rida, I. (2024). Simsiam Network Based Self-supervised Model for Sign Language Recognition. In: Bennour, A., Bouridane, A., Chaari, L. (eds) Intelligent Systems and Pattern Recognition. ISPR 2023. Communications in Computer and Information Science, vol 1941. Springer, Cham. https://doi.org/10.1007/978-3-031-46338-9_1

DOI: https://doi.org/10.1007/978-3-031-46338-9_1
Published: 05 November 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-46337-2
Online ISBN: 978-3-031-46338-9
eBook Packages: Computer Science, Computer Science (R0)
