Abstract
Sign language recognition is one of the fastest-growing research areas, and many novel techniques have recently been developed in this field. Sign language, expressed chiefly through hand gestures, is the primary means of communication for people who are deaf or unable to speak, so recognizing it in real time is essential for bridging communication with the hearing population. In this work, we propose designing and implementing a model that produces transcripts of the sign language used by hearing- and speech-impaired individuals during a live meeting or video conference. The dataset used in this study was downloaded from the Roboflow website and split for training and testing. Transfer learning is central to our approach: a pre-trained model is fine-tuned to identify hand signs. The YOLOv8 model, developed by Ultralytics, is employed for this purpose and translates the letters of the alphabet (A-Z) into their corresponding text in real time. In our method, the 26 ASL signs are recognized by first extracting the essential features of each sign from the real-time input video; these features are then fed into the YOLOv8 deep learning model to identify the sign. The output is matched against the signs encoded in the neural network and classified into the appropriate sign by comparing the extracted features with the original signs in the database.
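The inference loop described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the Ultralytics Python package (`pip install ultralytics`) and a hypothetical fine-tuned weights file `asl_yolov8.pt` whose 26 classes correspond, in order, to the ASL letters A-Z.

```python
import string

# Map YOLOv8 class indices (0-25) to the ASL letters A-Z.
# This ordering is an assumption about how the Roboflow dataset labels its classes.
CLASS_TO_LETTER = {i: letter for i, letter in enumerate(string.ascii_uppercase)}


def decode_detections(class_ids):
    """Convert a list of predicted class indices into transcript text."""
    return "".join(CLASS_TO_LETTER[i] for i in class_ids)


def transcribe_webcam(weights="asl_yolov8.pt"):
    """Run the fine-tuned detector on a live webcam feed and print transcripts.

    `weights` is a hypothetical fine-tuned checkpoint; source=0 selects the
    default camera, and stream=True yields results frame by frame.
    """
    from ultralytics import YOLO  # requires: pip install ultralytics

    model = YOLO(weights)
    for result in model.predict(source=0, stream=True):
        ids = [int(c) for c in result.boxes.cls]  # detected class indices
        if ids:
            print(decode_detections(ids))
```

In a live setting, consecutive frames would typically be debounced (e.g., emit a letter only after it is detected stably across several frames) before being appended to the meeting transcript.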
© 2024 IFIP International Federation for Information Processing
Vidhyasagar, B.S., Lakshmanan, A.S., Abishek, M.K., Kalimuthu, S. (2024). Video Captioning Based on Sign Language Using YOLOV8 Model. In: Puthal, D., Mohanty, S., Choi, BY. (eds) Internet of Things. Advances in Information and Communication Technology. IFIPIoT 2023. IFIP Advances in Information and Communication Technology, vol 683. Springer, Cham. https://doi.org/10.1007/978-3-031-45878-1_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-45877-4
Online ISBN: 978-3-031-45878-1