Real Time Captioning and Notes Making of Online Classes

  • Conference paper
  • First Online:
Computational Intelligence in Data Science (ICCIDS 2022)

Abstract

Due to the COVID-19 pandemic, most activities have moved online, and people who are hard of hearing face great difficulty in continuing their education. The presented system supports them in attending online classes by providing real-time captions. Additionally, it generates summarized notes that all students can review before the next class. The Google Speech-to-Text API is used to convert speech to text for the real-time captions. Three text summarization models were explored: BART, a Seq2Seq model, and the TextRank algorithm. BART and the Seq2Seq model require a labelled dataset for training, whereas TextRank is an unsupervised algorithm; for BART, the dataset was built using semi-supervised methods. We evaluated all three models with ROUGE metrics, and BART proved best on our dataset, with scores of 0.47, 0.30 and 0.48 for ROUGE-1, ROUGE-2 and ROUGE-L respectively.
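
The page carries no code, so the two sketches below are illustrative only. First, a minimal sketch of the captioning stage, assuming the google-cloud-speech Python client and a short 16 kHz WAV chunk; the file name and settings are placeholders, and a real-time deployment would feed a microphone stream to the same API's streaming variant rather than a file.

    # Minimal sketch (not the authors' code): transcribe one audio chunk with
    # the Google Cloud Speech-to-Text API. Live captions would instead use
    # client.streaming_recognize with an equivalent configuration.
    from google.cloud import speech

    client = speech.SpeechClient()

    with open("lecture_chunk.wav", "rb") as f:   # placeholder file name
        audio = speech.RecognitionAudio(content=f.read())

    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,                 # assumed sample rate
        language_code="en-US",
    )

    response = client.recognize(config=config, audio=audio)
    for result in response.results:
        print(result.alternatives[0].transcript)  # caption text

Second, a minimal sketch of the note-making stage, assuming the Hugging Face transformers and rouge-score packages: summarize the accumulated transcript with a pretrained BART checkpoint and score it with ROUGE-1/2/L, the metrics reported above. The checkpoint name, generation settings and texts are placeholders; the paper fine-tunes BART on its own semi-supervised lecture dataset rather than using an off-the-shelf checkpoint.

    # Minimal sketch (not the authors' code): BART summarization plus
    # ROUGE-1/2/L scoring of the generated notes against a reference.
    from transformers import BartForConditionalGeneration, BartTokenizer
    from rouge_score import rouge_scorer

    model_name = "facebook/bart-large-cnn"       # placeholder checkpoint
    tokenizer = BartTokenizer.from_pretrained(model_name)
    model = BartForConditionalGeneration.from_pretrained(model_name)

    transcript = "..."   # transcript produced by the captioning stage
    reference = "..."    # human-written reference notes for evaluation

    inputs = tokenizer(transcript, max_length=1024, truncation=True,
                       return_tensors="pt")
    summary_ids = model.generate(inputs["input_ids"], num_beams=4,
                                 max_length=150, early_stopping=True)
    summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)

    scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"],
                                      use_stemmer=True)
    print(scorer.score(reference, summary))      # ROUGE-1/2/L F-scores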

Author information

Corresponding author

Correspondence to Thenmozhi Durairaj.

Copyright information

© 2022 IFIP International Federation for Information Processing

About this paper

Cite this paper

Vasantha Raman, A., Sanjay Thiruvengadam, V., Santhosh, J., Durairaj, T. (2022). Real Time Captioning and Notes Making of Online Classes. In: Kalinathan, L., R., P., Kanmani, M., S., M. (eds) Computational Intelligence in Data Science. ICCIDS 2022. IFIP Advances in Information and Communication Technology, vol 654. Springer, Cham. https://doi.org/10.1007/978-3-031-16364-7_16

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-16364-7_16

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-16363-0

  • Online ISBN: 978-3-031-16364-7

  • eBook Packages: Computer Science, Computer Science (R0)
