Abstract
Due to the COVID-19 pandemic, most educational activities have moved online, and students who are hard of hearing face considerable difficulty continuing their education. The presented system supports them in attending online classes by providing real-time captions. Additionally, it generates summarized notes that all students can review before the next class. The Google Speech-to-Text API is used to convert speech to text for the real-time captions. Three text summarization models were explored: BART, a Seq2Seq model, and the TextRank algorithm. BART and the Seq2Seq model require a labelled dataset for training, whereas TextRank is an unsupervised algorithm. For BART, the dataset was built using semi-supervised methods. We evaluated all three models with ROUGE score metrics; BART proved best on our dataset, with scores of 0.47, 0.30, and 0.48 for ROUGE-1, ROUGE-2, and ROUGE-L respectively.
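As a rough illustration of the unsupervised TextRank approach mentioned in the abstract (a minimal sketch, not the authors' implementation), an extractive summarizer can score sentences by word-overlap similarity and a PageRank-style power iteration:

```python
import math
import re

def split_sentences(text):
    # Naive sentence splitter on terminal punctuation.
    return [s.strip() for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s.strip()]

def similarity(s1, s2):
    # TextRank-style overlap: shared words, normalized by log sentence lengths.
    w1, w2 = set(s1.lower().split()), set(s2.lower().split())
    overlap = len(w1 & w2)
    if overlap == 0 or len(w1) < 2 or len(w2) < 2:
        return 0.0
    return overlap / (math.log(len(w1)) + math.log(len(w2)))

def textrank_summary(text, n=2, damping=0.85, iterations=50):
    """Return the n top-ranked sentences, in their original order."""
    sents = split_sentences(text)
    k = len(sents)
    if k <= n:
        return sents
    # Similarity graph between every pair of sentences.
    sim = [[similarity(sents[i], sents[j]) if i != j else 0.0
            for j in range(k)] for i in range(k)]
    scores = [1.0] * k
    for _ in range(iterations):
        new_scores = []
        for i in range(k):
            rank = 0.0
            for j in range(k):
                out_weight = sum(sim[j])
                if sim[j][i] > 0 and out_weight > 0:
                    rank += sim[j][i] / out_weight * scores[j]
            new_scores.append((1 - damping) + damping * rank)
        scores = new_scores
    top = sorted(range(k), key=lambda i: scores[i], reverse=True)[:n]
    return [sents[i] for i in sorted(top)]
```

Because TextRank is purely extractive and unsupervised, it needs no labelled training data, which is why it serves as a natural baseline against the trained BART and Seq2Seq models.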
Copyright information
© 2022 IFIP International Federation for Information Processing
Cite this paper
Vasantha Raman, A., Sanjay Thiruvengadam, V., Santhosh, J., Durairaj, T. (2022). Real Time Captioning and Notes Making of Online Classes. In: Kalinathan, L., R., P., Kanmani, M., S., M. (eds) Computational Intelligence in Data Science. ICCIDS 2022. IFIP Advances in Information and Communication Technology, vol 654. Springer, Cham. https://doi.org/10.1007/978-3-031-16364-7_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-16363-0
Online ISBN: 978-3-031-16364-7