
Live TV subtitling through respeaking with remote cutting-edge technology

Multimedia Tools and Applications

Abstract

This article presents an original system for live TV subtitling based on respeaking and automatic speech recognition. Unlike several commercially available live subtitling solutions, the technology presented here comprises a speech recognition system designed specifically for live subtitling, realizing the full potential of state-of-the-art speech technology. The enhancements implemented in our remote live subtitling architecture are described and accompanied by real-world parameters obtained during several years of deployment at the public service broadcaster in the Czech Republic. The article also presents our four-phase respeaker training system and several new techniques covering the whole life cycle of live subtitles, such as a method for automatic live subtitle retiming and a technique for eliminating live subtitle delay. The article can serve as inspiration for implementing live subtitling, especially in smaller languages.


Figures 1–4 (captions not available)

Notes

  1. The accuracy of live subtitles produced by respeaking is assessed using the formula (N − E − R) / N × 100, where N is the total number of words in the live subtitles, and E and R represent edition and recognition errors, respectively. Three severities of error are distinguished. A NER value of 100 indicates that the content was subtitled entirely correctly; subtitles with an accuracy rate above 98% are considered acceptable under the NER model. A human review of the NER results, however, remains decisive.
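The NER computation described in the note above can be sketched as follows. The severity weights (0.25 minor, 0.5 standard, 1.0 serious) follow the NER model of Romero-Fresco and Pérez [18]; the function and variable names are illustrative assumptions, not part of the system described in the article.

```python
# Sketch of the NER accuracy rate: accuracy = (N - E - R) / N * 100,
# where E and R are severity-weighted sums of edition and recognition errors.
SEVERITY_WEIGHTS = {"minor": 0.25, "standard": 0.5, "serious": 1.0}

def ner_accuracy(n_words, edition_errors, recognition_errors):
    """Return the NER accuracy rate in percent.

    n_words            -- total number of words in the live subtitles (N)
    edition_errors     -- list of error severities, e.g. ["standard"] (E)
    recognition_errors -- list of error severities (R)
    """
    e = sum(SEVERITY_WEIGHTS[s] for s in edition_errors)
    r = sum(SEVERITY_WEIGHTS[s] for s in recognition_errors)
    return (n_words - e - r) / n_words * 100.0

# Example: 500 subtitle words with two standard edition errors and
# four minor recognition errors.
acc = ner_accuracy(500, ["standard", "standard"], ["minor"] * 4)
print(f"{acc:.2f}")  # 99.60 -- above the 98% acceptability threshold
```

Note that the human analysis mentioned in the note cannot be automated away: classifying each error's severity is itself a manual judgment that feeds this calculation.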

References

  1. Braunschweiler N, Gales MJ, Buchholz S (2010) Lightly supervised recognition for automatic alignment of large coherent speech recordings. In INTERSPEECH

  2. de Castro M, Carrero D, Puente L, Ruiz B (2011) Real-time subtitle synchronization in live television programs. In 2011 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB)

  3. Dragon NaturallySpeaking (2018) Available: https://www.nuance.com/dragon.html

  4. Evans MJ (2003) WHP 065. British Broadcasting Corporation

  5. FAB (2018). Available: https://www.fab-online.com/eng/subtitling/production/subtlive.htm

  6. Hrúz M, Pražák A, Bušta M (2018) Multimodal name recognition in live TV subtitling. In INTERSPEECH

  7. IBM Desktop ViaVoice (2018) Available: https://en.wikipedia.org/wiki/IBM_ViaVoice

  8. Lehečka J, Pražák A (2018) Online LDA-based language model adaptation. In TSD 2018: Text, Speech and Dialogue

  9. Levin K, Ponomareva I, Bulusheva A, Chernykh G, Medennikov I, Merkin N, Prudnikov A, Tomashenko NA (2014) Automated closed captioning for Russian live broadcasting. In INTERSPEECH

  10. Ofcom (2017) Ofcom’s code on television access services. Ofcom

  11. Povey D, Ghoshal A, Boulianne G, Burget L, Glembek O, Goel N, Hannemann M, Motlíček P, Qian Y, Schwarz P, Silovský J, Stemmer G, Veselý K (2011) The Kaldi speech recognition toolkit. In 2011 IEEE Workshop on Automatic Speech Recognition and Understanding

  12. Pražák A, Loose Z, Trmal J, Psutka JV, Psutka J (2012) Novel approach to live captioning through re-speaking: Tailoring speech recognition to re-speaker’s needs. In INTERSPEECH

  13. Pražák A, Psutka JV, Hoidekr J, Kanis J, Müller L, Psutka J (2006) Automatic online subtitling of the Czech parliament meetings. In TSD 2006: Text, Speech and Dialogue

  14. Psutka JV, Pražák A, Psutka J, Radová V (2014) Captioning of live TV commentaries from the Olympic Games in Sochi: Some interesting insights. In TSD 2014: Text, Speech and Dialogue

  15. Romero-Fresco P (2009) More haste less speed: Edited versus verbatim respoken subtitles. VIAL - Vigo International Journal of Applied Linguistics 6:109–133

  16. Romero-Fresco P (2015) The reception of subtitles for the deaf and hard of hearing in Europe. Peter Lang, Berlin

  17. Romero-Fresco P (2016) Accessing communication: The quality of live subtitles in the UK. Language & Communication 49:56–69

  18. Romero-Fresco P, Pérez M (2015) Accuracy rate in live subtitling: The NER model. In Audiovisual Translation in a Global Context. Palgrave Macmillan, pp 28–50

  19. SpeechTech MegaWord (2018) Available: https://www.speechtech.cz/en/speechtech-megaword-en/

  20. Stan A, Bell P, King S (2012) A grapheme-based method for automatic alignment of speech and text data. In 2012 IEEE Spoken Language Technology Workshop (SLT)

  21. Stenograph L.L.C. (2018) Available: http://www.stenograph.com/

  22. Švec J, Lehečka J, Ircing P, Skorkovská L, Pražák A, Vavruška J, Stanislav P, Hoidekr J (2014) General framework for mining, processing and storing large amounts of electronic texts for language modeling purposes. Language Resources and Evaluation 48(2):227–248

  23. Van Waes L, Leijten M, Remael A (2011) Live subtitling with speech recognition: Causes and consequences of revisions in the production process. Third International Symposium on Live Subtitling: Exploring New Avenues and New Contexts, Antwerpen

  24. Vaněk J, Trmal J, Psutka JV, Psutka J (2012) Optimized acoustic likelihoods computation for NVIDIA and ATI/AMD graphics processors. IEEE Transactions on Audio, Speech, and Language Processing 20(6):1818–1828

  25. Vaněk J, Zelinka J, Soutner D, Psutka J (2017) A regularization post layer: An additional way how to make deep neural networks robust. In SLSP

  26. Velotype (2018) Available: http://www.velotype.com/

  27. Ware T, Simpson M (2016) WHP 318. British Broadcasting Corporation

  28. WINCAPS Q-LIVE (2018) Available: https://subtitling.com/products/subtitle-create/create/q-live/


Acknowledgments

This work was supported by European structural and investment funds (ESIF) (No. CZ.02.1.01/0.0/0.0/17_048/0007267).

Author information


Corresponding author

Correspondence to Aleš Pražák.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Pražák, A., Loose, Z., Psutka, J.V. et al. Live TV subtitling through respeaking with remote cutting-edge technology. Multimed Tools Appl 79, 1203–1220 (2020). https://doi.org/10.1007/s11042-019-08235-3
