Abstract
This article presents an original system for live TV subtitling based on respeaking and automatic speech recognition. Unlike several commercially available live subtitling solutions, the technology presented here comprises a speech recognition system designed specifically for live subtitling, realizing the full potential of state-of-the-art speech technology. The enhancements implemented in our remote live subtitling system architecture are described and accompanied by real-world parameters obtained during several years of deployment at the public service broadcaster in the Czech Republic. The article also presents our four-phase respeaker training system and several new techniques covering the whole life cycle of live subtitles, such as a method for automatic live subtitle retiming and a technique for eliminating live subtitle delay. This article can serve as an inspiration for how to approach live subtitling, especially in minor languages.
Notes
The accuracy of live subtitles produced by respeaking is assessed using the formula (N − E − R) / N × 100, where N is the total number of words in the live subtitles, and E and R represent edition and recognition errors, respectively. Three severity levels of errors are considered. A NER value of 100 indicates that the content was subtitled entirely correctly; under the NER model, subtitles with an accuracy rate over 98% are considered acceptable. However, a human analysis of the NER results is determinative.
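The NER accuracy computation above can be sketched as follows; the function name and the example figures (word count and severity-weighted error sums) are purely illustrative, not taken from the article:

```python
def ner_accuracy(n_words: int, edition_errors: float, recognition_errors: float) -> float:
    """NER accuracy: (N - E - R) / N * 100.

    n_words            -- total number of words in the live subtitles (N)
    edition_errors     -- severity-weighted sum of edition errors (E)
    recognition_errors -- severity-weighted sum of recognition errors (R)
    """
    return (n_words - edition_errors - recognition_errors) / n_words * 100


# Hypothetical example: 1000 subtitle words, weighted error sums E=6.5, R=8.0
accuracy = ner_accuracy(1000, 6.5, 8.0)
print(round(accuracy, 2))       # 98.55
print(accuracy >= 98.0)         # True -> acceptable under the NER model
```

Note that E and R are not raw error counts: each error contributes a weight according to its severity level before being summed, and the final figure is still subject to human review.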
References
Braunschweiler N, Gales MJ, Buchholz S (2010) Lightly supervised recognition for automatic alignment of large coherent speech recordings. In INTERSPEECH
de Castro M, Carrero D, Puente L, Ruiz B (2011) Real-time subtitle synchronization in live television programs. In 2011 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB)
Dragon NaturallySpeaking (2018) Available: https://www.nuance.com/dragon.html
Evans MJ (2003) WHP 065. British Broadcasting Corporation
FAB (2018). Available: https://www.fab-online.com/eng/subtitling/production/subtlive.htm
Hrúz M, Pražák A, Bušta M (2018) Multimodal name recognition in live TV subtitling. In INTERSPEECH
IBM Desktop ViaVoice (2018) Available: https://en.wikipedia.org/wiki/IBM_ViaVoice
Lehečka J, Pražák A (2018) Online LDA-based language model adaptation. In TSD 2018: Text, Speech and Dialogue
Levin K, Ponomareva I, Bulusheva A, Chernykh G, Medennikov I, Merkin N, Prudnikov A, Tomashenko NA (2014) Automated closed captioning for Russian live broadcasting. In INTERSPEECH
Ofcom (2017) Ofcom’s code on television access services. Ofcom
Povey D, Ghoshal A, Boulianne G, Burget L, Glembek O, Goel N, Hannemann M, Motlíček P, Qian Y, Schwarz P, Silovský J, Stemmer G, Veselý K (2011) The Kaldi speech recognition toolkit. In 2011 IEEE Workshop on Automatic Speech Recognition and Understanding
Pražák A, Loose Z, Trmal J, Psutka JV, Psutka J (2012) Novel approach to live captioning through re-speaking: Tailoring speech recognition to re-speaker’s needs. In INTERSPEECH
Pražák A, Psutka JV, Hoidekr J, Kanis J, Müller L, Psutka J (2006) Automatic online subtitling of the Czech parliament meetings. In TSD 2006: Text, Speech and Dialogue
Psutka JV, Pražák A, Psutka J, Radová V (2014) Captioning of live TV commentaries from the Olympic Games in Sochi: Some interesting insights. In TSD 2014: Text, Speech and Dialogue
Romero-Fresco P (2009) More haste less speed: Edited versus verbatim respoken subtitles. VIAL - Vigo International Journal of Applied Linguistics 6:109–133
Romero-Fresco P (2015) The reception of subtitles for the deaf and hard of hearing in Europe. Peter Lang, Berlin
Romero-Fresco P (2016) Accessing communication: The quality of live subtitles in the UK. Language & Communication 49:56–69
Romero-Fresco P, Pérez M (2015) Accuracy rate in live subtitling: The NER model. In Audiovisual Translation in a Global Context, Palgrave Macmillan, pp 28–50
SpeechTech MegaWord (2018) Available: https://www.speechtech.cz/en/speechtech-megaword-en/
Stan A, Bell P, King S (2012) A grapheme-based method for automatic alignment of speech and text data. In 2012 IEEE Spoken Language Technology Workshop (SLT)
Stenograph L.L.C. (2018) Available: http://www.stenograph.com/
Švec J, Lehečka J, Ircing P, Skorkovská L, Pražák A, Vavruška J, Stanislav P, Hoidekr J (2014) General framework for mining, processing and storing large amounts of electronic texts for language modeling purposes. Language Resources and Evaluation 48(2):227–248
Van Waes L, Leijten M, Remael A (2011) Live subtitling with speech recognition: Causes and consequences of revisions in the production process. Third International symposium on live subtitling: Exploring new avenues and new contexts, Antwerpen
Vaněk J, Trmal J, Psutka JV, Psutka J (2012) Optimized acoustic likelihoods computation for NVIDIA and ATI/AMD graphics processors. IEEE Transactions on Audio, Speech, and Language Processing 20(6):1818–1828
Vaněk J, Zelinka J, Soutner D, Psutka J (2017) A regularization post layer: An additional way how to make deep neural networks robust. In SLSP
Velotype (2018) Available: http://www.velotype.com/
Ware T, Simpson M (2016) WHP 318. British Broadcasting Corporation
WINCAPS Q-LIVE (2018) Available: https://subtitling.com/products/subtitle-create/create/q-live/
Acknowledgments
This work was supported by European structural and investment funds (ESIF) (No. CZ.02.1.01/0.0/0.0/17_048/0007267).
Cite this article
Pražák, A., Loose, Z., Psutka, J.V. et al. Live TV subtitling through respeaking with remote cutting-edge technology. Multimed Tools Appl 79, 1203–1220 (2020). https://doi.org/10.1007/s11042-019-08235-3