
Automating live and batch subtitling of multimedia contents for several European languages

Published in Multimedia Tools and Applications

Abstract

The demand for subtitling of multimedia content has grown rapidly in recent years, especially after the adoption of the new European audiovisual legislation, which requires multimedia content to be made accessible to all. As a result, TV channels have moved to producing subtitles for a high percentage of their broadcast content, and the market has been seeking subtitling alternatives that are more productive than the traditional manual process. The large effort dedicated by the research community to the development of Large Vocabulary Continuous Speech Recognition (LVCSR) over the last decade has led to significant improvements in multimedia transcription, making it the most powerful technology for automatic intralingual subtitling. This article gives a detailed description of the live and batch automatic subtitling applications developed by the SAVAS consortium for several European languages, based on proprietary LVCSR technology specifically tailored to subtitling needs, together with the results of their quality evaluation.
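The quality evaluations reported in the article measure how close the automatic subtitles are to a reference transcription. As background, the sketch below computes word error rate (WER), the standard accuracy metric for LVCSR output; it is a generic illustration in Python, not code from the SAVAS applications.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) divided by
    the reference length, via word-level Levenshtein distance (dynamic programming)."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = dp[i - 1][j] + 1
            insertion = dp[i][j - 1] + 1
            dp[i][j] = min(substitution, deletion, insertion)
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

# One substitution ("sat" -> "sit") and one deletion ("the") over six
# reference words gives WER = 2/6, roughly 0.33.
print(wer("the cat sat on the mat", "the cat sit on mat"))
```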



Acknowledgments

This work was funded by the FP7-ICT-2011-SME-DCL project 296371 - SAVAS (Sharing Audiovisual contents for Automatic Subtitling). http://www.fp7-savas.eu

Author information

Corresponding author

Correspondence to Aitor Álvarez.


Cite this article

Álvarez, A., Mendes, C., Raffaelli, M. et al. Automating live and batch subtitling of multimedia contents for several European languages. Multimed Tools Appl 75, 10823–10853 (2016). https://doi.org/10.1007/s11042-015-2794-z
