Automatic Transcription of Polish Radio and Television Broadcast Audio

Koržinek, Danijel; Marasek, Krzysztof; Brocki, Łukasz

doi:10.1007/978-3-642-35647-6_29

Part of the book series: Studies in Computational Intelligence (SCI, volume 467)

Abstract

This paper describes a Large-Vocabulary Continuous Speech Recognition (LVCSR) system for transcribing Polish television and radio broadcast audio. It is one of the first attempts at speech recognition of broadcast audio in Polish. The system uses a hybrid connectionist recognizer based on a recurrent neural network architecture. It is trained on an extensive set of manually transcribed and verified recordings of television and radio shows, supplemented by a large collection of text from online sources, mostly up-to-date news articles. The paper describes and evaluates some of the key components of the architecture and compares the system to a conventional HMM-based architecture. It also describes an application of the system to indexing and searching for terms within audio and video transcripts.
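
The abstract does not say how the indexing and search application is built. As a rough illustration only, term search over a time-aligned transcript could be sketched with an inverted index like the one below. The `(term, start_time)` input format, the function names `build_index` and `search`, and the sample sentence are all assumptions made for this sketch and are not taken from the paper.

```python
from collections import defaultdict


def build_index(words):
    """Map each lower-cased term to the start times (in seconds) where it occurs.

    `words` is an iterable of (term, start_time) pairs: a hypothetical
    stand-in for time-aligned recognizer output, not the paper's format.
    """
    index = defaultdict(list)
    for term, start in words:
        index[term.lower()].append(start)
    return index


def search(index, term):
    """Return the sorted start times for `term`, or an empty list if absent."""
    return sorted(index.get(term.lower(), []))


# Invented example transcript: (word, start time in seconds).
transcript = [
    ("Dzisiaj", 0.0), ("w", 0.4), ("Warszawie", 0.5),
    ("pada", 1.2), ("w", 1.6), ("Krakowie", 1.7),
]
index = build_index(transcript)
```

With an index of this kind, a query such as `search(index, "w")` returns the offsets at which a player could seek into the audio or video. A real system would presumably also need to handle Polish inflection and recognition errors, which this sketch ignores.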

Author information

Authors and Affiliations

Polish-Japanese Institute of Information Technology, Warsaw, Poland
Danijel Koržinek, Krzysztof Marasek & Łukasz Brocki

Editor information

Editors and Affiliations

Institute of Computer Science, Warsaw University of Technology, ul. Nowowiejska 15/19, Warszawa, 00-665, Poland
Robert Bembenik, Lukasz Skonieczny, Henryk Rybinski & Marzena Kryszkiewicz

Interdisciplinary Centre for, University of Warsaw, Pawińskiego 5a bl. D, Warsaw, 02-106, Poland
Marek Niezgodka

About this chapter

Cite this chapter

Koržinek, D., Marasek, K., Brocki, Ł. (2013). Automatic Transcription of Polish Radio and Television Broadcast Audio. In: Bembenik, R., Skonieczny, L., Rybinski, H., Kryszkiewicz, M., Niezgodka, M. (eds) Intelligent Tools for Building a Scientific Information Platform. Studies in Computational Intelligence, vol 467. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35647-6_29

DOI: https://doi.org/10.1007/978-3-642-35647-6_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35646-9
Online ISBN: 978-3-642-35647-6
eBook Packages: Engineering, Engineering (R0)
