Abstract
This paper describes a Large-Vocabulary Continuous Speech Recognition (LVCSR) system for transcribing Polish television and radio broadcast audio. This work is one of the first attempts at recognizing Polish broadcast speech. The system uses a hybrid connectionist recognizer based on a recurrent neural network architecture. It is trained on an extensive set of manually transcribed and verified recordings of television and radio shows, supplemented by a large collection of text from online sources, mostly up-to-date news articles. The paper describes and evaluates several key components of the architecture and compares the system with a conventional HMM-based architecture. It also describes an application of the system to indexing and searching for terms in audio and video transcripts.
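The abstract mentions indexing and search of terms in transcripts but gives no details of how this is done. As a minimal sketch, assuming the recognizer emits a word-level start time for each recognized word, such an index could be an inverted index from each normalized word form to its positions in each transcript. Storing positions also allows exact phrase queries. The `TranscriptIndex` class, the `(word, start_time)` input format and the recording IDs are hypothetical illustrations, not the authors' implementation.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One occurrence of a query in a transcript (hypothetical record type)."""
    recording_id: str
    position: int      # index of the first matched word in the transcript
    start_time: float  # seconds from the start of the recording


def normalize(word: str) -> str:
    # Case-fold so "Sejm" and "sejm" match; str.casefold also handles Polish letters such as "Ł".
    return word.casefold()


class TranscriptIndex:
    """Hypothetical inverted index: normalized term -> list of (recording_id, word position)."""

    def __init__(self) -> None:
        self._postings: dict[str, list[tuple[str, int]]] = defaultdict(list)
        self._times: dict[str, list[float]] = {}  # recording_id -> start time of each word

    def add(self, recording_id: str, words: list[tuple[str, float]]) -> None:
        """Index one transcript given as (word, start_time_seconds) pairs (assumed recognizer output)."""
        self._times[recording_id] = [t for _, t in words]
        for pos, (word, _) in enumerate(words):
            self._postings[normalize(word)].append((recording_id, pos))

    def search(self, query: str) -> list[Hit]:
        """Return every occurrence of a one- or multi-word query, in indexing order."""
        terms = [normalize(w) for w in query.split()]
        if not terms:
            return []
        # Posting sets for the 2nd, 3rd, ... query terms, built once per query.
        later = [set(self._postings.get(t, [])) for t in terms[1:]]
        hits = []
        for rec_id, pos in self._postings.get(terms[0], []):
            # A phrase matches when term i occurs exactly i words after the first term.
            if all((rec_id, pos + i) in s for i, s in enumerate(later, start=1)):
                hits.append(Hit(rec_id, pos, self._times[rec_id][pos]))
        return hits


# Usage with made-up transcripts (hypothetical data).
index = TranscriptIndex()
index.add("news-01", [("Dzień", 0.0), ("dobry", 0.4), ("Sejm", 1.1), ("przyjął", 1.6), ("ustawę", 2.2)])
index.add("radio-07", [("posiedzenie", 3.0), ("Sejmu", 3.7), ("Sejm", 5.2), ("przyjął", 5.6)])
print(index.search("sejm przyjął"))  # two hits: news-01 at 1.1 s and radio-07 at 5.2 s
```

Because Polish is highly inflected, exact-form matching treats "Sejm" and "Sejmu" as different terms, so the phrase query above does not match "Sejmu". A practical index would probably map word forms to lemmas before indexing, a step this sketch omits.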
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Koržinek, D., Marasek, K., Brocki, Ł. (2013). Automatic Transcription of Polish Radio and Television Broadcast Audio. In: Bembenik, R., Skonieczny, L., Rybinski, H., Kryszkiewicz, M., Niezgodka, M. (eds) Intelligent Tools for Building a Scientific Information Platform. Studies in Computational Intelligence, vol 467. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35647-6_29
DOI: https://doi.org/10.1007/978-3-642-35647-6_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35646-9
Online ISBN: 978-3-642-35647-6
eBook Packages: Engineering, Engineering (R0)