
Automatic Transcription of Polish Radio and Television Broadcast Audio

  • Chapter
Intelligent Tools for Building a Scientific Information Platform

Part of the book series: Studies in Computational Intelligence (SCI, volume 467)

Abstract

This paper describes a Large-Vocabulary Continuous Speech Recognition (LVCSR) system for transcribing Polish television and radio broadcast audio. This work is one of the first attempts at recognizing broadcast speech in Polish. The system uses a hybrid connectionist recognizer based on a recurrent neural network architecture. Training relies on an extensive set of manually transcribed and verified recordings of television and radio shows, supplemented by a large collection of text gathered from online sources, mostly up-to-date news articles. The paper describes and evaluates several key components of the architecture and compares the system with a conventional HMM-based architecture. It also describes an application of the system to indexing and searching for terms in audio and video transcripts.
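The abstract does not name the evaluation metric. Assuming the component evaluation and the HMM comparison are scored with word error rate (WER), the usual LVCSR measure, the sketch below computes it as a word-level Levenshtein distance between a reference and a hypothesis, divided by the number of reference words. The function name `word_error_rate` and the example sentences are hypothetical. This is an illustration, not the scoring tool used in the chapter.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / number of reference words.

    Hypothetical helper: a plain word-level Levenshtein alignment,
    not the scoring tool used by the authors.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the previous reference
    # prefix and hyp[:j]; row 0 is the distance from the empty prefix.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion of a reference word
                          curr[j - 1] + 1,     # insertion of a hypothesis word
                          prev[j - 1] + cost)  # match or substitution
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    ref = "dzisiaj w warszawie pada deszcz"
    hyp = "dzisiaj warszawie pada śnieg"
    # one deletion ("w") + one substitution ("deszcz" -> "śnieg") = 2 errors / 5 words
    print(f"WER = {word_error_rate(ref, hyp):.2f}")  # WER = 0.40
```

Because insertions are counted, WER can exceed 1.0 when the hypothesis is much longer than the reference.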
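The abstract mentions indexing and term search in transcripts but gives no details. As an illustration only, assume the recognizer outputs time-aligned words (word, start and end time in seconds) for each recording. A minimal inverted index can then map each normalized word to its occurrences, so a search returns the recording and time span needed to jump into the audio or video. The `TimedWord` type, the `TranscriptIndex` class and its case-folding normalization are all hypothetical. Because Polish is highly inflected, a practical index would probably also need lemmatization, which this sketch omits.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TimedWord:
    """One recognized word and its time span in seconds (hypothetical format)."""
    word: str
    start: float
    end: float


class TranscriptIndex:
    """Hypothetical inverted index: normalized word -> [(recording id, start, end)]."""

    def __init__(self) -> None:
        self._postings: dict[str, list[tuple[str, float, float]]] = defaultdict(list)

    @staticmethod
    def _normalize(word: str) -> str:
        # Case folding only; no lemmatization or other Polish-specific processing.
        return word.casefold()

    def add(self, recording_id: str, words: list[TimedWord]) -> None:
        """Add every time-aligned word of one recording to the index."""
        for w in words:
            self._postings[self._normalize(w.word)].append((recording_id, w.start, w.end))

    def search(self, term: str) -> list[tuple[str, float, float]]:
        """Return all occurrences of a single-word term, in insertion order."""
        # .get() avoids creating empty entries for unknown terms.
        return list(self._postings.get(self._normalize(term), []))


if __name__ == "__main__":
    index = TranscriptIndex()
    index.add("news_2012_03_01", [TimedWord("Sejm", 12.4, 12.8),
                                  TimedWord("przyjął", 12.8, 13.3),
                                  TimedWord("ustawę", 13.3, 13.9)])
    print(index.search("sejm"))  # [('news_2012_03_01', 12.4, 12.8)]
```

Storing time spans in the postings lets a player seek directly to each hit. Multi-word queries could be added by checking that consecutive words' postings are adjacent in time.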

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Koržinek, D., Marasek, K., Brocki, Ł. (2013). Automatic Transcription of Polish Radio and Television Broadcast Audio. In: Bembenik, R., Skonieczny, L., Rybinski, H., Kryszkiewicz, M., Niezgodka, M. (eds) Intelligent Tools for Building a Scientific Information Platform. Studies in Computational Intelligence, vol 467. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35647-6_29

  • DOI: https://doi.org/10.1007/978-3-642-35647-6_29

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-35646-9

  • Online ISBN: 978-3-642-35647-6

  • eBook Packages: Engineering, Engineering (R0)
