Study on Phrases Used for Semi-automatic Text-Based Speakers Names Extraction in the Czech Radio Broadcasts News

  • Conference paper
Text, Speech and Dialogue (TSD 2014)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8655)

Abstract

In this paper we introduce a methodology for extending the speaker database used in the automatic transcription of spoken documents stored in the largest Czech Radio audio archive. We address one issue in converting speech to written text: the automatic detection of speakers and their names. We work with a subset of the archive comprising 8,020 hours of broadcast news and 58,914,179 words from the years 1968–2011. Thousands of speakers’ names occur in this period, so their identification must be automatic or semi-automatic. We also investigate how the occurrence of a speaker’s name within a specific phrase in the text transcription coincides with a speaker change in the audio recording, which can likewise be used to extend the speaker database.
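The co-occurrence idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' method: the cue phrases, the name pattern (two capitalised words starting with an ASCII letter), and the functions `find_name` and `propose_labels` are all hypothetical and chosen only for illustration. The sketch looks for a name inside a cue phrase in the segment just before a speaker change and proposes that name for the speaker who follows.

```python
import re

# Hypothetical cue phrases (illustrative only, not taken from the paper).
# A name is approximated as two consecutive capitalised words.
CUE_PATTERNS = [
    re.compile(r"(?:reports|says) (?:our correspondent )?([A-Z]\w+ [A-Z]\w+)"),
    re.compile(r"([A-Z]\w+ [A-Z]\w+) (?:reports|has the details)"),
]


def find_name(text):
    """Return the first name found inside a cue phrase, or None."""
    for pattern in CUE_PATTERNS:
        match = pattern.search(text)
        if match:
            return match.group(1)
    return None


def propose_labels(segments):
    """Propose names for speakers based on cue phrases before a speaker change.

    `segments` is a list of (speaker_id, text) pairs in time order, as a
    diarization step might produce them (an assumption of this sketch).
    """
    proposals = {}
    for (prev_id, prev_text), (next_id, _) in zip(segments, segments[1:]):
        if prev_id == next_id:
            continue  # no speaker change between these two segments
        name = find_name(prev_text)
        if name is not None:
            # Keep the first proposal; a human annotator would confirm it.
            proposals.setdefault(next_id, name)
    return proposals


segments = [
    ("spk1", "More on the floods reports our correspondent Jan Novak"),
    ("spk2", "The river rose overnight."),
    ("spk1", "Thank you. Now the weather."),
]
print(propose_labels(segments))
```

In a semi-automatic workflow, such proposals would be candidates for manual confirmation rather than final labels. Real Czech data would also need cue phrases and name patterns that handle inflected forms and initial letters with diacritics.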

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Kuchařová, M., Škodová, S., Šeps, L., Boháč, M. (2014). Study on Phrases Used for Semi-automatic Text-Based Speakers Names Extraction in the Czech Radio Broadcasts News. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2014. Lecture Notes in Computer Science, vol 8655. Springer, Cham. https://doi.org/10.1007/978-3-319-10816-2_50

  • DOI: https://doi.org/10.1007/978-3-319-10816-2_50

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-10815-5

  • Online ISBN: 978-3-319-10816-2

  • eBook Packages: Computer Science, Computer Science (R0)
