Abstract
In this paper we introduce a methodology for extending the speaker database used in the automatic transcription of spoken documents stored in the largest Czech Radio audio archive. We address one issue in the conversion of speech to written text: the automatic detection of speakers and their names. We work with a subset of the archive consisting of 8,020 hours of broadcast news (58,914,179 words) from the years 1968–2011. Thousands of speakers' names occur over this period, so their identification must be automatic or semi-automatic. As a further means of extending the speaker database, we investigate the co-occurrence of a speaker's name within a specific phrase in the text transcription with a speaker change in the audio recording.
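The phrase-plus-speaker-change idea can be illustrated with a minimal Python sketch. It assumes a hypothetical transcript representation (a list of `Segment` objects, each carrying a diarization label and its text) and a hypothetical, very small inventory of Czech introducing verbs; the paper's actual phrase set and matching procedure are not reproduced here. A name found at the end of one segment is proposed for the following speaker only if the diarization label changes at that boundary.

```python
import re
from dataclasses import dataclass


@dataclass
class Segment:
    speaker_id: str  # diarization label, e.g. "spk1" (hypothetical format)
    text: str        # transcribed text of the segment


# Hypothetical introducing phrases: an introducing verb followed by a
# capitalized two-word name that closes the segment. Not the paper's inventory.
_UPPER = "A-ZÁČĎÉĚÍŇÓŘŠŤÚŮÝŽ"
INTRO_PATTERNS = [
    re.compile(
        rf"\b(?:říká|uvádí|informuje)\s+"
        rf"(?P<name>[{_UPPER}]\w+\s+[{_UPPER}]\w+)\s*[.:]?\s*$"
    ),
]


def propose_names(segments):
    """Return (speaker_id, name) candidates for semi-automatic confirmation.

    A name introduced at the end of one segment is proposed for the speaker
    of the next segment, but only when the diarization label changes there.
    """
    candidates = []
    for current, nxt in zip(segments, segments[1:]):
        if current.speaker_id == nxt.speaker_id:
            continue  # no speaker change: the phrase is not linked to a new voice
        for pattern in INTRO_PATTERNS:
            match = pattern.search(current.text)
            if match:
                candidates.append((nxt.speaker_id, match.group("name")))
                break  # one candidate per boundary is enough
    return candidates


if __name__ == "__main__":
    transcript = [
        Segment("spk1", "O výsledku jednání informuje Petr Dvořák."),
        Segment("spk2", "Dobrý den."),
        Segment("spk2", "Jednání skončilo."),
    ]
    print(propose_names(transcript))
```

Requiring the speaker change is what ties the textual phrase to the audio: a name mentioned mid-monologue is ignored, while a name that immediately precedes a new voice becomes a candidate label for that voice, to be confirmed by an annotator rather than accepted automatically.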
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Kuchařová, M., Škodová, S., Šeps, L., Boháč, M. (2014). Study on Phrases Used for Semi-automatic Text-Based Speakers Names Extraction in the Czech Radio Broadcasts News. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2014. Lecture Notes in Computer Science, vol 8655. Springer, Cham. https://doi.org/10.1007/978-3-319-10816-2_50
DOI: https://doi.org/10.1007/978-3-319-10816-2_50
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10815-5
Online ISBN: 978-3-319-10816-2
eBook Packages: Computer Science, Computer Science (R0)