Study on Phrases Used for Semi-automatic Text-Based Speakers Names Extraction in the Czech Radio Broadcasts News

Kuchařová, Michaela; Škodová, Svatava; Šeps, Ladislav; Boháč, Marek

doi:10.1007/978-3-319-10816-2_50

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8655)

Included in the following conference series:

International Conference on Text, Speech, and Dialogue

Abstract

In this paper we introduce a methodology for extending the speaker database used in the automatic transcription of spoken documents stored in the largest Czech Radio audio archive. We address one issue in converting speech to written text: the automatic detection of speakers and their names. We work with a subset of the archive that comprises 8,020 hours of broadcast news and 58,914,179 words from the years 1968–2011. Thousands of speakers’ names occur in this period, so their identification must be automatic or semi-automatic. A further issue we investigate, which also leads to the extension of the speaker database, is the co-occurrence of a speaker’s name within a specific phrase in the text transcription with a speaker change in the audio recording.
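
To make the idea of phrase-based name extraction concrete, the following is a minimal sketch and not the authors’ implementation. It assumes the transcript is already split into speaker turns, that a name appearing in an announcing phrase at the end of one turn belongs to the speaker of the next turn, and that a name is two capitalized words. The trigger verbs, the function name `attribute_names`, and the sample sentences are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical trigger verbs (Czech verbs of saying); NOT taken from the paper.
TRIGGER_VERBS = ["říká", "uvedl", "uvedla", "dodává", "vysvětluje"]

# Czech upper- and lower-case letters, so that names with diacritics match.
UPPER = "A-ZÁČĎÉĚÍŇÓŘŠŤÚŮÝŽ"
LOWER = "a-záčďéěíňóřšťúůýž"

# Assumption: a name is exactly two capitalized words (first name, surname).
NAME = rf"[{UPPER}][{LOWER}]+ [{UPPER}][{LOWER}]+"
VERBS = "|".join(TRIGGER_VERBS)

# A trigger verb directly followed by a name that closes the segment.
PATTERN = re.compile(rf"\b(?:{VERBS})\s+({NAME})\s*[.:]?\s*$")


def attribute_names(segments):
    """Return (index_of_next_segment, name) pairs.

    `segments` is a list of strings, one per speaker turn. A name found in a
    trigger phrase at the end of turn i is attributed to turn i + 1, i.e. to
    the speaker who starts talking after the speaker change.
    """
    found = []
    # The last segment is skipped: no speaker change follows it.
    for i, text in enumerate(segments[:-1]):
        match = PATTERN.search(text.strip())
        if match:
            found.append((i + 1, match.group(1)))
    return found


if __name__ == "__main__":
    # Invented sample turns, for illustration only.
    turns = [
        "Více k tomu říká Jan Novák.",
        "Děkuji. Podle mého názoru je situace stabilní.",
        "Tolik Jan Novák. Podrobnosti dodává Eva Dvořáková",
        "Dobrý den.",
    ]
    for index, name in attribute_names(turns):
        print(index, name)
```

A real system would also have to handle inflected name forms, titles, and longer names, which this sketch deliberately ignores. In a semi-automatic setting, the candidates it produces would be confirmed by a human before being added to the speaker database.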

Author information

Authors and Affiliations

Institute of Information Technology and Electronics, Technical University of Liberec, Studentská 2, 461 17, Liberec, Czech Republic
Michaela Kuchařová, Ladislav Šeps & Marek Boháč
Department of the Czech Language and Literature, Technical University of Liberec, Studentská 2, 461 17, Liberec, Czech Republic
Svatava Škodová

Editor information

Editors and Affiliations

Faculty of Informatics, Masaryk University, Botanická 68a, 602 00, Brno, Czech Republic
Petr Sojka
Faculty of Informatics, Department of Information Technologies, Masaryk University, 602 00, Brno, Czech Republic
Aleš Horák, Ivan Kopeček & Karel Pala

Cite this paper

Kuchařová, M., Škodová, S., Šeps, L., Boháč, M. (2014). Study on Phrases Used for Semi-automatic Text-Based Speakers Names Extraction in the Czech Radio Broadcasts News. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2014. Lecture Notes in Computer Science, vol 8655. Springer, Cham. https://doi.org/10.1007/978-3-319-10816-2_50

DOI: https://doi.org/10.1007/978-3-319-10816-2_50
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10815-5
Online ISBN: 978-3-319-10816-2
eBook Packages: Computer Science, Computer Science (R0)
