Abstract
Slavic languages pose a major challenge for researchers in speech technology. They are highly inflected: nouns, pronouns and adjectives are declined, and verbs are conjugated. This greatly increases the size of lexical inventories in these languages and significantly complicates the design of text-to-speech and, in particular, speech-to-text systems. In this paper, we demonstrate some typical features of the Slavic languages and show how they can be handled in the development of practical speech processing systems. We present the solutions we applied in the design of voice dictation and broadcast speech transcription systems developed for Czech. Furthermore, we demonstrate how these systems can be adapted to another, similar Slavic language, in our case Slovak. All the presented systems operate in real time with very large vocabularies (350K words in Czech, 170K words in Slovak), and some of them have already been deployed in practice.
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Nouza, J., Zdansky, J., Cerva, P., Silovsky, J. (2010). Challenges in Speech Processing of Slavic Languages (Case Studies in Speech Recognition of Czech and Slovak). In: Esposito, A., Campbell, N., Vogel, C., Hussain, A., Nijholt, A. (eds) Development of Multimodal Interfaces: Active Listening and Synchrony. Lecture Notes in Computer Science, vol 5967. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12397-9_19
DOI: https://doi.org/10.1007/978-3-642-12397-9_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12396-2
Online ISBN: 978-3-642-12397-9
eBook Packages: Computer Science, Computer Science (R0)