Improving of LVCSR for Casual Czech Using Publicly Available Language Resources

Mizera, Petr; Pollak, Petr

doi:10.1007/978-3-319-66429-3_42

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10458)

Included in the following conference series:

International Conference on Speech and Computer

Abstract

The paper presents the design of a Czech casual speech recognition system, which is part of wider research focused on understanding very informal speaking styles. The study was carried out using the NCCCz corpus, and the contributions of optimized acoustic and language models, as well as of pronunciation lexicon optimization, were analyzed. Special attention was paid to the impact of publicly available corpora suitable for language model (LM) creation. Our final DNN-HMM system achieved a word error rate (WER) of 30–60% on the casual speech recognition task, depending on the LM used. Recognition results for other speaking styles are also presented for comparison. The system was built using the KALDI toolkit, and the created recipes are available to the research community.
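
The WER figure quoted above is, in its usual definition, the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the scoring procedure used in the paper; the function name `word_error_rate` and the example sentences are hypothetical, and text normalization (casing, punctuation) is assumed to have been done beforehand.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference length.

    Illustrative only: whitespace tokenization, no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One deletion ("will") and one substitution ("noon" -> "moon")
# over five reference words.
print(word_error_rate("we will meet at noon", "we meet at moon"))  # 0.4
```

Because insertions are counted against the reference length, WER can exceed 100% when the hypothesis is much longer than the reference.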

Acknowledgments

The research described in this paper was supported by internal CTU grant SGS17/183/OHK3/3T/13 “Special Applications of Signal Processing”.

Author information

Authors and Affiliations

Faculty of Electrical Engineering, Czech Technical University in Prague, K13131, Technicka 2, 166 27, Praha 6, Czech Republic
Petr Mizera & Petr Pollak

Corresponding author

Correspondence to Petr Mizera.

Editor information

Editors and Affiliations

SPIIRAS, Saint Petersburg, Russia
Alexey Karpov
Moscow State Linguistic University, Moscow, Russia
Rodmonga Potapova
University of Hertfordshire, Hatfield, United Kingdom
Iosif Mporas

About this paper

Cite this paper

Mizera, P., Pollak, P. (2017). Improving of LVCSR for Casual Czech Using Publicly Available Language Resources. In: Karpov, A., Potapova, R., Mporas, I. (eds) Speech and Computer. SPECOM 2017. Lecture Notes in Computer Science, vol 10458. Springer, Cham. https://doi.org/10.1007/978-3-319-66429-3_42

DOI: https://doi.org/10.1007/978-3-319-66429-3_42
Published: 13 August 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-66428-6
Online ISBN: 978-3-319-66429-3
eBook Packages: Computer Science, Computer Science (R0)
