Improving Speech Recognition by Detecting Foreign Inclusions and Generating Pronunciations

Lehečka, Jan; Švec, Jan

doi:10.1007/978-3-642-40585-3_38

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8082)

Abstract

The aim of this paper is to improve speech recognition by enriching language models with foreign inclusions automatically detected in the training text. The enrichment is restricted to foreign proper-noun inclusions, which typically make up a dominant part of misrecognized words. In the proposed approach, character-based n-gram language models are used to detect foreign single-word inclusions and to identify their language, and finite-state transducers are used to generate foreign pronunciations. The results show that enriching the language model with English proper nouns found in Czech training text improves the recognition of speech containing English inclusions, yielding a 9.4% relative reduction in word error rate (WER).
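The detection step can be illustrated with a minimal sketch of word-level language identification using character n-gram models: one model is trained per language, and a word is flagged as foreign when the foreign model assigns it a higher log-probability than the native model. This is not the authors' implementation. The trigram order, add-k smoothing, the capitalization check used as a crude stand-in for proper-noun detection, the zero decision margin, and the names `CharNgramLM`, `detect_foreign`, `CZECH_WORDS` and `ENGLISH_WORDS` (including the toy word lists) are all hypothetical choices made for illustration.

```python
import math
from collections import Counter

# Hypothetical toy training lists; a real system would train on large corpora.
CZECH_WORDS = ["praha", "vláda", "řeka", "město", "člověk", "škola", "zpráva",
               "hlavní", "dobrý", "většina", "jednání", "otázka", "pražský",
               "zítra", "důvod"]
ENGLISH_WORDS = ["washington", "smith", "knight", "thought", "which", "weather",
                 "should", "through", "show", "with", "george", "white",
                 "house", "street", "church"]


class CharNgramLM:
    """Character n-gram model over single words with add-k smoothing.

    The order n=3 and add-k smoothing are illustrative assumptions, not taken from the paper.
    """

    def __init__(self, n=3, k=0.1):
        self.n = n
        self.k = k
        self.ngram_counts = Counter()
        self.context_counts = Counter()
        self.vocab = set()

    def _pad(self, word):
        # '<' pads the start (n-1 times); '>' marks the end of the word.
        return "<" * (self.n - 1) + word.lower() + ">"

    def train(self, words):
        for word in words:
            padded = self._pad(word)
            self.vocab.update(padded)
            for i in range(self.n - 1, len(padded)):
                context = padded[i - self.n + 1:i]
                self.ngram_counts[context + padded[i]] += 1
                self.context_counts[context] += 1
        return self

    def logprob(self, word):
        """Natural-log probability of the word's characters, including the end marker."""
        padded = self._pad(word)
        # +1 reserves smoothing mass for characters never seen in training.
        v = len(self.vocab) + 1
        total = 0.0
        for i in range(self.n - 1, len(padded)):
            context = padded[i - self.n + 1:i]
            num = self.ngram_counts[context + padded[i]] + self.k
            den = self.context_counts[context] + self.k * v
            total += math.log(num / den)
        return total


def detect_foreign(tokens, native_lm, foreign_lm, margin=0.0, require_capital=True):
    """Return tokens the foreign model prefers over the native one by more than `margin` nats.

    `require_capital` is a crude, hypothetical proxy for restricting detection to proper nouns.
    """
    flagged = []
    for token in tokens:
        if require_capital and not token[:1].isupper():
            continue
        score = foreign_lm.logprob(token) - native_lm.logprob(token)
        if score > margin:
            flagged.append(token)
    return flagged


if __name__ == "__main__":
    cs_lm = CharNgramLM().train(CZECH_WORDS)
    en_lm = CharNgramLM().train(ENGLISH_WORDS)
    tokens = ["Praha", "Washington", "vláda", "Smith", "Škola"]
    print(detect_foreign(tokens, cs_lm, en_lm))
```

Comparing raw log-probabilities is fair here because both models score the same word with the same number of character predictions; a positive `margin` would make the detector more conservative. The pronunciation-generation step based on finite-state transducers is not covered by this sketch.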

Author information

Authors and Affiliations

Department of Cybernetics, Faculty of Applied Sciences, University of West Bohemia, Univerzitní 8, 306 14, Plzeň, Czech Republic
Jan Lehečka & Jan Švec

Editor information

Editors and Affiliations

University of West Bohemia, 306 14, Pilsen, Czech Republic
Ivan Habernal & Václav Matoušek

About this paper

Cite this paper

Lehečka, J., Švec, J. (2013). Improving Speech Recognition by Detecting Foreign Inclusions and Generating Pronunciations. In: Habernal, I., Matoušek, V. (eds) Text, Speech, and Dialogue. TSD 2013. Lecture Notes in Computer Science, vol 8082. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40585-3_38

DOI: https://doi.org/10.1007/978-3-642-40585-3_38
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40584-6
Online ISBN: 978-3-642-40585-3
eBook Packages: Computer Science, Computer Science (R0)
