Abstract
This paper investigates the influence of hypothesis length in N-best list rescoring. It is explained theoretically why language models prefer shorter hypotheses, and how this bias affects the word insertion penalty used in continuous speech recognition. The theoretical findings are confirmed by experiments: parameter optimization performed on the Slovene Broadcast News database showed why optimal word insertion penalties tend to be greater when two language models are used in speech recognition. This paper also presents a two-pass speech recognition algorithm. Two types of language models were used, a standard trigram word-based language model and a trigram model of morpho-syntactic description tags. After parameter optimization, a relative decrease of 2.02 % in word error rate was achieved, and statistical tests were performed to confirm the significance of the word error rate decrease.
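To illustrate the mechanism the abstract describes, the following is a minimal sketch of N-best list rescoring with a word insertion penalty (WIP). It is not the authors' implementation; the function name, score weights, and toy hypotheses are all illustrative assumptions. Because language-model log-probabilities are negative and accumulate per word, longer hypotheses receive lower LM scores, and a positive per-word WIP compensates for that bias:

```python
# Hypothetical sketch of N-best rescoring with a word insertion penalty.
# All names, weights, and scores are illustrative, not from the paper.

def rescore(nbest, lm_weight=10.0, wip=0.5):
    """Return the hypothesis maximizing the combined log-score.

    Each entry is (words, acoustic_logprob, lm_logprob).
    LM log-probs are negative and grow in magnitude with length,
    so a positive per-word WIP counteracts the bias toward
    shorter hypotheses.
    """
    def score(entry):
        words, acoustic, lm = entry
        return acoustic + lm_weight * lm + wip * len(words)

    return max(nbest, key=score)


# Toy N-best list: a longer and a shorter hypothesis.
nbest = [
    (["the", "cat", "sat"], -120.0, -9.0),  # longer, better acoustic score
    (["the", "cat"],        -125.0, -6.0),  # shorter, better LM score
]

# A small WIP lets the shorter hypothesis win; a larger WIP
# shifts the decision toward the longer one.
print(rescore(nbest, wip=0.5)[0])   # shorter hypothesis wins
print(rescore(nbest, wip=30.0)[0])  # longer hypothesis wins
```

This also hints at the paper's observation: when a second language model's (negative) log-score is added to the combination, the length bias grows, so the optimal WIP tends to be greater.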
© 2013 Springer International Publishing Switzerland
Donaj, G., Kačič, Z. (2013). The Use of Several Language Models and Its Impact on Word Insertion Penalty in LVCSR. In: Železný, M., Habernal, I., Ronzhin, A. (eds) Speech and Computer. SPECOM 2013. Lecture Notes in Computer Science(), vol 8113. Springer, Cham. https://doi.org/10.1007/978-3-319-01931-4_47
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-01930-7
Online ISBN: 978-3-319-01931-4