Abstract
This paper investigates the influence of hypothesis length in N-best list rescoring. It is explained theoretically why language models prefer shorter hypotheses, and how this bias affects the word insertion penalty used in continuous speech recognition. The theoretical findings are confirmed by experiments: parameter optimization performed on the Slovene Broadcast News database showed why optimal word insertion penalties tend to be greater when two language models are used in speech recognition. This paper also presents a two-pass speech recognition algorithm. Two types of language models were used, a standard trigram word-based language model and a trigram model of morpho-syntactic description tags. After parameter optimization, a relative decrease of 2.02 % in word error rate was achieved, and statistical tests were performed to confirm the significance of the word error rate decrease.
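To illustrate the mechanism the abstract describes, the following is a minimal sketch of N-best list rescoring with a word insertion penalty (WIP). It is not the authors' implementation; the function name, score weights, and toy hypotheses are all illustrative assumptions. Because language-model log-probabilities are negative and accumulate per word, longer hypotheses receive lower LM scores, and a positive per-word WIP compensates for that bias:

```python
# Hypothetical sketch of N-best rescoring with a word insertion penalty.
# All names, weights, and scores are illustrative, not from the paper.

def rescore(nbest, lm_weight=10.0, wip=0.5):
    """Return the hypothesis maximizing the combined log-score.

    Each entry is (words, acoustic_logprob, lm_logprob).
    LM log-probs are negative and grow in magnitude with length,
    so a positive per-word WIP counteracts the bias toward
    shorter hypotheses.
    """
    def score(entry):
        words, acoustic, lm = entry
        return acoustic + lm_weight * lm + wip * len(words)

    return max(nbest, key=score)


# Toy N-best list: a longer and a shorter hypothesis.
nbest = [
    (["the", "cat", "sat"], -120.0, -9.0),  # longer, better acoustic score
    (["the", "cat"],        -125.0, -6.0),  # shorter, better LM score
]

# A small WIP lets the shorter hypothesis win; a larger WIP
# shifts the decision toward the longer one.
print(rescore(nbest, wip=0.5)[0])   # shorter hypothesis wins
print(rescore(nbest, wip=30.0)[0])  # longer hypothesis wins
```

This also hints at the paper's observation: when a second language model's (negative) log-score is added to the combination, the length bias grows, so the optimal WIP tends to be greater.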
© 2013 Springer International Publishing Switzerland
Donaj, G., Kačič, Z. (2013). The Use of Several Language Models and Its Impact on Word Insertion Penalty in LVCSR. In: Železný, M., Habernal, I., Ronzhin, A. (eds) Speech and Computer. SPECOM 2013. Lecture Notes in Computer Science(), vol 8113. Springer, Cham. https://doi.org/10.1007/978-3-319-01931-4_47
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-01930-7
Online ISBN: 978-3-319-01931-4