
The Use of Several Language Models and Its Impact on Word Insertion Penalty in LVCSR

  • Conference paper
Speech and Computer (SPECOM 2013)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8113)

Abstract

This paper investigates the influence of hypothesis length in N-best list rescoring. It is theoretically explained why language models prefer shorter hypotheses. This bias affects the word insertion penalty used in continuous speech recognition. The theoretical findings are confirmed by experiments. Parameter optimization performed on the Slovene Broadcast News database showed why optimal word insertion penalties tend to be greater when two language models are used in speech recognition. This paper also presents a two-pass speech recognition algorithm. Two types of language models were used: a standard trigram word-based language model and a trigram model of morpho-syntactic description tags. A relative decrease of 2.02% in word error rate was achieved after parameter optimization. Statistical tests were performed to confirm the significance of the word error rate decrease.
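The length bias described in the abstract can be sketched in a few lines: language-model log-probabilities are negative and accumulate per word, so combining two models doubles the per-word penalty mass and the word insertion penalty must grow to compensate. The following is a minimal illustrative sketch, not the paper's implementation; the toy N-best list, weights, and scores are invented for demonstration.

```python
# Illustrative N-best rescoring with two language models and a
# word insertion penalty (WIP). All numbers below are toy values.

def rescore(nbest, lm1_weight, lm2_weight, wip):
    """Return the N-best list reordered by combined log-score.

    Each hypothesis is (words, acoustic_logp, lm1_logp, lm2_logp).
    Both LM log-probabilities are negative and grow in magnitude
    with hypothesis length, so a positive per-word bonus (the WIP)
    counteracts the resulting bias toward shorter hypotheses.
    """
    def combined(hyp):
        words, acoustic, lm1, lm2 = hyp
        return (acoustic
                + lm1_weight * lm1
                + lm2_weight * lm2
                + wip * len(words))
    return sorted(nbest, key=combined, reverse=True)

# Toy 3-best list: the longest hypothesis pays the largest LM cost.
nbest = [
    (["the", "cat", "sat"], -10.0, -6.0, -5.5),
    (["the", "cat", "sat", "down"], -9.5, -8.5, -8.0),
    (["a", "cat"], -12.0, -4.0, -3.5),
]

# Without an insertion penalty the shortest hypothesis wins;
# a positive WIP shifts the ranking toward longer hypotheses.
best_no_wip = rescore(nbest, 1.0, 1.0, 0.0)[0]
best_with_wip = rescore(nbest, 1.0, 1.0, 2.5)[0]
```

With these toy scores the two-word hypothesis ranks first at WIP = 0, while a WIP of 2.5 log-units per word promotes a longer hypothesis, mirroring the paper's observation that adding a second language model pushes the optimal penalty upward.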





Copyright information

© 2013 Springer International Publishing Switzerland

About this paper

Cite this paper

Donaj, G., Kačič, Z. (2013). The Use of Several Language Models and Its Impact on Word Insertion Penalty in LVCSR. In: Železný, M., Habernal, I., Ronzhin, A. (eds) Speech and Computer. SPECOM 2013. Lecture Notes in Computer Science, vol 8113. Springer, Cham. https://doi.org/10.1007/978-3-319-01931-4_47


  • DOI: https://doi.org/10.1007/978-3-319-01931-4_47

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-01930-7

  • Online ISBN: 978-3-319-01931-4

  • eBook Packages: Computer Science (R0)
