Class n-Gram Models for Very Large Vocabulary Speech Recognition of Finnish and Estonian

Varjokallio, Matti; Kurimo, Mikko; Virpioja, Sami

doi:10.1007/978-3-319-45925-7_11

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9918)

Included in the following conference series:

International Conference on Statistical Language and Speech Processing

Abstract

We study class n-gram models for very large vocabulary speech recognition of Finnish and Estonian. The models are trained with vocabularies of several million words using automatically derived classes. To evaluate the models on a Finnish and an Estonian broadcast news speech recognition task, we modify Aalto University's LVCSR decoder to operate with class n-grams and very large vocabularies. Linear interpolation of a standard n-gram model and a class n-gram model yields relative perplexity improvements of 21.3 % for Finnish and 12.8 % for Estonian over the n-gram model alone. The corresponding relative improvements in word error rate are 5.5 % for Finnish and 7.4 % for Estonian. We also compare our word-based models to a state-of-the-art unlimited vocabulary recognizer that uses subword n-gram models, and show that the very large vocabulary word-based models can perform equally well or better.
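The interpolation and the relative perplexity figures mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common class bigram factorization P(w | v) = P(c(w) | c(v)) · P(w | c(w)), which the abstract does not spell out. All names, the class assignments, the probabilities, and the interpolation weight `lam` are hypothetical toy values, not the authors' implementation or data.

```python
import math

# Toy, hand-picked values for illustration only; not taken from the paper.
WORD2CLASS = {"talo": "NOUN", "auto": "NOUN", "on": "VERB"}
CLASS_BIGRAM = {("NOUN", "VERB"): 0.6, ("VERB", "NOUN"): 0.5}  # P(class | previous class)
MEMBERSHIP = {"talo": 0.5, "auto": 0.5, "on": 1.0}             # P(word | its class)


def class_bigram_prob(word, prev):
    """Class bigram probability: P(c(word) | c(prev)) * P(word | c(word))."""
    c_prev, c_word = WORD2CLASS[prev], WORD2CLASS[word]
    return CLASS_BIGRAM[(c_prev, c_word)] * MEMBERSHIP[word]


def interpolate(p_word, p_class, lam):
    """Linear interpolation of a word n-gram and a class n-gram probability."""
    return lam * p_word + (1.0 - lam) * p_class


def perplexity(probs):
    """Perplexity of a sequence given the per-word model probabilities."""
    return math.exp(-sum(math.log(p) for p in probs) / len(probs))


def relative_improvement(baseline, improved):
    """Relative reduction in percent, e.g. of perplexity or word error rate."""
    return 100.0 * (baseline - improved) / baseline


# Assumed word n-gram probability of 0.2 for "on" after "talo" (hypothetical).
p_class = class_bigram_prob("on", "talo")
p_mix = interpolate(0.2, p_class, lam=0.5)
print(p_class, p_mix)
```

In practice the interpolation weight would be tuned on held-out text; the fixed value of 0.5 here is only a placeholder.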

Acknowledgments

This work was supported by the Academy of Finland under grant 251170. The Aalto Science-IT project provided computational resources for the work.

Author information

Authors and Affiliations

Department of Signal Processing and Acoustics, School of Electrical Engineering, Aalto University, Espoo, Finland
Matti Varjokallio & Mikko Kurimo
Department of Computer Science, School of Science, Aalto University, Espoo, Finland
Sami Virpioja

Corresponding author

Correspondence to Matti Varjokallio.

Editor information

Editors and Affiliations

University of West Bohemia, Plzen, Czech Republic
Pavel Král
Rovira i Virgili University, Tarragona, Spain
Carlos Martín-Vide

About this paper

Cite this paper

Varjokallio, M., Kurimo, M., Virpioja, S. (2016). Class n-Gram Models for Very Large Vocabulary Speech Recognition of Finnish and Estonian. In: Král, P., Martín-Vide, C. (eds) Statistical Language and Speech Processing. SLSP 2016. Lecture Notes in Computer Science, vol 9918. Springer, Cham. https://doi.org/10.1007/978-3-319-45925-7_11

DOI: https://doi.org/10.1007/978-3-319-45925-7_11
Published: 21 September 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-45924-0
Online ISBN: 978-3-319-45925-7
eBook Packages: Computer Science, Computer Science (R0)