Abstract
We study class n-gram models for very large vocabulary speech recognition of Finnish and Estonian. The models are trained with vocabularies of several million words using automatically derived classes. To evaluate the models on a Finnish and an Estonian broadcast news speech recognition task, we modify Aalto University’s LVCSR decoder to operate with class n-grams and very large vocabularies. Linear interpolation of a standard n-gram model and a class n-gram model provides relative perplexity improvements of 21.3 % for Finnish and 12.8 % for Estonian over the n-gram model alone. The relative improvements in word error rate are 5.5 % for Finnish and 7.4 % for Estonian. We also compare our word-based models to a state-of-the-art unlimited vocabulary recognizer that uses subword n-gram models, and show that the very large vocabulary word-based models can perform equally well or better.
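The interpolation and the relative improvements mentioned in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the formulas, so the sketch assumes the common class bigram factorization P(w | v) = P(c(w) | c(v)) · P(w | c(w)) and a single interpolation weight; the function names, the toy vocabulary, and all probabilities are hypothetical and are not taken from the paper.

```python
import math


def class_bigram_prob(prev_word, word, word_class, class_transition, membership):
    """Assumed class bigram estimate: P(w | v) = P(c(w) | c(v)) * P(w | c(w))."""
    c_prev = word_class[prev_word]
    c_word = word_class[word]
    return class_transition[(c_prev, c_word)] * membership[word]


def interpolate(p_word, p_class, lam):
    """Linear interpolation with weight lam on the word n-gram probability."""
    return lam * p_word + (1.0 - lam) * p_class


def perplexity(probs):
    """Perplexity of a word sequence computed from its per-word probabilities."""
    return math.exp(-sum(math.log(p) for p in probs) / len(probs))


def relative_improvement(baseline, improved):
    """Relative reduction in percent, e.g. for perplexity or word error rate."""
    return 100.0 * (baseline - improved) / baseline


# Toy, made-up values purely for illustration (not results from the paper).
word_class = {"talo": 0, "auto": 0, "on": 1}          # word -> class id
class_transition = {(0, 1): 0.6, (1, 0): 0.5}         # P(class | previous class)
membership = {"talo": 0.7, "auto": 0.3, "on": 1.0}    # P(word | its class)

p_class = class_bigram_prob("on", "talo", word_class, class_transition, membership)
p_mixed = interpolate(0.2, p_class, lam=0.5)          # 0.2 is a hypothetical word n-gram probability
```

In practice the interpolation weight would be tuned on held-out data, and the word and class probabilities would come from smoothed models rather than lookup tables.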
Acknowledgments
This work was supported by the Academy of Finland under grant 251170. The Aalto Science-IT project provided computational resources for the work.
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Varjokallio, M., Kurimo, M., Virpioja, S. (2016). Class n-Gram Models for Very Large Vocabulary Speech Recognition of Finnish and Estonian. In: Král, P., Martín-Vide, C. (eds) Statistical Language and Speech Processing. SLSP 2016. Lecture Notes in Computer Science, vol 9918. Springer, Cham. https://doi.org/10.1007/978-3-319-45925-7_11
DOI: https://doi.org/10.1007/978-3-319-45925-7_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-45924-0
Online ISBN: 978-3-319-45925-7
eBook Packages: Computer Science, Computer Science (R0)