Abstract
In this work, the random forest language modeling approach is applied with the aim of improving the performance of the highly competitive LIMSI Mandarin Chinese speech-to-text system. The experimental setup is that of the GALE Phase 4 evaluation, which is characterized by a large amount of available language model training data (over 3.2 billion segmented words). A conventional unpruned 4-gram language model with a 56K-word vocabulary serves as a baseline that is challenging to improve upon. Nevertheless, moderate perplexity and character error rate (CER) improvements over this model were obtained with a random forest language model. Different random forest training strategies were explored so as to attain the maximal gain in performance, and a Forest of Random Forests language modeling scheme is introduced.
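As background for the approach the abstract names: a random forest language model, following Xu and Jelinek (2004), averages the word-prediction probabilities of many randomized decision trees grown over n-gram histories, which reduces the variance of any single tree's history clustering. The Python sketch below illustrates only this aggregation step under loud simplifications: the hash-based history "clustering", the add-one smoothing, and all names (RandomTreeLM, RandomForestLM, n_clusters) are illustrative assumptions, not the paper's implementation, which grows trees to maximize training-data likelihood and smooths leaf distributions with Kneser-Ney.

import random
from collections import defaultdict

class RandomTreeLM:
    """One randomized 'tree' over bigram histories, reduced here to a
    single random hash partition of histories into equivalence classes.
    A real implementation grows the tree by greedily selecting
    likelihood-maximizing questions from a random candidate subset."""

    def __init__(self, n_clusters, seed):
        self.n_clusters = n_clusters
        self.salt = random.Random(seed).getrandbits(32)  # per-tree randomness
        self.counts = defaultdict(lambda: defaultdict(int))  # cluster -> word -> count
        self.totals = defaultdict(int)                       # cluster -> total count
        self.vocab = set()

    def _cluster(self, history):
        # Randomized equivalence classing of the one-word history.
        return hash((history, self.salt)) % self.n_clusters

    def train(self, sentences):
        for words in sentences:
            for h, w in zip(words, words[1:]):
                c = self._cluster(h)
                self.counts[c][w] += 1
                self.totals[c] += 1
                self.vocab.update((h, w))

    def prob(self, word, history):
        # Add-one smoothing inside the history's equivalence class
        # (illustrative; real RFLMs smooth leaves with Kneser-Ney).
        c = self._cluster(history)
        return (self.counts[c][word] + 1) / (self.totals[c] + len(self.vocab))

class RandomForestLM:
    """The forest estimate is the mean of the M tree estimates."""

    def __init__(self, n_trees=10, n_clusters=4):
        self.trees = [RandomTreeLM(n_clusters, seed=i) for i in range(n_trees)]

    def train(self, sentences):
        for t in self.trees:
            t.train(sentences)

    def prob(self, word, history):
        return sum(t.prob(word, history) for t in self.trees) / len(self.trees)

if __name__ == "__main__":
    corpus = [["we", "report", "perplexity", "gains"],
              ["we", "report", "error", "rate", "gains"]]
    lm = RandomForestLM(n_trees=4)
    lm.train(corpus)
    print(lm.prob("report", "we"))

The Forest of Random Forests scheme named in the abstract applies this combination idea at a second level, aggregating several independently trained forests into a single model.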
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Oparin, I., Lamel, L., Gauvain, J.-L. (2010). Large-Scale Language Modeling with Random Forests for Mandarin Chinese Speech-to-Text. In: Loftsson, H., Rögnvaldsson, E., Helgadóttir, S. (eds) Advances in Natural Language Processing. NLP 2010. Lecture Notes in Computer Science, vol 6233. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14770-8_31
DOI: https://doi.org/10.1007/978-3-642-14770-8_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14769-2
Online ISBN: 978-3-642-14770-8
eBook Packages: Computer Science (R0)