Abstract
This paper discusses a robust language model consisting of context-free grammar rules and symbol bigrams, integrated into a single framework. The aim is to remove the sharp grammatical/ungrammatical distinction by exploiting whatever grammar structure is present in every sentence, and hence to achieve continuity of scoring across the language. Both training and scoring are based on a similar principle: summing over paths that span the sentence. In addition to finding the overall score, a procedure for finding the best interpretation is described. Efficiency is maximised by the use of node-based (rather than path-based) procedures.
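The node-based idea mentioned above can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes a simplified lattice in which node k is the boundary after the k-th word, and each edge is a hypothetical tuple (start, end, label, weight) standing for either a single symbol or a grammar constituent covering several words. The actual model also conditions on neighbouring symbols through bigrams, which this sketch leaves out. The names sum_over_paths, best_path and EXAMPLE_EDGES are illustrative only.

```python
from collections import defaultdict


def _group_by_end(edges):
    """Index edges by their end node so each node can look up what arrives at it."""
    incoming = defaultdict(list)
    for start, end, label, weight in edges:
        incoming[end].append((start, label, weight))
    return incoming


def sum_over_paths(n, edges):
    """Sum the weights of all paths from node 0 to node n.

    Node-based: one accumulated total per node, so the cost grows with the
    number of edges rather than with the number of distinct paths.
    Assumes every edge satisfies start < end.
    """
    incoming = _group_by_end(edges)
    total = [0.0] * (n + 1)
    total[0] = 1.0  # the empty path at the start of the sentence
    for node in range(1, n + 1):
        total[node] = sum(total[s] * w for s, _, w in incoming[node])
    return total[n]


def best_path(n, edges):
    """Return (score, labels) for the highest-weight path spanning 0..n."""
    incoming = _group_by_end(edges)
    best = [0.0] * (n + 1)
    best[0] = 1.0
    back = [None] * (n + 1)  # back[node] = (previous node, edge label)
    for node in range(1, n + 1):
        for s, label, w in incoming[node]:
            score = best[s] * w
            if score > best[node]:
                best[node] = score
                back[node] = (s, label)
    if back[n] is None:
        return 0.0, []  # no path spans the whole sentence
    labels = []
    node = n
    while node > 0:
        node, label = back[node]
        labels.append(label)
    labels.reverse()
    return best[n], labels


# Hypothetical three-word sentence: single symbols w1..w3 plus three
# constituents (NP, VP, S) that cover more than one word. Weights are made up.
EXAMPLE_EDGES = [
    (0, 1, "w1", 0.5),
    (1, 2, "w2", 0.5),
    (2, 3, "w3", 0.5),
    (0, 2, "NP", 0.25),
    (1, 3, "VP", 0.5),
    (0, 3, "S", 0.125),
]

overall_score = sum_over_paths(3, EXAMPLE_EDGES)
best_score, best_labels = best_path(3, EXAMPLE_EDGES)
```

In this toy lattice four paths span the sentence (w1 w2 w3, NP w3, w1 VP, and S), and the overall score is their sum, so a sentence with no full parse still receives a score from its partial structure. Replacing the sum with a maximum and keeping back-pointers gives the single best interpretation.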
Copyright information
© 1994 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wright, J.H., Jones, G.J.F., Lloyd-Thomas, H. (1994). Training and application of integrated grammar/bigram language models. In: Carrasco, R.C., Oncina, J. (eds) Grammatical Inference and Applications. ICGI 1994. Lecture Notes in Computer Science, vol 862. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-58473-0_153
DOI: https://doi.org/10.1007/3-540-58473-0_153
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-58473-5
Online ISBN: 978-3-540-48985-6
eBook Packages: Springer Book Archive