Using knowledge to improve N-Gram language modelling through the MGGI methodology

  • Session: Inference of Stochastic Models 1
  • Conference paper
Grammatical Inference: Learning Syntax from Sentences (ICGI 1996)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1147)

Abstract

The structural limitations of N-Gram models used for Language Modelling are illustrated through several examples. In most cases of interest, these limitations can be easily overcome using (general) regular or finite-state models, without resorting to more complex, recursive devices. The problem is how to obtain the required finite-state structures from reasonably small amounts of (positive) training sentences of the considered task. Here this problem is approached through a Grammatical Inference technique known as MGGI. This technique makes it easy to apply a priori knowledge about the type of syntactic constraints that are relevant to the considered task, significantly improving the performance of N-Grams while using similar or smaller amounts of training data. Speech Recognition experiments are presented, with results supporting the interest of the proposed approach.
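
The kind of structural limitation mentioned above can be illustrated with a small, non-probabilistic sketch. It assumes the usual description of MGGI (Morphic Generator Grammatical Inference): each training word is relabelled by a task-specific function g, a local (bigram-like) language is inferred over the relabelled symbols, and a morphism h then drops the labels. The toy sentences, the function names, and the particular relabelling `g_first_word` are illustrative assumptions, not taken from the paper, and the paper itself works with stochastic models rather than plain acceptance.

```python
from typing import Callable, Iterable, List, Set, Tuple

Symbol = Tuple[str, str]            # (word, label); h(symbol) = symbol[0]
Sentence = List[str]
Relabel = Callable[[Sentence], List[Symbol]]

START: Symbol = ("<s>", "")
END: Symbol = ("</s>", "")


def g_plain(s: Sentence) -> List[Symbol]:
    """No prior knowledge: every word keeps an empty label (plain bigrams)."""
    return [(w, "") for w in s]


def g_first_word(s: Sentence) -> List[Symbol]:
    """Hypothetical prior knowledge: label each word with the sentence's first word."""
    return [(w, s[0]) for w in s]


def train(g: Relabel, sentences: Iterable[Sentence]) -> Set[Tuple[Symbol, Symbol]]:
    """Collect the adjacent pairs of relabelled symbols (a local language)."""
    pairs: Set[Tuple[Symbol, Symbol]] = set()
    for s in sentences:
        padded = [START] + g(s) + [END]
        pairs.update(zip(padded, padded[1:]))
    return pairs


def accepts(pairs: Set[Tuple[Symbol, Symbol]], sentence: Sentence) -> bool:
    """Accept if some labelling of the sentence follows only observed pairs."""
    current = {START}
    for word in sentence:
        current = {b for (a, b) in pairs if a in current and b[0] == word}
        if not current:
            return False
    return any((a, END) in pairs for a in current)


data = ["is the flight late".split(), "are the flights late".split()]
plain = train(g_plain, data)
mggi = train(g_first_word, data)

bad = "is the flights late".split()
print(accepts(plain, bad))  # plain bigrams over-generalise across "the"
print(accepts(mggi, bad))   # the labelled model keeps the agreement constraint
```

With empty labels the model reduces to ordinary bigrams and accepts the ungrammatical mix of the two training sentences, because the only link between "is" and "flights" passes through the shared word "the". Carrying the first word along as a label lets a model that is still bigram-like over labelled symbols capture the longer-distance dependency, using the same two training sentences.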

Work partially supported by the Spanish CICYT under grant TIC95-0984-C02-01


Editor information

Laurent Miclet, Colin de la Higuera

Copyright information

© 1996 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Vidal, E., Llorens, D. (1996). Using knowledge to improve N-Gram language modelling through the MGGI methodology. In: Miclet, L., de la Higuera, C. (eds) Grammatical Inference: Learning Syntax from Sentences. ICGI 1996. Lecture Notes in Computer Science, vol 1147. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0033353

  • DOI: https://doi.org/10.1007/BFb0033353

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-61778-5

  • Online ISBN: 978-3-540-70678-6

  • eBook Packages: Springer Book Archive
