Smoothing Techniques for Tree-k-Grammar-Based Natural Language Modeling

  • Conference paper
Pattern Recognition and Image Analysis (IbPRIA 2003)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2652)

Abstract

In previous work, a new probabilistic context-free grammar (PCFG) model for natural language parsing, derived from a treebank corpus, was introduced. The model estimates probabilities according to a generalized k-grammar scheme for trees. It allows for faster parsing, considerably decreases the perplexity of the test samples, and tends to give more structured and refined parses. However, it suffers from incomplete coverage. In this paper, we compare several smoothing techniques, such as backing-off and interpolation, that are used to avoid assigning zero probability to any sentence.
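
The two smoothing families named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy counts, the function names (`mle`, `interpolate`, `back_off`), the interpolation weight and the use of absolute discounting for the back-off are all assumptions made for illustration. A "specific" model conditions a rule expansion on a richer context (here a node label plus its parent label), and a "general" model conditions on the node label alone.

```python
from collections import Counter

def mle(counts, context, event):
    """Maximum-likelihood estimate of p(event | context) from raw counts."""
    total = sum(c for (ctx, _), c in counts.items() if ctx == context)
    return counts[(context, event)] / total if total else 0.0

def interpolate(p_specific, p_general, lam):
    """Linear interpolation: lam is the weight given to the specific model."""
    return lam * p_specific + (1.0 - lam) * p_general

def back_off(specific, general, ctx_s, ctx_g, event, events, discount=0.5):
    """Back-off with absolute discounting (an assumed, illustrative choice).

    Seen events keep their discounted specific estimate; the reserved mass
    is shared among unseen events in proportion to the general model.
    """
    total = sum(specific[(ctx_s, e)] for e in events)
    if total == 0:
        # Context never observed: rely entirely on the general model.
        return mle(general, ctx_g, event)
    seen = [e for e in events if specific[(ctx_s, e)] > 0]
    if specific[(ctx_s, event)] > 0:
        return (specific[(ctx_s, event)] - discount) / total
    reserved = discount * len(seen) / total
    unseen_mass = sum(mle(general, ctx_g, e) for e in events if e not in seen)
    return reserved * mle(general, ctx_g, event) / unseen_mass if unseen_mass else 0.0

# Hypothetical toy counts for expansions of an NP node.
events = ["DT NN", "PRP", "NN"]
general = Counter({("NP", "DT NN"): 6, ("NP", "PRP"): 3, ("NP", "NN"): 1})
specific = Counter({(("NP", "VP"), "DT NN"): 3, (("NP", "VP"), "PRP"): 1})

ctx_s, ctx_g = ("NP", "VP"), "NP"
interpolated = {
    e: interpolate(mle(specific, ctx_s, e), mle(general, ctx_g, e), lam=0.8)
    for e in events
}
backed_off = {
    e: back_off(specific, general, ctx_s, ctx_g, e, events) for e in events
}
```

In this toy data the expansion `NN` never occurs in the specific context, so its maximum-likelihood estimate there is zero. Both smoothed distributions give it a nonzero probability while still summing to one over the listed expansions.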

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Verdú-Mas, J.L., Calera-Rubio, J., Carrasco, R.C. (2003). Smoothing Techniques for Tree-k-Grammar-Based Natural Language Modeling. In: Perales, F.J., Campilho, A.J.C., de la Blanca, N.P., Sanfeliu, A. (eds) Pattern Recognition and Image Analysis. IbPRIA 2003. Lecture Notes in Computer Science, vol 2652. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-44871-6_122

  • DOI: https://doi.org/10.1007/978-3-540-44871-6_122

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-40217-6

  • Online ISBN: 978-3-540-44871-6

  • eBook Packages: Springer Book Archive
