Bounding the Maximal Parsing Performance of Non-Terminally Separated Grammars

  • Conference paper
Grammatical Inference: Theoretical Results and Applications (ICGI 2010)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6339)

Abstract

Unambiguous Non-Terminally Separated (UNTS) grammars have good learnability properties but are too restrictive to be used for natural language parsing. We present a generalization of UNTS grammars, called Unambiguous Weakly NTS (UWNTS) grammars, that preserves these learnability properties. We then study the problem of using UWNTS grammars to parse natural language and of evaluating them against a gold treebank. If the target language is not UWNTS, there is an upper bound on the parsing performance that can be achieved. In this paper we develop methods to find upper bounds on the unlabeled F1 performance that any UWNTS grammar can achieve over a given treebank. We define a new metric, show that its optimization is NP-hard but solvable with specialized software, and show how to translate the result into a bound on F1. In experiments with the WSJ10 corpus, we find an F1 bound of 76.1% for UWNTS grammars over the alphabet of POS tags.
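
To make the evaluation measure concrete, the following is a minimal sketch of how an unlabeled F1 score between a predicted bracketing and a gold treebank is commonly computed. It is an illustration only, not the procedure used in the paper: the function name `unlabeled_f1`, the representation of brackets as `(start, end)` token spans, and the choice to micro-average over the whole corpus are all assumptions made here. Conventions such as discarding whole-sentence or single-token brackets vary between evaluations and are left to the caller.

```python
from typing import List, Set, Tuple

# Hypothetical representation: a bracket is a (start, end) token span,
# with `end` exclusive. Labels are ignored, hence "unlabeled".
Span = Tuple[int, int]


def unlabeled_f1(gold: List[Set[Span]], predicted: List[Set[Span]]) -> float:
    """Micro-averaged unlabeled F1 over per-sentence bracket sets.

    Assumption: counts are pooled over all sentences before computing
    precision and recall; other averaging schemes are possible.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must cover the same sentences")

    # A predicted bracket is correct if the same span occurs in the gold tree.
    matched = sum(len(g & p) for g, p in zip(gold, predicted))
    n_gold = sum(len(g) for g in gold)
    n_pred = sum(len(p) for p in predicted)

    if matched == 0:
        return 0.0

    precision = matched / n_pred
    recall = matched / n_gold
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Toy two-sentence corpus (invented data, for illustration only).
gold_brackets = [{(0, 2), (2, 5)}, {(1, 3)}]
pred_brackets = [{(0, 2), (3, 5)}, {(1, 3)}]
score = unlabeled_f1(gold_brackets, pred_brackets)
```

In this toy corpus two of the three predicted brackets match two of the three gold brackets, so precision, recall, and F1 all equal 2/3. An upper bound of the kind described in the abstract limits the value such a score can reach for any grammar in the class, regardless of how the grammar is learned.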


Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Luque, F.M., Infante-Lopez, G. (2010). Bounding the Maximal Parsing Performance of Non-Terminally Separated Grammars. In: Sempere, J.M., García, P. (eds) Grammatical Inference: Theoretical Results and Applications. ICGI 2010. Lecture Notes in Computer Science, vol 6339. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15488-1_12

  • DOI: https://doi.org/10.1007/978-3-642-15488-1_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-15487-4

  • Online ISBN: 978-3-642-15488-1

  • eBook Packages: Computer Science, Computer Science (R0)
