
PCFG Learning by Nonterminal Partition Search

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2484)

Abstract

PCFG Learning by Partition Search is a general grammatical inference method for constructing, adapting and optimising probabilistic context-free grammars (PCFGs). Given a training corpus of examples from a language, a canonical grammar for the training corpus, and a parsing task, Partition Search PCFG Learning constructs a grammar that maximises performance on the parsing task and minimises grammar size. This paper describes Partition Search in detail, provides theoretical background, and characterises the family of inference methods to which it belongs. It also reports an example application to building grammars for noun phrase extraction, a task that is crucial in many natural language processing applications. In the experiments, Partition Search improves parsing performance by up to 21.45% compared to a general baseline and by up to 3.48% compared to a task-specific baseline, while reducing grammar size by up to 17.25%.
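The abstract names the inputs (training corpus, canonical grammar, parsing task) and the two goals (task performance, small grammar) but does not spell out the search itself. As a rough illustration of what searching over partitions of a grammar's nonterminals can look like, the Python sketch below makes several assumptions that are not taken from the paper: a partition is applied by merging every nonterminal in a block into one label and re-estimating rule probabilities by relative frequency from treebank rule counts; grammar size is measured as the number of rules; and the search is a plain greedy pairwise-merge hill climb scored by a task score minus a size penalty. The names `apply_partition`, `greedy_partition_search`, `score` and `size_weight` are hypothetical, and the paper's own search strategy and objective may differ.

```python
from collections import defaultdict
from typing import Callable, Dict, FrozenSet, List, Set, Tuple

# A rule is (lhs, rhs). Rule counts are assumed to come from a treebank,
# i.e. a hypothetical "canonical grammar" before any nonterminals are merged.
Rule = Tuple[str, Tuple[str, ...]]
Partition = List[FrozenSet[str]]


def apply_partition(counts: Dict[Rule, int], partition: Partition) -> Dict[Rule, float]:
    """Merge all nonterminals in each block into one label, then re-estimate
    rule probabilities by relative frequency per left-hand side."""
    label = {nt: "+".join(sorted(block)) for block in partition for nt in block}
    merged: Dict[Rule, int] = defaultdict(int)
    for (lhs, rhs), c in counts.items():
        # Symbols that belong to no block (terminals) are left unchanged.
        new_rhs = tuple(label.get(sym, sym) for sym in rhs)
        merged[(label[lhs], new_rhs)] += c
    lhs_totals: Dict[str, int] = defaultdict(int)
    for (lhs, _), c in merged.items():
        lhs_totals[lhs] += c
    return {rule: c / lhs_totals[rule[0]] for rule, c in merged.items()}


def greedy_partition_search(
    counts: Dict[Rule, int],
    nonterminals: Set[str],
    score: Callable[[Dict[Rule, float]], float],
    size_weight: float = 0.01,
) -> Tuple[Partition, float]:
    """Steepest-ascent hill climbing: start from the finest partition
    (every nonterminal in its own block) and repeatedly apply the single
    pairwise block merge that most improves the objective."""

    def objective(p: Partition) -> float:
        grammar = apply_partition(counts, p)
        # Hypothetical objective: task score minus a penalty on rule count.
        return score(grammar) - size_weight * len(grammar)

    partition: Partition = [frozenset([nt]) for nt in sorted(nonterminals)]
    best = objective(partition)
    while len(partition) > 1:
        best_candidate = None
        for i in range(len(partition)):
            for j in range(i + 1, len(partition)):
                candidate = [b for k, b in enumerate(partition) if k not in (i, j)]
                candidate.append(partition[i] | partition[j])
                value = objective(candidate)
                if value > best:
                    best, best_candidate = value, candidate
        if best_candidate is None:
            break  # no merge improves the objective: local optimum reached
        partition = best_candidate
    return partition, best


# Toy counts: subject and object NPs are split in the canonical grammar.
counts = {
    ("S", ("NP_s", "VP")): 10,
    ("VP", ("v", "NP_o")): 10,
    ("NP_s", ("det", "n")): 6,
    ("NP_s", ("pron",)): 4,
    ("NP_o", ("det", "n")): 8,
    ("NP_o", ("pron",)): 2,
}
g = apply_partition(counts, [frozenset({"S"}), frozenset({"VP"}), frozenset({"NP_s", "NP_o"})])
print(g[("NP_o+NP_s", ("det", "n"))])  # 0.7
part, _ = greedy_partition_search(counts, {"S", "VP", "NP_s", "NP_o"}, lambda grammar: 0.0)
print(len(part))  # 3
```

In this toy run the placeholder `score` is constant, so only the size penalty drives the search, and merging the two NP labels is the one merge that shrinks the grammar (from six rules to four). In a realistic setting `score` would stand for an evaluation of the merged grammar on the parsing task, for example noun phrase extraction accuracy on held-out data, which is where the trade-off between task performance and grammar size would actually show up.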

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Belz, A. (2002). PCFG Learning by Nonterminal Partition Search. In: Adriaans, P., Fernau, H., van Zaanen, M. (eds) Grammatical Inference: Algorithms and Applications. ICGI 2002. Lecture Notes in Computer Science, vol 2484. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45790-9_2

  • DOI: https://doi.org/10.1007/3-540-45790-9_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-44239-4

  • Online ISBN: 978-3-540-45790-9

  • eBook Packages: Springer Book Archive
