PCFG Learning by Nonterminal Partition Search

Belz, Anja

doi:10.1007/3-540-45790-9_2

Conference paper
First Online: 01 January 2002

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2484)

Abstract

PCFG Learning by Partition Search is a general grammatical inference method for constructing, adapting and optimising PCFGs. Given a training corpus of examples from a language, a canonical grammar for the training corpus, and a parsing task, Partition Search PCFG Learning constructs a grammar that maximises performance on the parsing task and minimises grammar size. This paper describes Partition Search in detail, also providing theoretical background and a characterisation of the family of inference methods it belongs to. The paper also reports an example application to the task of building grammars for noun phrase extraction, a task that is crucial in many applications involving natural language processing. In the experiments, Partition Search improves parsing performance by up to 21.45% compared to a general baseline and by up to 3.48% compared to a task-specific baseline, while reducing grammar size by up to 17.25%.
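
The abstract does not spell out how a nonterminal partition acts on a grammar, so the following is only an illustrative sketch under stated assumptions, not the paper's procedure. It assumes that applying one candidate partition means renaming every nonterminal to the block it belongs to, pooling the counts of rules that become identical, and re-estimating rule probabilities by relative frequency. The toy rule counts, the labels `NP-SBJ` and `NP-OBJ`, and the function names `apply_partition` and `to_pcfg` are all hypothetical; the search over partitions and the parsing-task evaluation are not shown.

```python
from collections import defaultdict

def apply_partition(rule_counts, block_of):
    """Rename each nonterminal to its partition block and pool rule counts.

    rule_counts: {(lhs, rhs_tuple): count}
    block_of:    {nonterminal: block_name}; unlisted symbols keep their name.
    """
    merged = defaultdict(float)
    for (lhs, rhs), count in rule_counts.items():
        new_lhs = block_of.get(lhs, lhs)
        new_rhs = tuple(block_of.get(sym, sym) for sym in rhs)
        merged[(new_lhs, new_rhs)] += count
    return dict(merged)

def to_pcfg(rule_counts):
    """Relative-frequency estimate: P(lhs -> rhs) = count / total count for lhs."""
    totals = defaultdict(float)
    for (lhs, _), count in rule_counts.items():
        totals[lhs] += count
    return {rule: count / totals[rule[0]] for rule, count in rule_counts.items()}

# Hypothetical toy counts standing in for a corpus-derived grammar.
counts = {
    ("S", ("NP-SBJ", "VP")): 4,
    ("VP", ("VBD", "NP-OBJ")): 4,
    ("NP-SBJ", ("DT", "NN")): 3,
    ("NP-SBJ", ("PRP",)): 1,
    ("NP-OBJ", ("DT", "NN")): 2,
    ("NP-OBJ", ("NN",)): 2,
}

# One candidate partition: the two NP variants share a block (an assumption).
block_of = {"NP-SBJ": "NP", "NP-OBJ": "NP"}

merged = apply_partition(counts, block_of)
pcfg = to_pcfg(merged)

print(len(counts), "rules before,", len(merged), "rules after")
for (lhs, rhs), p in sorted(pcfg.items()):
    print(f"{lhs} -> {' '.join(rhs)}  [{p:.3f}]")
```

In this toy case the two `DT NN` rules collapse into one, so the grammar shrinks from six rules to five. A full search would score each candidate partition on the parsing task as well as on grammar size, which this sketch leaves out.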

Author information

Authors and Affiliations

ITRI, University of Brighton, Lewes Road, Brighton, BN2 4GJ, UK
Anja Belz

Editor information

Editors and Affiliations

Perot Systems Nederland B.V., Hoefseweg 1, 3821 AE, Amersfoort, The Netherlands
Pieter Adriaans (Senior Research Advisor, Professor of Learning and Adaptive Systems)
ILLC/Computation and Complexity Theory, Universiteit van Amsterdam, Plantage Muidergracht 24, 1018 TV, Amsterdam, The Netherlands
Pieter Adriaans (Senior Research Advisor, Professor of Learning and Adaptive Systems)
School of Electrical Engineering and Computer Science, University of Newcastle, University Drive, Callaghan, NSW, 2308, Australia
Henning Fernau
Wilhelm-Schickard-Institut für Informatik, Universität Tübingen, Sand 13, 72076, Tübingen, Germany
Henning Fernau
FNWI/ILLC, Cognitive Systems and Information Processing Group, Universiteit van Amsterdam, Room B-5.39, Nieuwe Achtergracht 166, 1018 WV, Amsterdam, The Netherlands
Menno van Zaanen

About this paper

Cite this paper

Belz, A. (2002). PCFG Learning by Nonterminal Partition Search. In: Adriaans, P., Fernau, H., van Zaanen, M. (eds) Grammatical Inference: Algorithms and Applications. ICGI 2002. Lecture Notes in Computer Science, vol 2484. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45790-9_2

DOI: https://doi.org/10.1007/3-540-45790-9_2
Published: 05 September 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44239-4
Online ISBN: 978-3-540-45790-9
eBook Packages: Springer Book Archive
