Learning PP attachment from corpus statistics

Franz, Alexander

doi:10.1007/3-540-60925-3_47

Conference paper
First Online: 01 January 2005

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1040)

Abstract

One of the main problems in natural language analysis is the resolution of structural ambiguity, and Prepositional Phrase (PP) attachment ambiguity is a particularly difficult case. We describe a robust PP disambiguation procedure that learns from a text corpus. The method is based on a loglinear model, a type of statistical model that can account for combinations of multiple categorical features. We describe a series of experiments comparing the loglinear method with other strategies. For the difficult case of three possible attachment sites, the loglinear method predicts PP attachment with significantly higher accuracy than a simpler procedure based on lexical association strengths. At the same time, on general newswire text, the accuracy of the statistical method remains 10% below that of human experts. This suggests a limit on what can be learned automatically from text, and points to the need to combine machine learning with human expertise.
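The abstract does not say how the loglinear model is estimated or how it is turned into an attachment decision. As a rough illustration only, the Python sketch below fits a hierarchical loglinear model to a small contingency table by iterative proportional fitting and then chooses the attachment site with the largest fitted count for a given combination of feature values. Everything in it is an assumption rather than the paper's setup: the function names `ipf_fit` and `predict_attachment` are hypothetical, as are the two binary features, the binary noun/verb attachment variable, the choice of margins, and the toy counts.

```python
from collections import defaultdict
from itertools import product


def ipf_fit(observed, shape, margins, iters=100):
    """Fit a hierarchical loglinear model by iterative proportional fitting.

    observed: dict mapping a cell (tuple of category indices) to its count;
              cells that are absent count as zero.
    shape:    number of categories along each axis.
    margins:  tuples of axes whose observed marginal totals the fitted table
              must reproduce; together they define the model's interactions.
    """
    cells = list(product(*(range(n) for n in shape)))
    fitted = {cell: 1.0 for cell in cells}  # start from a uniform table
    for _ in range(iters):
        for axes in margins:
            obs_tot = defaultdict(float)
            fit_tot = defaultdict(float)
            for cell in cells:
                key = tuple(cell[a] for a in axes)
                obs_tot[key] += observed.get(cell, 0)
                fit_tot[key] += fitted[cell]
            # Rescale each cell so that this margin matches the observed one.
            for cell in cells:
                key = tuple(cell[a] for a in axes)
                if fit_tot[key] > 0:
                    fitted[cell] *= obs_tot[key] / fit_tot[key]
    return fitted


def predict_attachment(fitted, features, n_sites):
    """Return the attachment site (last axis) with the largest fitted count."""
    return max(range(n_sites), key=lambda site: fitted[tuple(features) + (site,)])


# Hypothetical toy table. Axis 0: preposition (0 = "of", 1 = "with");
# axis 1: some other binary feature; axis 2: attachment (0 = noun, 1 = verb).
counts = {
    (0, 0, 0): 40, (0, 0, 1): 2,
    (0, 1, 0): 30, (0, 1, 1): 3,
    (1, 0, 0): 5, (1, 0, 1): 25,
    (1, 1, 0): 4, (1, 1, 1): 20,
}
# Margins [02][12][01]: all two-way interactions, no three-way interaction.
model_margins = [(0, 2), (1, 2), (0, 1)]
fitted = ipf_fit(counts, shape=(2, 2, 2), margins=model_margins)

print(predict_attachment(fitted, (0, 0), n_sites=2))  # 0: noun attachment
print(predict_attachment(fitted, (1, 1), n_sites=2))  # 1: verb attachment
```

Because the margins [02][12][01] leave out the three-way term, the fitted log-odds of noun versus verb attachment are a sum of a preposition effect and an effect of the other feature, which is how a loglinear model can generalize to feature combinations that are rare in the data. Adding `(0, 1, 2)` as a margin would give the saturated model, which simply reproduces the observed table.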

Author information

Authors and Affiliations

Center for Machine Translation, Carnegie Mellon University, Pittsburgh, PA 15213
Alexander Franz

Editor information

Stefan Wermter, Ellen Riloff, Gabriele Scheler

About this paper

Cite this paper

Franz, A. (1996). Learning PP attachment from corpus statistics. In: Wermter, S., Riloff, E., Scheler, G. (eds) Connectionist, Statistical and Symbolic Approaches to Learning for Natural Language Processing. IJCAI 1995. Lecture Notes in Computer Science, vol 1040. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-60925-3_47

DOI: https://doi.org/10.1007/3-540-60925-3_47
Published: 07 June 2005
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-60925-4
Online ISBN: 978-3-540-49738-7
eBook Packages: Springer Book Archive
