
Distributional Thesaurus Versus WordNet: A Comparison of Backoff Techniques for Unsupervised PP Attachment

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2005)

Part of the book series: Lecture Notes in Computer Science, volume 3406

Abstract

Prepositional Phrase (PP) attachment can be addressed by considering frequency counts of dependency triples seen in a non-annotated corpus. However, even very large corpora do not contain every possible triple, and several techniques have been used to deal with this data sparseness. We evaluate two different backoff methods, one based on WordNet and the other on a distributional (automatically created) thesaurus. We work with Spanish. The thesaurus is created from the dependency triples found in the same corpus used for counting the frequency of unambiguous triples. The training corpus for both methods is an encyclopaedia. The method based on the distributional thesaurus has higher coverage but lower precision than the WordNet method.
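The abstract describes deciding attachment from triple frequency counts and backing off to a distributional thesaurus when those counts are missing. The following is a minimal sketch of that idea under simplifying assumptions. It is not the authors' implementation. The toy Spanish triples, the function names (`build_contexts`, `neighbours`, `attach`), the use of cosine similarity over (head, preposition) contexts (a stand-in for whatever similarity measure the thesaurus actually uses), and the raw-count comparison are all hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict
from math import sqrt

def build_contexts(triples):
    """Map each dependent noun to a Counter of its (head, preposition) contexts."""
    contexts = defaultdict(Counter)
    for head, prep, noun in triples:
        contexts[noun][(head, prep)] += 1
    return contexts

def cosine(a, b):
    """Cosine similarity between two sparse context vectors (assumed measure)."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def neighbours(word, contexts, k=5):
    """Top-k distributionally similar words: a toy thesaurus lookup."""
    if word not in contexts:
        return []
    scored = [(other, cosine(contexts[word], ctx))
              for other, ctx in contexts.items() if other != word]
    scored = [(w, s) for w, s in scored if s > 0]
    scored.sort(key=lambda ws: (-ws[1], ws[0]))
    return scored[:k]

def attach(verb, noun1, prep, noun2, counts, contexts):
    """Return 'V' (verb attachment), 'N' (noun attachment) or None (no evidence)."""
    v_score = counts[(verb, prep, noun2)]
    n_score = counts[(noun1, prep, noun2)]
    if v_score == n_score == 0:
        # Backoff: replace noun2 by its distributional neighbours,
        # weighting each neighbour's counts by its similarity to noun2.
        for sim_word, sim in neighbours(noun2, contexts):
            v_score += sim * counts[(verb, prep, sim_word)]
            n_score += sim * counts[(noun1, prep, sim_word)]
    if v_score > n_score:
        return "V"
    if n_score > v_score:
        return "N"
    return None

# Hypothetical unambiguous (head, preposition, noun) triples from a corpus.
triples = [
    ("comer", "con", "tenedor"), ("comer", "con", "cuchara"),
    ("cortar", "con", "cuchillo"), ("cortar", "con", "tenedor"),
    ("pizza", "con", "queso"), ("pizza", "con", "tomate"),
    ("ensalada", "con", "tomate"), ("ensalada", "con", "jamón"),
]
counts = Counter(triples)
contexts = build_contexts(triples)

print(attach("comer", "pizza", "con", "queso", counts, contexts))     # N (direct counts)
print(attach("comer", "pizza", "con", "cuchillo", counts, contexts))  # V (via backoff)
print(attach("comer", "pizza", "con", "jamón", counts, contexts))     # N (via backoff)
```

Here neither *comer con cuchillo* nor *pizza con cuchillo* occurs, so the decision falls back on *tenedor*, whose contexts overlap with those of *cuchillo*. A fuller treatment would normalize the counts by head frequency rather than compare raw counts, since frequent heads otherwise dominate the decision.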

Work done under partial support of Mexican Government (CONACyT, SNI, PIFI-IPN, CGEPI-IPN) and RITOS-2.

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Calvo, H., Gelbukh, A., Kilgarriff, A. (2005). Distributional Thesaurus Versus WordNet: A Comparison of Backoff Techniques for Unsupervised PP Attachment. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2005. Lecture Notes in Computer Science, vol 3406. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30586-6_17

  • DOI: https://doi.org/10.1007/978-3-540-30586-6_17

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-24523-0

  • Online ISBN: 978-3-540-30586-6

  • eBook Packages: Computer Science, Computer Science (R0)
