Automatic Acquisition of GL Resources, Using an Explanatory, Symbolic Technique

Advances in Generative Lexicon Theory

Part of the book series: Text, Speech and Language Technology (TLTB, volume 46)

Abstract

This chapter presents a symbolic machine learning method that automatically infers corpus-specific morpho-syntactic and semantic patterns conveying qualia relations. The method learns from descriptions of noun-verb (N-V) pairs found in a corpus, in which the verb either does or does not play one of the qualia roles of the noun. The resulting patterns are explanatory and linguistically motivated, and can be applied to a corpus to efficiently extract GL resources and populate Generative Lexicons. The chapter examines the linguistic relevance of these patterns and discusses which N-V qualia pairs they can and cannot detect. It also compares the method with other approaches to corpus-based extraction of qualia pairs.
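To make the idea of applying a morpho-syntactic pattern to a tagged corpus more concrete, here is a minimal Python sketch. It is not the chapter's method and does not learn anything: the single hand-written pattern, the function name `verb_then_noun`, the `max_gap` parameter, and the coarse tag names (`VERB`, `NOUN`, and so on) are all assumptions made for illustration. It only shows how a pattern of the kind "a verb followed closely by a noun, with no other verb in between" could be used to propose candidate N-V pairs.

```python
from typing import List, Tuple

# A token is (lemma, coarse part-of-speech tag). The tag set is an assumption.
Token = Tuple[str, str]


def verb_then_noun(sentence: List[Token], max_gap: int = 2) -> List[Tuple[str, str]]:
    """Propose (noun, verb) candidates with one hypothetical pattern.

    The pattern fires when a verb is followed by a noun with at most
    `max_gap` tokens between them and no other verb in between.
    """
    pairs = []
    for i, (verb_lemma, verb_tag) in enumerate(sentence):
        if verb_tag != "VERB":
            continue
        # Look at the tokens to the right of the verb, within the allowed gap.
        for j in range(i + 1, min(i + 2 + max_gap, len(sentence))):
            lemma, tag = sentence[j]
            if tag == "VERB":
                break  # another verb intervenes: the pattern does not apply
            if tag == "NOUN":
                pairs.append((lemma, verb_lemma))
                break  # keep only the closest noun for this verb
    return pairs


if __name__ == "__main__":
    tagged = [("tighten", "VERB"), ("the", "DET"), ("bolt", "NOUN")]
    print(verb_then_noun(tagged))
```

A learned pattern would typically combine several such constraints (tags, distances, semantic classes); the point of the sketch is only that each constraint stays readable, which is what makes a symbolic pattern explanatory.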

Author information

Corresponding author

Correspondence to Vincent Claveau.

Copyright information

© 2013 Springer Science+Business Media Dordrecht

About this chapter

Cite this chapter

Claveau, V., Sébillot, P. (2013). Automatic Acquisition of GL Resources, Using an Explanatory, Symbolic Technique. In: Pustejovsky, J., Bouillon, P., Isahara, H., Kanzaki, K., Lee, C. (eds) Advances in Generative Lexicon Theory. Text, Speech and Language Technology, vol 46. Springer, Dordrecht. https://doi.org/10.1007/978-94-007-5189-7_19

  • DOI: https://doi.org/10.1007/978-94-007-5189-7_19

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-94-007-5188-0

  • Online ISBN: 978-94-007-5189-7

  • eBook Packages: Computer Science (R0)
