Using the Web to Validate Lexico-Semantic Relations

Costa, Hernani Pereira; Gonçalo Oliveira, Hugo; Gomes, Paulo

doi:10.1007/978-3-642-24769-9_43

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7026)

Included in the following conference series:

Portuguese Conference on Artificial Intelligence

Abstract

The evaluation of semantic relations acquired automatically from text is a challenging task that generally ends up being done by humans. Although less prone to errors, manual evaluation is hardly repeatable, time-consuming and sometimes subjective. In this paper, we evaluate relational triples automatically, exploiting popular similarity measures on the Web. After using these measures to score triples according to the co-occurrence of their arguments and textual patterns denoting their relation, some scores proved to be highly correlated with the correctness rate of the triples. The measures were also used to select the correct triples in a set, with best F₁ scores around 96%.
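
The idea in the abstract can be illustrated with a minimal sketch. It assumes pointwise mutual information (PMI) over page counts as one example of a Web similarity measure; the abstract does not say which measures the paper actually uses. The hit counts, the page total, the threshold and the function names are all hypothetical, and no search engine is queried.

```python
import math

def pmi(hits_xy: int, hits_x: int, hits_y: int, total_pages: int) -> float:
    """PMI estimated from page counts: log2(P(x, y) / (P(x) * P(y)))."""
    if min(hits_xy, hits_x, hits_y) <= 0:
        return float("-inf")  # no co-occurrence evidence at all
    return math.log2((hits_xy * total_pages) / (hits_x * hits_y))

def f1_score(selected: set, correct: set) -> float:
    """Harmonic mean of precision and recall of the selected triples."""
    true_positives = len(selected & correct)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(selected)
    recall = true_positives / len(correct)
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts: (pages matching both arguments joined by a pattern
# for the relation, pages with argument 1, pages with argument 2).
TOTAL_PAGES = 1_000_000
HITS = {
    ("dog", "is-a", "animal"): (16_000, 50_000, 80_000),
    ("wheel", "part-of", "car"): (16_000, 20_000, 100_000),
    ("dog", "is-a", "vehicle"): (500, 50_000, 40_000),
}
# Hypothetical manual judgements used as the gold standard.
GOLD = {("dog", "is-a", "animal"), ("wheel", "part-of", "car")}

# Score every triple, then keep those above an arbitrary threshold.
scores = {t: pmi(xy, x, y, TOTAL_PAGES) for t, (xy, x, y) in HITS.items()}
THRESHOLD = 0.0
selected = {t for t, s in scores.items() if s > THRESHOLD}

if __name__ == "__main__":
    for triple, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(triple, round(score, 2))
    print("F1 of the selection:", f1_score(selected, GOLD))
```

In this toy setting the threshold separates the two plausible triples from the implausible one. On real data the threshold would have to be tuned, and the scores depend on how the search queries are phrased.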

Author information

Authors and Affiliations

Cognitive and Media Systems Group, CISUC, University of Coimbra, Portugal
Hernani Pereira Costa, Hugo Gonçalo Oliveira & Paulo Gomes

Editor information

Editors and Affiliations

Faculdade de Ciências, Departamento de Informática, GUESS/LabMAg/Universidade de Lisboa, Campo Grande, 749-016, Lisboa, Portugal
Luis Antunes
Department of Computer Science and Engineering, INESC-ID, Instituto Superior Técnico, IST, Avenida Rovisco Pais, 1049-001, Lisboa, Portugal
H. Sofia Pinto

About this paper

Cite this paper

Costa, H.P., Gonçalo Oliveira, H., Gomes, P. (2011). Using the Web to Validate Lexico-Semantic Relations. In: Antunes, L., Pinto, H.S. (eds) Progress in Artificial Intelligence. EPIA 2011. Lecture Notes in Computer Science, vol 7026. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24769-9_43

DOI: https://doi.org/10.1007/978-3-642-24769-9_43
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24768-2
Online ISBN: 978-3-642-24769-9
eBook Packages: Computer Science, Computer Science (R0)