Abstract
We propose a new heuristic for toponym sense disambiguation, based on semantic similarity measures, for use when mapping toponyms in text to ontology concepts. We evaluated the approach on a collection of Portuguese news articles from which geographic entity names were extracted and then manually mapped to concepts in a geospatial ontology covering the territory of Portugal. The results suggest that using semantic similarity to disambiguate toponyms in text compares favourably with a baseline method.
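To make the general idea concrete, the following is a minimal sketch of similarity-based toponym disambiguation. It is not the heuristic proposed in the chapter, which the abstract does not detail. It assumes a toy part-of hierarchy with hypothetical concept identifiers, an intrinsic information content estimate (based on descendant counts rather than corpus statistics), and Lin's similarity measure as one possible choice of semantic similarity. An ambiguous name is resolved to the candidate concept that is most similar to the concepts of the other toponyms in the same text.

```python
import math

# Toy part-of hierarchy (child -> parent). Identifiers are hypothetical
# and stand in for concepts of a real geospatial ontology.
PARENT = {
    "faro_district": "portugal",
    "azores": "portugal",
    "lagoa_faro": "faro_district",
    "portimao": "faro_district",
    "lagoa_azores": "azores",
    "ponta_delgada": "azores",
}

# Candidate concepts for each surface name found in a text.
CANDIDATES = {
    "Lagoa": ["lagoa_faro", "lagoa_azores"],
    "Portimão": ["portimao"],
    "Ponta Delgada": ["ponta_delgada"],
}

NODES = set(PARENT) | set(PARENT.values())


def ancestors(concept):
    """Return the concept followed by its ancestors up to the root."""
    path = [concept]
    while concept in PARENT:
        concept = PARENT[concept]
        path.append(concept)
    return path


def information_content(concept):
    """Intrinsic IC (an assumption): fewer descendants means more informative."""
    covered = sum(1 for node in NODES if concept in ancestors(node))
    return -math.log(covered / len(NODES))


def lin_similarity(a, b):
    """Lin similarity: 2 * IC(lowest common ancestor) / (IC(a) + IC(b))."""
    ancestors_of_b = set(ancestors(b))
    common = next(x for x in ancestors(a) if x in ancestors_of_b)
    denom = information_content(a) + information_content(b)
    return 2 * information_content(common) / denom if denom else 1.0


def disambiguate(name, context_names):
    """Pick the candidate with the highest total similarity to the context."""
    def score(candidate):
        return sum(
            max(lin_similarity(candidate, other) for other in CANDIDATES[ctx])
            for ctx in context_names
        )
    return max(CANDIDATES[name], key=score)
```

In this toy setting, `disambiguate("Lagoa", ["Portimão"])` selects the Faro candidate because both concepts share the district as a common ancestor, whereas a context of `["Ponta Delgada"]` selects the Azores candidate.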
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Batista, D.S., Ferreira, J.D., Couto, F.M., Silva, M.J. (2012). Toponym Disambiguation Using Ontology-Based Semantic Similarity. In: Caseli, H., Villavicencio, A., Teixeira, A., Perdigão, F. (eds) Computational Processing of the Portuguese Language. PROPOR 2012. Lecture Notes in Computer Science, vol 7243. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28885-2_20
DOI: https://doi.org/10.1007/978-3-642-28885-2_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28884-5
Online ISBN: 978-3-642-28885-2
eBook Packages: Computer Science, Computer Science (R0)