Abstract
Geographic location, or place information, has become an increasingly integral and important element of web and online interaction, as is evident in the growing sophistication and adoption of online mapping, GPS navigation, and location-aware search. A significant proportion of online location context, however, remains implicit in largely unstructured document text. To leverage this context, location references need to be extracted into structured knowledge elements that define place. A variety of “named entity” extraction methods have been developed to identify location references in unstructured text, alongside references to other entities such as persons or organizations, but geographic entity extraction remains an open problem. This chapter examines a multi-strategy approach to improving the quality of geo-entity extraction. The experimental framework we implemented targets web data and provides a comparative evaluation of the individual approaches and of parameterizations of our multi-strategy method. Results show that the multi-strategy approach provides a significant benefit in terms of accuracy, domain independence, and adaptability.
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Gandhi, R., Wilson, D.C. (2009). A Multi-strategy Approach to Geo-Entity Recognition. In: Ras, Z.W., Ribarsky, W. (eds) Advances in Information and Intelligent Systems. Studies in Computational Intelligence, vol 251. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04141-9_9
DOI: https://doi.org/10.1007/978-3-642-04141-9_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04140-2
Online ISBN: 978-3-642-04141-9
eBook Packages: Engineering, Engineering (R0)