A Multi-strategy Approach to Geo-Entity Recognition

Part of the book series: Studies in Computational Intelligence (SCI, volume 251)

Abstract

Geographic location, or place, information has become an increasingly integrated and important element of web and online interaction, as is evident in the growing sophistication and adoption of online mapping, navigational GPS, and location-aware search. A significant proportion of online location context, however, remains implicit in primarily unstructured document text. To leverage this location context, such references need to be extracted into structured knowledge elements that define place. A variety of “named entity” extraction methods have been developed to identify unstructured location references, alongside other references such as those to persons or organizations, but geographic entity extraction remains an open problem. This chapter examines a multi-strategy approach to improving the quality of geo-entity extraction. The implemented experimental framework targets web data, and it provides a comparative evaluation of individual approaches and of parameterizations of our multi-strategy method. Results show that the multi-strategy approach provides a significant benefit in terms of accuracy, domain independence, and adaptability.
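
As an illustration only, the general idea of combining several extraction strategies can be sketched as a simple voting scheme. The abstract does not specify how the chapter combines its strategies, so everything below is an assumption: the two toy extractors, the tiny `GAZETTEER` set, the `combine` function, and its `min_votes` parameter are hypothetical names introduced here, not the authors' framework.

```python
import re
from collections import Counter
from typing import Callable, Iterable, Set

# Hypothetical toy gazetteer (assumption); a real system would use a
# large place-name database.
GAZETTEER = {"paris", "charlotte", "berlin"}

Extractor = Callable[[str], Set[str]]


def gazetteer_extractor(text: str) -> Set[str]:
    """Strategy 1 (assumed): return tokens found in the toy gazetteer."""
    tokens = re.findall(r"[A-Za-z]+", text)
    return {t for t in tokens if t.lower() in GAZETTEER}


def preposition_extractor(text: str) -> Set[str]:
    """Strategy 2 (assumed): capitalized token after a locative preposition."""
    return set(re.findall(r"\b(?:in|at|near|from)\s+([A-Z][a-z]+)", text))


def combine(text: str, extractors: Iterable[Extractor], min_votes: int) -> Set[str]:
    """Keep candidates proposed by at least `min_votes` strategies."""
    votes: Counter = Counter()
    for extract in extractors:
        # Each extractor returns a set, so it casts at most one vote per candidate.
        votes.update(extract(text))
    return {candidate for candidate, n in votes.items() if n >= min_votes}


if __name__ == "__main__":
    sample = ("Paris Hilton stayed in Berlin, then flew from Charlotte "
              "to meet Jordan in Heidelberg.")
    strategies = [gazetteer_extractor, preposition_extractor]
    print(sorted(combine(sample, strategies, min_votes=2)))
    print(sorted(combine(sample, strategies, min_votes=1)))
```

In this sketch, `min_votes` plays the role of a parameterization: requiring agreement from both strategies drops the person name "Paris" and the unlisted "Heidelberg", while accepting a single vote keeps both, trading precision for recall.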

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Gandhi, R., Wilson, D.C. (2009). A Multi-strategy Approach to Geo-Entity Recognition. In: Ras, Z.W., Ribarsky, W. (eds) Advances in Information and Intelligent Systems. Studies in Computational Intelligence, vol 251. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04141-9_9

  • DOI: https://doi.org/10.1007/978-3-642-04141-9_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-04140-2

  • Online ISBN: 978-3-642-04141-9

  • eBook Packages: Engineering, Engineering (R0)
