A Machine Learning Approach for Resolving Place References in Text

Geospatial Thinking

Part of the book series: Lecture Notes in Geoinformation and Cartography (LNGC, volume 0)

Abstract

This paper presents a machine learning method for resolving place references in text, i.e., linking character strings in documents to locations on the surface of the Earth. This is a fundamental task in Geographic Information Retrieval, supporting geographic access to large document collections. The proposed method is an instance of stacked learning: a first learner, based on a Hidden Markov Model, annotates place references, and a second learner, implementing regression through a Support Vector Machine, ranks the possible disambiguations for the references annotated in the first step. The method was evaluated on gold-standard document collections in three different languages, in which place references were annotated by humans. Results show that the proposed method compares favorably against commercial state-of-the-art systems such as the Metacarta geo-tagger and Yahoo! Placemaker.
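
The two-stage structure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the HMM probabilities, the tiny gazetteer, the candidate features, and the weights are all hypothetical, and a fixed linear scorer stands in for the trained Support Vector regression. Stage 1 tags tokens with Viterbi decoding; stage 2 ranks candidate locations for a tagged reference by predicted score.

```python
import math

# Stage 1: toy HMM tagger. All probabilities below are illustrative
# assumptions, not values from the chapter.
STATES = ("O", "PLACE")
START = {"O": 0.8, "PLACE": 0.2}
TRANS = {"O": {"O": 0.8, "PLACE": 0.2}, "PLACE": {"O": 0.7, "PLACE": 0.3}}
KNOWN_PLACES = {"lisbon", "paris", "springfield"}  # hypothetical gazetteer


def emit(state, token):
    """Toy emission score based only on gazetteer membership (unnormalized)."""
    in_gazetteer = token.lower() in KNOWN_PLACES
    if state == "PLACE":
        return 0.9 if in_gazetteer else 0.05
    return 0.1 if in_gazetteer else 0.95


def viterbi(tokens):
    """Return the most likely state sequence, computed in log space."""
    scores = [{s: math.log(START[s]) + math.log(emit(s, tokens[0])) for s in STATES}]
    back = []
    for tok in tokens[1:]:
        row, ptr = {}, {}
        for s in STATES:
            # Best predecessor state for arriving in state s.
            prev = max(STATES, key=lambda p: scores[-1][p] + math.log(TRANS[p][s]))
            row[s] = scores[-1][prev] + math.log(TRANS[prev][s]) + math.log(emit(s, tok))
            ptr[s] = prev
        scores.append(row)
        back.append(ptr)
    # Follow back-pointers from the best final state.
    path = [max(STATES, key=lambda s: scores[-1][s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


# Stage 2: rank candidate disambiguations. The linear scorer is a stand-in
# for a trained regressor; features and weights are hypothetical.
WEIGHTS = (0.1, 1.0)  # (log population, co-occurring toponyms in same country)


def score(features):
    """Predicted relevance of one candidate location."""
    return sum(w * f for w, f in zip(WEIGHTS, features))


def rank_candidates(candidates):
    """Sort (name, features) pairs from highest to lowest predicted score."""
    return sorted(candidates, key=lambda c: score(c[1]), reverse=True)


tokens = ["She", "moved", "to", "Lisbon", "yesterday"]
tags = viterbi(tokens)
# Feature values are made up for illustration, not real gazetteer data.
candidates = [("Lisbon, Ohio, USA", (7.9, 0)), ("Lisbon, Portugal", (13.1, 2))]
ranked = rank_candidates(candidates)
```

In a stacked setup of this kind, the second learner only sees the references that the first learner annotated, so tagging errors in stage 1 propagate to the ranking step.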

Acknowledgments

This work was supported by the Fundação para a Ciência e a Tecnologia (FCT), through project grant PTDC/EIA/73614/2006 (GREASE-II).

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Martins, B., Anastácio, I., Calado, P. (2010). A Machine Learning Approach for Resolving Place References in Text. In: Painho, M., Santos, M., Pundt, H. (eds) Geospatial Thinking. Lecture Notes in Geoinformation and Cartography, vol 0. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12326-9_12