Abstract
This paper presents a machine learning method for resolving place references in text, i.e. linking character strings in documents to locations on the surface of the Earth. This is a fundamental task in Geographic Information Retrieval, since it supports geography-based access to large document collections. The proposed method is an instance of stacked learning: a first learner, based on a Hidden Markov Model, annotates place references, and a second learner, implementing regression through a Support Vector Machine, ranks the possible disambiguations for the references annotated in the first stage. The method was evaluated on gold-standard document collections in three different languages, in which place references had been annotated by humans. Results show that the proposed method compares favorably against commercial state-of-the-art systems such as the MetaCarta geo-tagger and Yahoo! Placemaker.
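The two-stage pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. All names, probabilities, gazetteer entries, and weights below are hypothetical toy values: the HMM parameters are set by hand rather than learned, and a hand-weighted linear scoring function stands in for the trained Support Vector regression model. The sketch also treats every place reference as a single token.

```python
import math

# Stage 1 (toy): an HMM tagger with hand-set probabilities, decoded with Viterbi.
# In a real system these parameters would be estimated from annotated text.
STATES = ("O", "PLACE")
START = {"O": 0.8, "PLACE": 0.2}
TRANS = {"O": {"O": 0.8, "PLACE": 0.2}, "PLACE": {"O": 0.7, "PLACE": 0.3}}
EMIT = {
    "O": {"flew": 0.3, "to": 0.3, "from": 0.3},
    "PLACE": {"lisbon": 0.5, "paris": 0.4},
}
UNKNOWN = 1e-4  # assumed emission probability for words outside the toy vocabulary


def viterbi(tokens):
    """Return the most likely tag sequence for tokens (log-space Viterbi)."""
    def emit(state, word):
        return math.log(EMIT[state].get(word.lower(), UNKNOWN))

    scores = [{s: math.log(START[s]) + emit(s, tokens[0]) for s in STATES}]
    back = []
    for word in tokens[1:]:
        row, ptr = {}, {}
        for s in STATES:
            # Best predecessor state for s at this position.
            prev = max(STATES, key=lambda p: scores[-1][p] + math.log(TRANS[p][s]))
            row[s] = scores[-1][prev] + math.log(TRANS[prev][s]) + emit(s, word)
            ptr[s] = prev
        scores.append(row)
        back.append(ptr)
    # Follow the back-pointers from the best final state.
    state = max(STATES, key=lambda s: scores[-1][s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


# Stage 2 (toy): rank gazetteer candidates for each annotated reference.
# Coordinates and populations are rough illustrative values.
GAZETTEER = {
    "paris": [
        {"name": "Paris, Texas, USA", "lat": 33.66, "lon": -95.56, "population": 25_000},
        {"name": "Paris, France", "lat": 48.86, "lon": 2.35, "population": 2_100_000},
    ],
    "lisbon": [
        {"name": "Lisbon, Portugal", "lat": 38.72, "lon": -9.14, "population": 500_000},
    ],
}
# Hand-set weights standing in for a trained regression model (an assumption).
WEIGHTS = {"bias": 0.0, "log_population": 1.0}


def score(candidate):
    """Linear relevance score; a trained regressor would replace this function."""
    return WEIGHTS["bias"] + WEIGHTS["log_population"] * math.log10(candidate["population"])


def resolve(tokens):
    """Tag tokens, then return ranked candidate names for each PLACE token."""
    results = {}
    for token, tag in zip(tokens, viterbi(tokens)):
        if tag == "PLACE":
            ranked = sorted(GAZETTEER.get(token.lower(), []), key=score, reverse=True)
            results[token] = [c["name"] for c in ranked]
    return results


if __name__ == "__main__":
    print(resolve("She flew from Paris to Lisbon".split()))
```

The stacking is visible in `resolve`: the second stage only sees the tokens that the first stage tagged as `PLACE`, so errors in annotation propagate to the ranking step. A fuller version would use richer ranking features than population alone.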
Acknowledgments
This work was supported by the Fundação para a Ciência e a Tecnologia (FCT), through project grant PTDC/EIA/73614/2006 (GREASE-II).
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Martins, B., Anastácio, I., Calado, P. (2010). A Machine Learning Approach for Resolving Place References in Text. In: Painho, M., Santos, M., Pundt, H. (eds) Geospatial Thinking. Lecture Notes in Geoinformation and Cartography, vol 0. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12326-9_12
DOI: https://doi.org/10.1007/978-3-642-12326-9_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12325-2
Online ISBN: 978-3-642-12326-9
eBook Packages: Earth and Environmental Science (R0)