A Hybrid Approach for Robust Multilingual Toponym Extraction and Disambiguation

  • Conference paper
Language Processing and Intelligent Information Systems (IIS 2013)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7912)

Abstract

Toponym extraction and disambiguation are key topics recently addressed in the fields of Information Extraction and Geographical Information Retrieval. The two processes are highly dependent on each other: not only does extraction effectiveness affect disambiguation, but disambiguation results may also help improve extraction accuracy. In this paper we propose a hybrid toponym extraction approach based on Hidden Markov Models (HMM) and Support Vector Machines (SVM). The HMM is used for extraction with high recall and low precision. The SVM is then used to find false positives based on informativeness features and on coherence features derived from the disambiguation results. Experiments on a set of holiday home descriptions, conducted with the aim of extracting and disambiguating toponyms, showed that the proposed approach outperforms state-of-the-art extraction methods and is also robust. Robustness is shown in three respects: language independence, high and low HMM threshold settings, and limited training data.
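The two-stage idea in the abstract (extract generously, then filter false positives) can be illustrated with a minimal sketch. This is not the authors' implementation. Stage 1 here is a capitalization heuristic standing in for the HMM, and stage 2 is a linear decision function with hand-set weights standing in for a trained SVM. The gazetteer, the feature definitions, the weights, and all function names are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical toy gazetteer: surface form -> set of candidate countries.
GAZETTEER = {
    "Paris": {"FR", "US"},
    "Lyon": {"FR"},
    "Nice": {"FR"},
    "Bath": {"GB"},
}

def extract_candidates(tokens):
    """Stage 1 stand-in (not an HMM): keep every capitalized token.
    High recall, low precision."""
    return [t for t in tokens if t[:1].isupper()]

def coherence(candidate, candidates):
    """Share of the other gazetteer-known candidates that have at least
    one country in common with this candidate (assumed feature)."""
    own = GAZETTEER.get(candidate, set())
    others = [c for c in candidates if c != candidate and c in GAZETTEER]
    if not own or not others:
        return 0.0
    agree = sum(1 for c in others if own & GAZETTEER[c])
    return agree / len(others)

def informativeness(candidate, background_counts, total):
    """Assumed feature: 1 minus the relative frequency in background text,
    so common words score lower."""
    return 1.0 - background_counts.get(candidate, 0) / total

def keep(candidate, candidates, background_counts, total,
         w=(1.0, 2.0), b=-1.5):
    """Stage 2 stand-in (not a trained SVM): linear decision w.x + b > 0
    with hand-set weights."""
    x = (informativeness(candidate, background_counts, total),
         coherence(candidate, candidates))
    return w[0] * x[0] + w[1] * x[1] + b > 0

def extract_toponyms(tokens, background_tokens):
    """Run stage 1, then drop the candidates that stage 2 rejects."""
    counts = Counter(background_tokens)
    total = max(len(background_tokens), 1)
    candidates = extract_candidates(tokens)
    return [c for c in candidates if keep(c, candidates, counts, total)]
```

In this sketch the coherence feature depends on the candidate referents of the other extracted names, which mirrors the abstract's point that disambiguation results can feed back into extraction. A real system would learn the decision function from labeled data rather than fixing the weights by hand.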




Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Habib, M.B., van Keulen, M. (2013). A Hybrid Approach for Robust Multilingual Toponym Extraction and Disambiguation. In: Kłopotek, M.A., Koronacki, J., Marciniak, M., Mykowiecka, A., Wierzchoń, S.T. (eds) Language Processing and Intelligent Information Systems. IIS 2013. Lecture Notes in Computer Science, vol 7912. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38634-3_1

  • DOI: https://doi.org/10.1007/978-3-642-38634-3_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-38633-6

  • Online ISBN: 978-3-642-38634-3

  • eBook Packages: Computer Science (R0)
