A Hybrid Approach for Robust Multilingual Toponym Extraction and Disambiguation

Habib, Mena B.; van Keulen, Maurice

doi:10.1007/978-3-642-38634-3_1


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7912)

Included in the following conference series:

Intelligent Information Systems Symposium


Abstract

Toponym extraction and disambiguation are key topics recently addressed in the fields of Information Extraction and Geographical Information Retrieval. The two processes are highly interdependent: not only does extraction effectiveness affect disambiguation, but disambiguation results may also help improve extraction accuracy. In this paper we propose a hybrid toponym extraction approach based on Hidden Markov Models (HMM) and Support Vector Machines (SVM). The HMM is used for extraction with high recall and low precision. An SVM is then used to find false positives based on informativeness features and coherence features derived from the disambiguation results. Experiments on a set of holiday home descriptions, conducted with the aim of extracting and disambiguating toponyms, showed that the proposed approach outperforms state-of-the-art extraction methods and is robust. Robustness is shown in three respects: language independence, high and low HMM threshold settings, and limited training data.
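
The two-stage idea in the abstract (over-generate candidates, then filter false positives using informativeness and coherence features) can be illustrated with a small sketch. This is not the authors' implementation: the capitalization rule stands in for the high-recall HMM, a fixed linear decision rule stands in for the trained SVM, and the gazetteer, feature definitions, weights, and all function names are hypothetical choices made only for illustration.

```python
from collections import Counter

# Hypothetical toy gazetteer: surface form -> countries it could refer to.
GAZETTEER = {
    "Paris": {"FR", "US"},
    "Lyon": {"FR"},
    "Nice": {"FR"},
    "Sunny": {"US"},
}


def extract_candidates(tokens):
    """Stage 1 stand-in for the high-recall HMM: keep every capitalized token."""
    return [t for t in tokens if t[:1].isupper()]


def dominant_country(candidates):
    """Toy disambiguation: the country supported by the most candidates."""
    votes = Counter(c for name in candidates for c in GAZETTEER.get(name, ()))
    return votes.most_common(1)[0][0] if votes else None


def features(name, tokens, candidates):
    """Return (informativeness, coherence) for one candidate (both assumed)."""
    # Informativeness proxy: 0.0 if the word also occurs lowercased in the text.
    lowercase_tokens = {t for t in tokens if t.islower()}
    informativeness = 0.0 if name.lower() in lowercase_tokens else 1.0
    # Coherence: 1.0 if the candidate can resolve to the dominant country.
    country = dominant_country(candidates)
    coherence = 1.0 if country in GAZETTEER.get(name, ()) else 0.0
    return informativeness, coherence


def keep(name, tokens, candidates, weights=(0.5, 1.0), bias=-1.0):
    """Stage 2 stand-in for the SVM: a hand-set linear decision rule."""
    feats = features(name, tokens, candidates)
    score = bias + sum(w * f for w, f in zip(weights, feats))
    return score > 0


def extract_toponyms(text):
    """Run both stages and return the candidates that survive filtering."""
    tokens = text.replace(",", " ").replace(".", " ").split()
    candidates = extract_candidates(tokens)
    return [c for c in candidates if keep(c, tokens, candidates)]


if __name__ == "__main__":
    text = ("Sunny house near Lyon and Nice, close to Paris. "
            "The sunny terrace is large.")
    print(extract_toponyms(text))
```

In this toy setup the first stage deliberately admits false positives such as sentence-initial words, and the second stage rejects those that are uninformative or do not fit the country suggested by the other candidates. In a real system the weights would be learned from annotated data rather than set by hand.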




Author information

Authors and Affiliations

Faculty of EEMCS, University of Twente, Enschede, The Netherlands
Mena B. Habib & Maurice van Keulen


Editor information

Editors and Affiliations

Institute of Computer Science, Polish Academy of Sciences, ul. Jana Kazimierza 5, 01-248, Warsaw, Poland
Mieczysław A. Kłopotek, Jacek Koronacki, Małgorzata Marciniak & Agnieszka Mykowiecka
Institute of Computer Science, Polish Academy of Sciences, ul. Brzegi 55, 80-045, Gdańsk, Poland
Sławomir T. Wierzchoń


About this paper

Cite this paper

Habib, M.B., van Keulen, M. (2013). A Hybrid Approach for Robust Multilingual Toponym Extraction and Disambiguation. In: Kłopotek, M.A., Koronacki, J., Marciniak, M., Mykowiecka, A., Wierzchoń, S.T. (eds) Language Processing and Intelligent Information Systems. IIS 2013. Lecture Notes in Computer Science, vol 7912. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38634-3_1


DOI: https://doi.org/10.1007/978-3-642-38634-3_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-38633-6
Online ISBN: 978-3-642-38634-3
eBook Packages: Computer Science, Computer Science (R0)
