Abstract
Toponym extraction and disambiguation are key topics recently addressed in the fields of Information Extraction and Geographical Information Retrieval. The two processes are highly interdependent: extraction effectiveness affects disambiguation, and disambiguation results can in turn help improve extraction accuracy. In this paper we propose a hybrid toponym extraction approach based on Hidden Markov Models (HMM) and Support Vector Machines (SVM). An HMM is first used for extraction with high recall and low precision. An SVM is then used to identify false positives based on informativeness features and coherence features derived from the disambiguation results. Experiments on a set of holiday home descriptions, conducted with the aim of extracting and disambiguating toponyms, showed that the proposed approach outperforms state-of-the-art extraction methods and is robust. Robustness is demonstrated in three respects: language independence, high and low HMM threshold settings, and limited training data.
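The two-stage idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Candidate` fields, the example scores, the threshold value, and the fixed linear decision function standing in for a trained SVM are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Candidate:
    text: str
    hmm_prob: float         # hypothetical: HMM probability that the span is a toponym
    informativeness: float  # hypothetical informativeness feature
    coherence: float        # hypothetical coherence feature from disambiguation


def extract_candidates(spans: List[Candidate], threshold: float) -> List[Candidate]:
    """Stage 1: a low threshold keeps many spans (high recall, low precision)."""
    return [c for c in spans if c.hmm_prob >= threshold]


def linear_filter(weights: Tuple[float, float], bias: float) -> Callable[[Candidate], bool]:
    """Stand-in for a trained SVM: a fixed linear decision function (assumption)."""
    def keep(c: Candidate) -> bool:
        score = weights[0] * c.informativeness + weights[1] * c.coherence + bias
        return score > 0
    return keep


def hybrid_extract(spans: List[Candidate], threshold: float,
                   keep: Callable[[Candidate], bool]) -> List[str]:
    """Stage 2: drop candidates that the classifier flags as false positives."""
    return [c.text for c in extract_candidates(spans, threshold) if keep(c)]


# Made-up scores for illustration; not data from the paper.
spans = [
    Candidate("Paris", 0.90, 0.8, 0.9),
    Candidate("Nice", 0.30, 0.2, 0.1),   # adjective "nice" mistaken for the city
    Candidate("Lyon", 0.25, 0.7, 0.8),
    Candidate("the", 0.01, 0.0, 0.0),
]
keep = linear_filter(weights=(1.0, 1.0), bias=-1.0)

result_low = hybrid_extract(spans, threshold=0.1, keep=keep)
result_high = hybrid_extract(spans, threshold=0.5, keep=keep)
print(result_low)   # ['Paris', 'Lyon']
print(result_high)  # ['Paris']
```

With the low threshold, the first stage lets the false positive "Nice" through and the second stage removes it, while "Lyon" survives both stages. With the high threshold, "Lyon" is lost before the filter ever sees it, which is why the first stage favours recall in this sketch.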
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Habib, M.B., van Keulen, M. (2013). A Hybrid Approach for Robust Multilingual Toponym Extraction and Disambiguation. In: Kłopotek, M.A., Koronacki, J., Marciniak, M., Mykowiecka, A., Wierzchoń, S.T. (eds) Language Processing and Intelligent Information Systems. IIS 2013. Lecture Notes in Computer Science, vol 7912. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38634-3_1
DOI: https://doi.org/10.1007/978-3-642-38634-3_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-38633-6
Online ISBN: 978-3-642-38634-3
eBook Packages: Computer Science (R0)