Abstract
Everyday place descriptions often contain names of fine-grained features, such as buildings or businesses, that are more difficult to disambiguate than names of larger places, such as cities or natural geographic features. Fine-grained places are often significantly more frequent and more similar to each other, and disambiguation heuristics developed for larger places, such as those based on population or containment relationships, are often not applicable to them. In this research, we address the disambiguation of fine-grained place names from everyday place descriptions. For this purpose, we evaluate the performance of existing clustering-based approaches, since clustering requires no knowledge other than the locations of the ambiguous place names. We consider not only approaches developed specifically for place name disambiguation, but also clustering algorithms developed for general data mining that could potentially be leveraged. We compare these methods with a novel algorithm and show that it outperforms the others in terms of disambiguation precision and distance error over several tested datasets.
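To illustrate the point that location-only disambiguation needs nothing beyond the coordinates of the ambiguous names, the following is a minimal sketch of a simple spatial-proximity heuristic. It is not one of the clustering algorithms evaluated in the article, nor the novel algorithm; the function names, the place names, and the coordinates are all hypothetical and chosen only for illustration.

```python
import math

def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def disambiguate(candidates):
    """Pick one location per place name using only candidate coordinates.

    candidates: dict mapping a place name to a non-empty list of (lat, lon).
    For each name, the chosen candidate minimises the summed distance to the
    nearest candidate of every other name in the same description.
    This is an illustrative heuristic, not the article's method.
    """
    chosen = {}
    for name, locs in candidates.items():
        def cost(loc, name=name):
            return sum(
                min(haversine_km(loc, other) for other in others)
                for other_name, others in candidates.items()
                if other_name != name
            )
        chosen[name] = min(locs, key=cost)
    return chosen

# Hypothetical description mentioning three fine-grained places;
# the coordinates are made up for the example.
example = {
    "Library": [(-37.80, 144.96), (51.50, -0.12)],
    "Union House": [(-37.797, 144.961)],
    "Cafe": [(-37.799, 144.959), (40.70, -74.00)],
}
result = disambiguate(example)
```

In this toy input, the candidates that lie close together are preferred over the distant ones, which is the intuition that clustering-based approaches build on. A real system would draw candidates from a gazetteer and would need to handle names with no nearby neighbours.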
Cite this article
Chen, H., Vasardani, M. & Winter, S. Clustering-based disambiguation of fine-grained place names from descriptions. Geoinformatica 23, 449–472 (2019). https://doi.org/10.1007/s10707-019-00341-6
DOI: https://doi.org/10.1007/s10707-019-00341-6