Clustering-based disambiguation of fine-grained place names from descriptions

Chen, Hao; Vasardani, Maria; Winter, Stephan

doi:10.1007/s10707-019-00341-6

Published: 25 January 2019

GeoInformatica, Volume 23, pages 449–472 (2019)

Abstract

Everyday place descriptions often contain names of fine-grained features, such as buildings or businesses, that are more difficult to disambiguate than names referring to larger places, for example cities or natural geographic features. Fine-grained places are often far more numerous and more similar to each other, and disambiguation heuristics developed for larger places, such as those based on population or containment relationships, are often not applicable in these cases. In this research, we address the disambiguation of fine-grained place names from everyday place descriptions. For this purpose, we evaluate the performance of different existing clustering-based approaches, since clustering approaches require no knowledge other than the locations of ambiguous place names. We consider not only approaches developed specifically for place name disambiguation, but also clustering algorithms developed for general data mining that could be leveraged for this task. We compare these methods with a novel algorithm and show that the novel algorithm outperforms the other algorithms in terms of disambiguation precision and distance error across several test datasets.
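
The premise that clustering needs nothing beyond the candidate locations of each ambiguous name can be illustrated with a small sketch. This is not the authors' novel algorithm, which the abstract does not describe; it is a hypothetical single-cluster, k-means-style heuristic that alternates between picking each name's candidate nearest to a centre and moving the centre to the mean of the picked points. The function names (`haversine_km`, `mean_point`, `disambiguate`, `evaluate`), the toy coordinates, and the metric definitions (precision as the share of evaluated names resolved to the gold location, distance error as the mean great-circle error in km) are assumptions for illustration only.

```python
import math
from typing import Dict, List, Tuple

# A point as (latitude, longitude) in decimal degrees.
LatLon = Tuple[float, float]

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a: LatLon, b: LatLon) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def mean_point(points: List[LatLon]) -> LatLon:
    """Arithmetic mean of coordinates (assumes a small area away from the antimeridian)."""
    return (sum(p[0] for p in points) / len(points),
            sum(p[1] for p in points) / len(points))


def disambiguate(candidates: Dict[str, List[LatLon]],
                 max_iter: int = 20) -> Dict[str, LatLon]:
    """Hypothetical heuristic: for every ambiguous name, choose the candidate
    nearest to the centre of the currently chosen points, until stable."""
    # Drop names that have no candidate location at all.
    candidates = {name: pts for name, pts in candidates.items() if pts}
    if not candidates:
        return {}
    # Initial centre: mean of *all* candidate locations.
    centre = mean_point([p for pts in candidates.values() for p in pts])
    chosen: Dict[str, LatLon] = {}
    for _ in range(max_iter):
        # Assignment step: nearest candidate to the current centre, per name.
        new_chosen = {name: min(pts, key=lambda p: haversine_km(p, centre))
                      for name, pts in candidates.items()}
        if new_chosen == chosen:  # selection no longer changes, so stop
            break
        chosen = new_chosen
        # Update step: move the centre to the mean of the chosen points.
        centre = mean_point(list(chosen.values()))
    return chosen


def evaluate(chosen: Dict[str, LatLon],
             gold: Dict[str, LatLon]) -> Tuple[float, float]:
    """Assumed metrics over names present in both dicts:
    (precision, mean distance error in km)."""
    names = [n for n in gold if n in chosen]
    if not names:
        return 0.0, 0.0
    correct = sum(1 for n in names if chosen[n] == gold[n])
    error = sum(haversine_km(chosen[n], gold[n]) for n in names) / len(names)
    return correct / len(names), error


if __name__ == "__main__":
    # Toy coordinates; the second candidates are fictitious distant namesakes.
    toy = {
        "Flinders Street Station": [(-37.8183, 144.9671), (-34.9000, 138.6000)],
        "Federation Square": [(-37.8180, 144.9691)],
        "Queen Victoria Market": [(-37.8076, 144.9568), (-27.4700, 153.0200)],
    }
    print(disambiguate(toy))
```

On this toy input the distant namesakes are rejected because the centre quickly settles among the co-located candidates. The sketch assumes that all places in one description form a single compact cluster; general-purpose density-based algorithms such as DBSCAN relax that assumption by allowing several clusters and explicit noise points, at the cost of extra parameters.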

Author information

Authors and Affiliations

University of Melbourne, Melbourne, Australia
Hao Chen, Maria Vasardani & Stephan Winter

Corresponding author

Correspondence to Hao Chen.

About this article

Cite this article

Chen, H., Vasardani, M. & Winter, S. Clustering-based disambiguation of fine-grained place names from descriptions. Geoinformatica 23, 449–472 (2019). https://doi.org/10.1007/s10707-019-00341-6

Received: 21 February 2018
Revised: 04 November 2018
Accepted: 08 January 2019
Published: 25 January 2019
Issue Date: 15 July 2019
DOI: https://doi.org/10.1007/s10707-019-00341-6
