poster

Learning Domain Specific Models for Toponym Interlinking

Published: 05 November 2019

Abstract

Interlinking spatio-textual data is a core problem in the research literature and a task of high practical importance in many industrial GIS applications. In its general form, it consists of identifying, between two sources of spatio-textual entities, pairs of entities that match, i.e., correspond to the same real-world entity. In this paper, we focus on interlinking spatio-textual entities based solely on their names; that is, we address the problem of toponym interlinking. To solve this problem, existing works exploit generic string similarity measures and either apply them as is or integrate them as training features in classification models, without adapting or extending them to the specific characteristics of toponyms. In this work, we show that domain knowledge can significantly improve the accuracy of toponym interlinking by proposing domain-specific similarity measures that take the specificities of toponyms into account. We assess the implemented measures on Geonames and demonstrate significant increases in interlinking accuracy compared to baseline methods.
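The abstract does not spell out the proposed measures, so the following is only an illustrative sketch of what adapting a generic string similarity to toponyms could look like. It is not the authors' method. The function names, the `FREQUENT_TERMS` list, and the choice of `difflib.SequenceMatcher` as the generic baseline are all assumptions made for this example.

```python
import unicodedata
from difflib import SequenceMatcher

# Hypothetical list of frequent place-type words (an assumption, not from the paper).
FREQUENT_TERMS = {"saint", "st", "mount", "mt", "lake", "river", "the", "of"}


def normalize(name: str) -> str:
    """Lowercase, strip diacritics, and replace punctuation with spaces."""
    decomposed = unicodedata.normalize("NFKD", name.lower())
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = "".join(ch if ch.isalnum() else " " for ch in no_marks)
    return " ".join(cleaned.split())


def generic_similarity(a: str, b: str) -> float:
    """Generic baseline: character-level similarity ratio in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


def toponym_similarity(a: str, b: str) -> float:
    """Toponym-aware variant: normalize, drop frequent terms, ignore token order."""
    norm_a, norm_b = normalize(a), normalize(b)
    tokens_a = sorted(t for t in norm_a.split() if t not in FREQUENT_TERMS)
    tokens_b = sorted(t for t in norm_b.split() if t not in FREQUENT_TERMS)
    if not tokens_a or not tokens_b:
        # Nothing left after filtering: fall back to the normalized full names.
        return generic_similarity(norm_a, norm_b)
    return generic_similarity(" ".join(tokens_a), " ".join(tokens_b))


if __name__ == "__main__":
    pair = ("Lake Geneva", "Geneva, Lake")
    print("generic:", generic_similarity(*pair))
    print("toponym-aware:", toponym_similarity(*pair))
```

In this sketch, word reordering and a shared place-type word no longer lower the score, which is one kind of toponym-specific behaviour a generic measure misses. Scores of this kind could be used directly with a threshold or fed as features to a classifier, the two usage modes the abstract mentions.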

Published In

SIGSPATIAL '19: Proceedings of the 27th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems
November 2019
648 pages
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Interlinking
  2. Learning
  3. String Similarity
  4. Toponym

Qualifiers

  • Poster
  • Research
  • Refereed limited

Conference

SIGSPATIAL '19
Acceptance Rates

SIGSPATIAL '19 Paper Acceptance Rate 34 of 161 submissions, 21%;
Overall Acceptance Rate 257 of 1,238 submissions, 21%
