Recommending Geo-semantically Related Classes for Link Discovery

  • Original Article
Journal on Data Semantics

Abstract

The growth of the Web of Data has led to the development of dataset recommendation methodologies, which automate the discovery of datasets that may contain the same or related instances (i.e., objects), so that they can be used as input for several tasks, including Link Discovery. The recommendation process takes as input one dataset (or any tripleset) and proposes other datasets that are the most likely to contain related instances. Existing recommenders determine the relevance between datasets by comparing their textual and structural similarity or by examining existing links among them. In this paper, we determine relevance by comparing the geospatial relatedness of triplesets containing instances that belong to spatial classes (that is, classes containing instances whose locations are georeferenced by point geometries), based on the hypothesis that pairs of classes whose instances present a similar spatial distribution are likely to contain semantically related instances. The proposed methodology builds summaries that capture the spatial distribution of classes. It uses the summaries, first, to rule out classes that are irrelevant to an input class by applying spatial filters and, then, to rank the remaining classes by applying a geospatial relatedness measure, so that the top-ranked classes are more likely to contain related instances. The evaluation of the methodology includes an exploration of the characteristics of Web of Data spatial classes and a discussion of the experimental results that validate our hypothesis. We show that spatial filtering effectively and efficiently reduces the search space for relevant classes in the Web of Data by up to 99% and that the proposed geospatial relatedness measures generate ranked lists of recommended classes with 62% mean average precision, approximately 35% higher than simple baselines.
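The filter-then-rank idea described above can be illustrated with a minimal Python sketch. The abstract does not specify the summary structure, the spatial filters, or the relatedness measure, so everything here is an assumption made for illustration: summaries are taken to be counts of instances per fixed-size grid cell, the filter is a bounding-box overlap test, and the relatedness score is the Jaccard overlap of occupied cells. The function names (`build_summary`, `recommend`, and so on), the class names, and the coordinates are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import floor


def build_summary(points, cell_deg=1.0):
    """Summarise a class as instance counts per grid cell (assumed structure).

    `points` is an iterable of (longitude, latitude) pairs in degrees.
    """
    return Counter(
        (floor(lon / cell_deg), floor(lat / cell_deg)) for lon, lat in points
    )


def bounding_box(summary):
    """Return (min_x, min_y, max_x, max_y) over the occupied grid cells."""
    xs = [cell[0] for cell in summary]
    ys = [cell[1] for cell in summary]
    return min(xs), min(ys), max(xs), max(ys)


def boxes_overlap(a, b):
    """True if two (min_x, min_y, max_x, max_y) boxes intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def relatedness(a, b):
    """Jaccard overlap of occupied cells: a stand-in for the paper's measures."""
    cells_a, cells_b = set(a), set(b)
    union = cells_a | cells_b
    return len(cells_a & cells_b) / len(union) if union else 0.0


def recommend(source, candidates):
    """Filter out candidates with no spatial overlap, then rank the rest."""
    if not source:
        return []
    src_box = bounding_box(source)
    # Step 1: spatial filter (here a simple bounding-box test).
    kept = {
        name: summary
        for name, summary in candidates.items()
        if summary and boxes_overlap(src_box, bounding_box(summary))
    }
    # Step 2: rank the surviving classes by the relatedness score.
    scored = [(name, relatedness(source, s)) for name, s in kept.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical input class and candidate classes.
source = build_summary([(23.7, 37.9), (22.9, 40.6), (25.1, 35.3)])
candidates = {
    "museums": build_summary([(23.72, 37.97), (22.95, 40.63)]),
    "ports": build_summary([(25.14, 35.34), (2.35, 48.85)]),
    "volcanoes": build_summary([(-155.6, 19.4)]),
}
ranked = recommend(source, candidates)
```

In this toy input, the `volcanoes` class is discarded by the filter because its bounding box does not intersect that of the source class, and the two remaining classes are ordered by how many grid cells they share with it. A coarse filter of this kind is cheap to evaluate on summaries alone, which is the reason for applying it before the more expensive ranking step.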

Notes

  1. https://lod-cloud.net/.

  2. http://datahub.io/.

  3. http://lod-cloud.net.

  4. http://stats.lod2.eu.

  5. https://www.w3.org/2003/01/geo/.

  6. They refer to term co-occurrence in text corpora.

  7. http://vocab.deri.ie/void#Dataset.

  8. University Ontology (https://www.cs.umd.edu/projects/plus/SHOE/onts/univ1.0.html).

  9. Dublin Core Metadata Initiative (http://www.dublincore.org/specifications/dublin-core/dcmi-terms/).

  10. http://geo-aegean.aegean.gr:8080/WoDSpatialClassRecommender/.

  11. The implementation can be extended so that the source class can be any point spatial dataset specified by the user (such as a personal shapefile, a GeoJSON file, a Web Feature Service or a spatial class from a non-identified SPARQL endpoint).

  12. https://old.datahub.io/.

  13. http://lov.okfn.org.

  14. http://lov4iot.appspot.com/?p=ontologies.

  15. http://geovocab.org/.

  16. http://www.geosparql.org/.

  17. http://data.ordnancesurvey.co.uk/ontology.

  18. http://www.geonames.org/ontology/.

  19. http://www.georss.org/rdf_rss1.html.

  20. The respective SPARQL queries for the remaining ontologies are available at https://github.com/vkopsachilis/WoDSpatialClassRecommender.

  21. The full list of identified spatial classes is available at https://github.com/vkopsachilis/WoDSpatialClassRecommender.

  22. The respective SPARQL queries for the remaining ontologies are available at https://github.com/vkopsachilis/WoDSpatialClassRecommender.

  23. The full ground truth list is available at https://github.com/vkopsachilis/WoDSpatialClassRecommender.

  24. https://github.com/emir-munoz/ws4j.

Acknowledgements

This research is supported by the funding program “YPATIA” of the University of the Aegean.

Author information

Corresponding author

Correspondence to Vasilis Kopsachilis.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Kopsachilis, V., Vaitis, M., Mamoulis, N. et al. Recommending Geo-semantically Related Classes for Link Discovery. J Data Semant 9, 151–177 (2020). https://doi.org/10.1007/s13740-020-00117-4

  • DOI: https://doi.org/10.1007/s13740-020-00117-4
