LiQuate-Estimating the Quality of Links in the Linking Open Data Cloud

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8194)

Abstract

In recent years, RDF datasets from almost every knowledge domain have been published in the Linking Open Data (LOD) cloud. The Linked Open Data guidelines establish the conditions that resources must satisfy in order to be included in the LOD cloud and connected to previously published data. The process of publishing and linking resources in the LOD cloud relies on: i) data cleaning and transformation into existing RDF formats, ii) storage of the data in RDF storage systems, and iii) data interlinking. Because data sources are heterogeneous, the generated RDF data may be ambiguous, and links may be incomplete with respect to this data. Users of the Web of Data require linked data to meet high quality standards in order to develop applications that produce trustworthy results, but data in the LOD cloud has not been curated; thus, tools are needed to detect data quality problems. For example, researchers who study Life Sciences datasets to explain phenomena or identify anomalies demand that their findings correspond to current discoveries rather than being an effect of low data quality with respect to completeness or redundancy. In this paper we propose LiQuate, a system that uses Bayesian networks to study the incompleteness of links and the ambiguities between labels and between links in the LOD cloud, and that can be applied to any domain. Additionally, a probabilistic rule-based system is used to infer new links that associate equivalent resources and make it possible to resolve the ambiguities and incompleteness identified during the exploration of the Bayesian network. As a proof of concept, we applied LiQuate to existing Life Sciences linked datasets and detected ambiguities in the data that may compromise confidence in the results of applications such as link prediction or pattern discovery. We illustrate a variety of the identified problems and propose a set of enriched intra- and inter-links that may improve the quality of data items and links of specific datasets in the LOD cloud.
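
To make the notion of link incompleteness concrete, the following minimal sketch estimates, by simple counting, the conditional probability that a resource of a given class carries a given link. This is only an illustration of the kind of quantity a probabilistic model over RDF data can expose; it is not the LiQuate Bayesian network itself. The toy triples, the `ex:` and `other:` prefixes, and the function name `link_probability` are all hypothetical.

```python
# Hypothetical toy triples (subject, predicate, object); not from any real dataset.
TRIPLES = [
    ("ex:drug1", "rdf:type", "ex:Drug"),
    ("ex:drug2", "rdf:type", "ex:Drug"),
    ("ex:drug3", "rdf:type", "ex:Drug"),
    ("ex:drug4", "rdf:type", "ex:Drug"),
    ("ex:drug1", "owl:sameAs", "other:d1"),
    ("ex:drug2", "owl:sameAs", "other:d2"),
    ("ex:drug3", "owl:sameAs", "other:d3"),
]


def link_probability(triples, cls, predicate):
    """Estimate P(resource has a `predicate` link | resource has type `cls`).

    Returns None when no resource of type `cls` exists, since the
    conditional probability is undefined in that case.
    """
    # Resources declared to be instances of the class.
    members = {s for s, p, o in triples if p == "rdf:type" and o == cls}
    if not members:
        return None
    # Members that have at least one outgoing link with the predicate.
    linked = {s for s, p, _ in triples if p == predicate and s in members}
    return len(linked) / len(members)


if __name__ == "__main__":
    # Three of the four drugs have an owl:sameAs link.
    print(link_probability(TRIPLES, "ex:Drug", "owl:sameAs"))  # 0.75
```

A value well below 1 for a link that is expected on every instance of a class is one possible signal of incomplete interlinking; in this toy data, `ex:drug4` is the resource that lacks the link.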

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ruckhaus, E., Vidal, ME. (2013). LiQuate-Estimating the Quality of Links in the Linking Open Data Cloud. In: Lacroix, Z., Ruckhaus, E., Vidal, ME. (eds) Resource Discovery. RED 2012. Lecture Notes in Computer Science, vol 8194. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-45263-5_4

  • DOI: https://doi.org/10.1007/978-3-642-45263-5_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-45262-8

  • Online ISBN: 978-3-642-45263-5

  • eBook Packages: Computer Science, Computer Science (R0)
