Quantifying the Bias in Data Links

  • Conference paper
Knowledge Engineering and Knowledge Management (EKAW 2014)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8876)

Abstract

The main idea behind Linked Data is to connect data from different sources in order to build a hub of shared, publicly accessible knowledge. While the benefit of sharing knowledge is universally recognised, it is less apparent how strongly results can be affected when knowledge is not equally distributed between a dataset and the datasets connected to it. This imbalance in information, or bias, is generally assumed a priori, but it can in fact be quantified to improve the quality of the results of applications and analytics that rely on such linked data. In this paper, we propose a process to measure how much bias one dataset contains when compared to another, by identifying the most affected RDF properties and values within the set of entities that the two datasets have in common (defined as the linkset). This process was run on a wide range of linksets from Linked Data; in the experiment section we present the results as well as measures of its performance.
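
To make the idea of "most affected RDF properties and values" concrete, the following is a minimal sketch and not the procedure of the paper, whose actual statistic is not stated in this abstract. It assumes that bias can be approximated by comparing, for each (property, value) pair, the share of linkset entities carrying it with the share of all entities in the dataset carrying it. The function names `value_shares` and `rank_bias`, the triple representation, and the toy data are all hypothetical.

```python
from collections import Counter

def value_shares(entities, triples):
    """Share of the given entities that carry each (property, value) pair."""
    entities = set(entities)
    # Deduplicate triples so each entity counts at most once per pair.
    seen = {(s, p, o) for s, p, o in triples if s in entities}
    counts = Counter((p, o) for _, p, o in seen)
    n = len(entities)
    return {pv: c / n for pv, c in counts.items()} if n else {}

def rank_bias(all_entities, linkset_entities, triples):
    """Rank (property, value) pairs by how far the linkset deviates
    from the full dataset. The share difference is an assumed score."""
    full = value_shares(all_entities, triples)
    linked = value_shares(linkset_entities, triples)
    diffs = {
        pv: linked.get(pv, 0.0) - full.get(pv, 0.0)
        for pv in set(full) | set(linked)
    }
    return sorted(diffs.items(), key=lambda kv: abs(kv[1]), reverse=True)

# Toy dataset (hypothetical): four entities, two of which are linked.
triples = [
    ("a", "country", "UK"),
    ("b", "country", "UK"),
    ("c", "country", "FR"),
    ("d", "country", "DE"),
]
ranked = rank_bias({"a", "b", "c", "d"}, {"a", "b"}, triples)
print(ranked[0])  # (('country', 'UK'), 0.5)
```

In this toy case the value UK covers half of the dataset but all of the linkset, so it is ranked as the most over-represented pair. A real analysis would likely replace the raw share difference with a statistical test.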

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Tiddi, I., d’Aquin, M., Motta, E. (2014). Quantifying the Bias in Data Links. In: Janowicz, K., Schlobach, S., Lambrix, P., Hyvönen, E. (eds) Knowledge Engineering and Knowledge Management. EKAW 2014. Lecture Notes in Computer Science, vol 8876. Springer, Cham. https://doi.org/10.1007/978-3-319-13704-9_40

  • DOI: https://doi.org/10.1007/978-3-319-13704-9_40

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-13703-2

  • Online ISBN: 978-3-319-13704-9

  • eBook Packages: Computer Science (R0)
