Abstract
The main idea behind Linked Data is to connect data from different sources in order to build a hub of shared, publicly accessible knowledge. While the benefit of sharing knowledge is universally recognised, it is less visible how strongly results can be affected when knowledge is not equally distributed between a dataset and the datasets connected to it. This imbalance in information, or bias, is generally assumed a priori, but it can actually be quantified to improve the quality of the results of applications and analytics that rely on such linked data. In this paper, we propose a process to measure how much bias one dataset contains when compared to another, by identifying the most affected RDF properties and values within the set of entities that the two datasets have in common (defined as the linkset). This process was run on a wide range of linksets from Linked Data; in the experiment section we present the results as well as measures of its performance.
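As a rough illustration of what comparing a linkset against its source dataset can look like, the sketch below ranks the values of one property by how much their share among linked entities differs from their share in the whole dataset. It is a minimal sketch under stated assumptions, not the process proposed in the paper: the plain difference of shares stands in for whatever measure the paper uses, and the names `value_shares`, `bias_by_value`, the toy `dataset`, and the `country` property are all hypothetical.

```python
from collections import Counter


def value_shares(entities, prop):
    """Share of entities carrying each value of `prop` (entities lacking it are skipped)."""
    counts = Counter(e[prop] for e in entities.values() if prop in e)
    total = sum(counts.values())
    return {v: c / total for v, c in counts.items()} if total else {}


def bias_by_value(dataset, linkset_ids, prop):
    """Rank values of `prop` by |share in linkset - share in full dataset|.

    Assumption: a simple difference of proportions is used purely for
    illustration; it is not the statistic defined in the paper.
    """
    linked = {i: dataset[i] for i in linkset_ids if i in dataset}
    full = value_shares(dataset, prop)
    sub = value_shares(linked, prop)
    diffs = {v: sub.get(v, 0.0) - full[v] for v in full}
    return sorted(diffs.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical toy data: four entities, two of which are in the linkset.
dataset = {
    "e1": {"country": "UK"},
    "e2": {"country": "UK"},
    "e3": {"country": "FR"},
    "e4": {"country": "IT"},
}
linkset_ids = {"e1", "e2"}

ranking = bias_by_value(dataset, linkset_ids, "country")
for value, diff in ranking:
    print(f"{value}: {diff:+.2f}")
```

In this toy case the value `UK` is over-represented among the linked entities, while `FR` and `IT` are absent from the linkset and therefore under-represented. A real analysis would read the property values from RDF triples and would need a test of statistical significance rather than a raw difference.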
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Tiddi, I., d’Aquin, M., Motta, E. (2014). Quantifying the Bias in Data Links. In: Janowicz, K., Schlobach, S., Lambrix, P., Hyvönen, E. (eds) Knowledge Engineering and Knowledge Management. EKAW 2014. Lecture Notes in Computer Science, vol 8876. Springer, Cham. https://doi.org/10.1007/978-3-319-13704-9_40
DOI: https://doi.org/10.1007/978-3-319-13704-9_40
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-13703-2
Online ISBN: 978-3-319-13704-9
eBook Packages: Computer Science, Computer Science (R0)