Quantifying the Bias in Data Links

Tiddi, Ilaria; d’Aquin, Mathieu; Motta, Enrico

doi:10.1007/978-3-319-13704-9_40

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8876)

Included in the following conference series:

International Conference on Knowledge Engineering and Knowledge Management

Abstract

The main idea behind Linked Data is to connect data from different sources in order to develop a hub of shared and publicly accessible knowledge. While the benefit of sharing knowledge is universally recognised, what is less visible is how much results can be affected when the knowledge in one dataset and in the connected ones is not equally distributed. This lack of balance in information, or bias, is generally assumed a priori, but it can actually be quantified to improve the quality of the results of applications and analytics relying on such linked data. In this paper, we propose a process to measure how much bias one dataset contains when compared to another one, by identifying the most affected RDF properties and values within the set of entities that those datasets have in common (defined as the linkset). This process was run on a wide range of linksets from Linked Data, and in the experiment section we present the results as well as measures of its performance.
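To make the idea of comparing a linkset against its source dataset concrete, here is a minimal sketch in Python. It is not the authors' method: the abstract does not say which measure is used, so the sketch assumes total variation distance between the value distributions of each property. The entity layout (dicts with an "id" key), the function names, and the toy data are all hypothetical.

```python
from collections import Counter


def value_distribution(entities, prop):
    """Relative frequency of each value of `prop` among the given entities."""
    counts = Counter(e[prop] for e in entities if prop in e)
    total = sum(counts.values())
    return {v: c / total for v, c in counts.items()} if total else {}


def total_variation(p, q):
    """Total variation distance: 0 for identical distributions, 1 for disjoint ones."""
    return 0.5 * sum(abs(p.get(v, 0.0) - q.get(v, 0.0)) for v in set(p) | set(q))


def rank_biased_properties(dataset, linkset_ids, props):
    """Rank properties by how far the linkset's value distribution
    deviates from that of the full dataset (assumed bias measure)."""
    linked = [e for e in dataset if e["id"] in linkset_ids]
    scores = {
        prop: total_variation(
            value_distribution(dataset, prop),
            value_distribution(linked, prop),
        )
        for prop in props
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy dataset: four entities, two of which are linked
# to another dataset (the linkset).
dataset = [
    {"id": "a", "country": "UK", "type": "City"},
    {"id": "b", "country": "UK", "type": "City"},
    {"id": "c", "country": "FR", "type": "City"},
    {"id": "d", "country": "IT", "type": "City"},
]
linkset_ids = {"a", "b"}

ranking = rank_biased_properties(dataset, linkset_ids, ["country", "type"])
print(ranking)
```

In this toy case the linked entities are all from the UK, so "country" ranks above "type", whose distribution is identical in both sets. A real analysis would read property values from RDF data and would likely need a statistical test rather than a raw distance.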

Author information

Authors and Affiliations

Knowledge Media Institute, The Open University, United Kingdom
Ilaria Tiddi, Mathieu d’Aquin & Enrico Motta

Editor information

Editors and Affiliations

Department of Geography, University of California, Santa Barbara, CA, USA
Krzysztof Janowicz
Dept. of Computer Science, VU University Amsterdam, The Netherlands
Stefan Schlobach
University of Linköping, Sweden
Patrick Lambrix
Aalto University, Finland
Eero Hyvönen

About this paper

Cite this paper

Tiddi, I., d’Aquin, M., Motta, E. (2014). Quantifying the Bias in Data Links. In: Janowicz, K., Schlobach, S., Lambrix, P., Hyvönen, E. (eds) Knowledge Engineering and Knowledge Management. EKAW 2014. Lecture Notes in Computer Science, vol 8876. Springer, Cham. https://doi.org/10.1007/978-3-319-13704-9_40

DOI: https://doi.org/10.1007/978-3-319-13704-9_40
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-13703-2
Online ISBN: 978-3-319-13704-9
eBook Packages: Computer Science, Computer Science (R0)
