Abstract
An increasing number of linked datasets are published on the Web using RDF(S)/OWL. The availability of a schema describing these datasets is crucial for their meaningful use. A dataset may contain schema-related information; however, these languages do not impose any constraint on the structure of the datasets, so a gap may exist between the schema and the actual instances. In this paper, we tackle the problem of evaluating this gap. We present an approach that relies on both type and class profiles, together with a set of quality metrics. We also present experimental evaluations that illustrate the use of the proposed metrics.
Acknowledgements
This work was partially funded by the French National Research Agency through the CAIR ANR-14-CE23-0006 project.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Kellou-Menouer, K., Kedad, Z. (2015). Evaluating the Gap Between an RDF Dataset and Its Schema. In: Jeusfeld, M., Karlapalem, K. (eds) Advances in Conceptual Modeling. ER 2015. Lecture Notes in Computer Science, vol 9382. Springer, Cham. https://doi.org/10.1007/978-3-319-25747-1_28
DOI: https://doi.org/10.1007/978-3-319-25747-1_28
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25746-4
Online ISBN: 978-3-319-25747-1
eBook Packages: Computer Science, Computer Science (R0)