Abstract
Linked Open Data (LOD) has become a vast repository, with billions of triples available in thousands of datasets. One of the most pressing challenges in Linked Data management is detecting errors in numerical data. This study proposes a novel probabilistic framework for detecting inconsistencies in numerical attributes of linked data, covering not only integer, float, and double values but also date values. We develop an automatic method that detects errors across multiple attributes, which cannot be found by considering a single attribute in isolation. Evaluations are performed on four versions of DBpedia, from 3.7 to 2014; DBpedia is a central hub dataset of the LOD cloud. Results show that our approach reaches \(96\,\%\) precision on DBpedia 2014 with threshold \(\alpha =0.9\). We also compare the percentage distribution of errors across DBpedia versions and analyze two basic classes of causes that lead to errors. Efficiency evaluation results confirm the scalability of our approach to large Linked Data repositories.
Notes
1. The namespace dbres denotes http://dbpedia.org/resource, dbonto denotes http://dbpedia.org/ontology, rdf denotes http://www.w3.org/1999/02/22-rdf-syntax-ns#, foaf denotes http://xmlns.com/foaf/0.1/.
2. The namespace owl denotes http://www.w3.org/2002/07/owl.
Acknowledgments
The work is supported by the Natural Science Foundation of Jiangsu Province under Grant BK20140643, the National Natural Science Foundation of China under Grant 61170165 and the 863 program under Grant 2015AA015406.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Li, H., Li, Y., Xu, F., Zhong, X. (2015). Probabilistic Error Detecting in Numerical Linked Data. In: Chen, Q., Hameurlain, A., Toumani, F., Wagner, R., Decker, H. (eds) Database and Expert Systems Applications. Globe DEXA 2015. Lecture Notes in Computer Science, vol 9261. Springer, Cham. https://doi.org/10.1007/978-3-319-22849-5_5
DOI: https://doi.org/10.1007/978-3-319-22849-5_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-22848-8
Online ISBN: 978-3-319-22849-5
eBook Packages: Computer Science, Computer Science (R0)