
Probabilistic Error Detecting in Numerical Linked Data

  • Conference paper
  • First Online:

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9261)

Abstract

Linked Open Data (LOD) has become a vast repository with billions of triples available in thousands of datasets. One of the most pressing challenges in Linked Data management is detecting errors in numerical data. This study proposes a novel probabilistic framework for detecting inconsistencies in the numerical attributes of linked data, covering not only integer, float and double values but also date values. We develop an automatic method that detects errors involving multiple attributes, which cannot be detected by considering a single attribute alone. Evaluations are performed on four versions of DBpedia, a central hub dataset of the LOD cloud, ranging from version 3.7 to version 2014. Results show that our approach reaches \(96\,\%\) precision on DBpedia 2014 with threshold \(\alpha =0.9\). We also compare the percentage distribution of errors across DBpedia versions and analyze two basic classes of causes that lead to errors. Efficiency evaluation results confirm that our approach scales to large Linked Data repositories.
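The abstract only outlines the framework. As a rough illustration, and not the authors' method, the Python sketch below shows one simple way a probabilistic check across two attributes could flag errors. It estimates from the data how often an ordering such as birthDate < deathDate holds. If that estimate reaches a threshold α, it reports the entities that violate the ordering. Several details are assumptions made for illustration: the attribute names, the entity names, the frequency-based probability estimate, and the helper functions `order_probability` and `detect_violations`.

```python
# Illustrative sketch only; NOT the algorithm from the paper.
from datetime import date
from typing import Dict, List, Union

# Hypothetical value type: numeric literals or date literals.
Value = Union[int, float, date]
Entities = Dict[str, Dict[str, Value]]


def to_number(v: Value) -> float:
    """Put numbers and dates on one comparable scale (dates become day ordinals)."""
    if isinstance(v, date):
        return float(v.toordinal())
    return float(v)


def order_probability(entities: Entities, a: str, b: str) -> float:
    """Estimate P(a < b) as the share of entities having both attributes where a < b."""
    pairs = [(e[a], e[b]) for e in entities.values() if a in e and b in e]
    if not pairs:
        return 0.0
    holds = sum(1 for x, y in pairs if to_number(x) < to_number(y))
    return holds / len(pairs)


def detect_violations(entities: Entities, a: str, b: str, alpha: float = 0.9) -> List[str]:
    """If 'a < b' holds with estimated probability >= alpha, return the entities breaking it."""
    if order_probability(entities, a, b) < alpha:
        return []  # the relation is not reliable enough to judge individual values
    return sorted(
        name
        for name, e in entities.items()
        if a in e and b in e and to_number(e[a]) >= to_number(e[b])
    )


if __name__ == "__main__":
    # Toy data with hypothetical entity names: ten consistent persons and one swapped pair.
    people: Entities = {
        f"dbres:Person_{i}": {"birthDate": date(1900 + i, 1, 1), "deathDate": date(1970 + i, 1, 1)}
        for i in range(10)
    }
    people["dbres:Person_X"] = {"birthDate": date(1990, 5, 1), "deathDate": date(1890, 5, 1)}
    print(order_probability(people, "birthDate", "deathDate"))
    print(detect_violations(people, "birthDate", "deathDate"))  # ['dbres:Person_X']
```

In this toy data, 10 of the 11 entities satisfy birthDate < deathDate. The estimated probability is therefore about 0.91, just above α = 0.9, so only `dbres:Person_X` is flagged. Raising α to 0.95 flags nothing. Neither of that entity's dates is implausible on its own, which is the kind of multi-attribute error that a single-attribute check would miss.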


Notes

  1. The namespace dbres denotes http://dbpedia.org/resource/, dbonto denotes http://dbpedia.org/ontology/, rdf denotes http://www.w3.org/1999/02/22-rdf-syntax-ns#, and foaf denotes http://xmlns.com/foaf/0.1/.

  2. The namespace owl denotes http://www.w3.org/2002/07/owl#.


Acknowledgments

The work is supported by the Natural Science Foundation of Jiangsu Province under Grant BK20140643, the National Natural Science Foundation of China under Grant 61170165 and the 863 program under Grant 2015AA015406.

Author information

Corresponding author: Huiying Li.


Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Li, H., Li, Y., Xu, F., Zhong, X. (2015). Probabilistic Error Detecting in Numerical Linked Data. In: Chen, Q., Hameurlain, A., Toumani, F., Wagner, R., Decker, H. (eds) Database and Expert Systems Applications. Globe DEXA 2015. Lecture Notes in Computer Science, vol 9261. Springer, Cham. https://doi.org/10.1007/978-3-319-22849-5_5


  • DOI: https://doi.org/10.1007/978-3-319-22849-5_5

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-22848-8

  • Online ISBN: 978-3-319-22849-5

  • eBook Packages: Computer Science, Computer Science (R0)
