An Analysis of the Quality Issues of the Properties Available in the Spanish DBpedia

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9422)

Abstract

DBpedia exposes data from Wikipedia as machine-readable Linked Data. The DBpedia data extraction process generates RDF data in two ways: (a) using mappings that map the data from Wikipedia infoboxes to the DBpedia ontology and other vocabularies, and (b) using infobox properties, i.e., properties that are not defined in the DBpedia ontology but are auto-generated from the infobox attribute-value pairs. The work presented in this paper inspects the quality issues of the properties used in the Spanish DBpedia dataset along four quality dimensions: conciseness, consistency, syntactic validity, and semantic accuracy. The main contribution of the paper is the identification of quality issues in the Spanish DBpedia and of their possible causes. The findings can be used as feedback to improve the DBpedia extraction process and eliminate such quality issues from DBpedia.
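The distinction between mapped ontology properties and auto-generated infobox properties can be illustrated with a minimal Python sketch. It is not the method used in the paper. It assumes that ontology properties live under `http://dbpedia.org/ontology/` and Spanish infobox properties under `http://es.dbpedia.org/property/`. The function names and the sample URIs in the usage note are hypothetical, chosen only for illustration. The accent-folding step is one plausible way to surface candidate near-duplicate property names (a conciseness issue), and the slash check is one plausible syntactic-validity test.

```python
import unicodedata
from collections import defaultdict

# Assumed namespaces (not taken from the abstract itself).
ONTOLOGY_NS = "http://dbpedia.org/ontology/"
INFOBOX_NS = "http://es.dbpedia.org/property/"


def classify_property(uri: str) -> str:
    """Label a property URI as 'ontology', 'infobox', or 'other' by namespace."""
    if uri.startswith(ONTOLOGY_NS):
        return "ontology"
    if uri.startswith(INFOBOX_NS):
        return "infobox"
    return "other"


def local_name(uri: str) -> str:
    """Return the part of the URI after a known namespace, or the URI itself."""
    for ns in (ONTOLOGY_NS, INFOBOX_NS):
        if uri.startswith(ns):
            return uri[len(ns):]
    return uri


def fold(name: str) -> str:
    """Lowercase a name and strip combining marks (accents) after NFD decomposition."""
    decomposed = unicodedata.normalize("NFD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).lower()


def has_slash_in_local_name(uri: str) -> bool:
    """Flag infobox properties whose local name contains a slash."""
    return classify_property(uri) == "infobox" and "/" in local_name(uri)


def near_duplicates(uris):
    """Group infobox properties whose folded local names coincide."""
    groups = defaultdict(list)
    for uri in uris:
        if classify_property(uri) == "infobox":
            groups[fold(local_name(uri))].append(uri)
    # Keep only groups with more than one member: candidate duplicates.
    return {key: members for key, members in groups.items() if len(members) > 1}
```

For example, given the hypothetical URIs `http://es.dbpedia.org/property/población` and `http://es.dbpedia.org/property/poblacion`, `near_duplicates` would place both in a single group under the key `poblacion`. Such groups are only candidates: deciding whether two properties are truly redundant still needs manual or semantic inspection.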

Notes

  1. http://linkeddatacatalog.dws.informatik.uni-mannheim.de/state/

  2. http://es.dbpedia.org

  3. See statistics for Spanish at http://wiki.dbpedia.org/services-resources/datasets/dataset-statistics.

  4. See the datasets loaded at http://wiki.dbpedia.org/services-resources/datasets/data-set-loaded-2014.

  5. http://mappings.dbpedia.org/server/ontology/

  6. http://mappings.dbpedia.org

  7. http://mappings.dbpedia.org/server/statistics/es/

  8. The Spanish DBpedia 2014 dataset was the latest publicly available version as of July 2015.

  9. http://loupe.linkeddata.es/loupe/

  10. http://loupe.linkeddata.es/loupe/methods.html#property

  11. http://www.w3.org/TR/owl2-direct-semantics/#Interpretations

  12. http://docs.oracle.com/javase/7/docs/api/java/text/Collator.html

  13. In these cases, the infobox label contains a slash. For example, the label ‘idioma/s’ generates the property ‘http://es.dbpedia.org/property/idioma/s’.

  14. http://dx.doi.org/10.6084/m9.figshare.1491367

  15. http://docs.oracle.com/javase/7/docs/api/java/text/Normalizer.html

  16. http://dx.doi.org/10.6084/m9.figshare.1491372

  17. http://dx.doi.org/10.6084/m9.figshare.1491432

  18. http://loupe.linkeddata.es/loupe/

Acknowledgments

This work was funded by the BES-2014-068449 grant under the 4V project (TIN2013-46238-C4-2-R), the LIDER project (EU FP7 610782), and the JCI-2012-12719 contract.

Author information

Corresponding author

Correspondence to Nandana Mihindukulasooriya.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Mihindukulasooriya, N., Rico, M., García-Castro, R., Gómez-Pérez, A. (2015). An Analysis of the Quality Issues of the Properties Available in the Spanish DBpedia. In: Puerta, J., et al. Advances in Artificial Intelligence. CAEPIA 2015. Lecture Notes in Computer Science (LNAI), vol 9422. Springer, Cham. https://doi.org/10.1007/978-3-319-24598-0_18

  • DOI: https://doi.org/10.1007/978-3-319-24598-0_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-24597-3

  • Online ISBN: 978-3-319-24598-0

  • eBook Packages: Computer Science, Computer Science (R0)
