Managing and Compiling Data Dependencies for Semantic Applications Using Databus Client

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1537)

Abstract

Realizing a data-driven application or workflow that consumes bulk data files from the Web poses a multitude of challenges, ranging from sustainable dependency management with support for automatic updates to handling the variety of compression schemes, serialization formats, and data models. In this work, we present an approach based on the novel Databus Client, which is backed by the DBpedia Databus, a data asset release management platform inspired by paradigms and techniques successfully applied in software release management. The approach shifts effort from the publisher to the client while making data consumption and dependency management easier and more unified as a whole. The client uses four layers (download, compression, format, and mapping), each of which tackles an individual challenge, and offers a fully automated way of extracting and compiling data assets from the DBpedia Databus, given a single command and a flexible dependency configuration expressed via SPARQL or Databus Collections. The current vertically sliced implementation supports format conversion within, as well as mapping between, RDF triples, RDF quads, and CSV/TSV files. We developed an evaluation strategy for the format conversion and mapping functionality based on so-called round-trip tests.
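
The round-trip idea behind the evaluation can be illustrated with a minimal sketch. It assumes that a round-trip test converts a file from format A to format B and back to A, then checks that the parsed content is unchanged. This is an assumption about the general technique, not the Databus Client's actual test code. The function names `convert_delimited`, `parse_rows`, and `round_trip_ok` are hypothetical, and only the CSV/TSV case is covered, using the Python standard library.

```python
import csv
import io


def convert_delimited(text, src_delim, dst_delim):
    """Re-serialize delimiter-separated text with a different delimiter.

    Hypothetical helper for illustration; not part of the Databus Client.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=src_delim))
    out = io.StringIO()
    writer = csv.writer(out, delimiter=dst_delim, lineterminator="\n")
    writer.writerows(rows)
    return out.getvalue()


def parse_rows(text, delim):
    """Parse delimiter-separated text into a list of rows (lists of strings)."""
    return list(csv.reader(io.StringIO(text), delimiter=delim))


def round_trip_ok(csv_text):
    """Convert CSV -> TSV -> CSV and compare the parsed rows, not the raw bytes.

    Comparing parsed rows tolerates harmless differences in quoting style.
    """
    tsv_text = convert_delimited(csv_text, ",", "\t")
    csv_again = convert_delimited(tsv_text, "\t", ",")
    return parse_rows(csv_text, ",") == parse_rows(csv_again, ",")


# A field containing the source delimiter is the interesting case.
sample = 'id,label\n1,"Leipzig, Germany"\n2,Berlin\n'
print(round_trip_ok(sample))
```

The comparison is made on parsed rows rather than on the serialized text, since two serializations of the same table may legitimately differ in quoting. An analogous test for RDF would need to compare graphs rather than lines, which this sketch does not attempt.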

Notes

  1. https://databus.dbpedia.org
  2. https://www.dbpedia.org/blog/databus-collections-feature/
  3. https://github.com/dbpedia/databus-client
  4. https://www.docker.com/
  5. https://commons.apache.org/proper/commons-compress/
  6. https://jena.apache.org/
  7. https://spark.apache.org/
  8. https://tarql.github.io/
  9. https://rml.io/
  10. https://datahub.io

Acknowledgments

This work was partially supported by grants from the Federal Ministry for Economic Affairs and Energy of Germany (BMWi) to the projects LOD-GEOSS (03EI1005E) and PLASS (01MD19003D).

Author information

Corresponding author

Correspondence to Johannes Frey.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Frey, J., Götz, F., Hofer, M., Hellmann, S. (2022). Managing and Compiling Data Dependencies for Semantic Applications Using Databus Client. In: Garoufallou, E., Ovalle-Perandones, MA., Vlachidis, A. (eds) Metadata and Semantic Research. MTSR 2021. Communications in Computer and Information Science, vol 1537. Springer, Cham. https://doi.org/10.1007/978-3-030-98876-0_10

  • DOI: https://doi.org/10.1007/978-3-030-98876-0_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-98875-3

  • Online ISBN: 978-3-030-98876-0

  • eBook Packages: Computer Science, Computer Science (R0)
