Speeding up Publication of Linked Data Using Data Chunking in LinkedPipes ETL

  • Conference paper
On the Move to Meaningful Internet Systems. OTM 2017 Conferences (OTM 2017)

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 10574)

Abstract

There is a multitude of tools for preparing Linked Data from data sources such as CSV and XML files. These tools usually perform as expected when processing examples or smaller real-world data. However, a majority of them become hard to use when faced with a larger dataset, such as a CSV file hundreds of megabytes in size. Tools which load the entire resulting RDF dataset into memory usually have memory requirements that cannot be satisfied by commodity hardware. This is the case for RDF-based ETL tools. Their limits can be avoided by running them on powerful and expensive hardware, which is, however, not an option for the majority of data publishers. Tools which process the data in a streamed way tend to have limited transformation options. This is the case for text-based transformations, such as XSLT, and per-item SPARQL transformations, such as the streamed version of TARQL. In this paper, we show how the power and transformation options of RDF-based ETL tools can be combined with the ability to transform large datasets on common consumer hardware for so-called chunkable data, that is, data which can be split in a certain way. We demonstrate our approach in our RDF-based ETL tool, LinkedPipes ETL. We include experiments on selected real-world datasets and a comparison of the performance and memory consumption of available tools.

This work was supported in part by the Czech Science Foundation (GAČR), grant number 16-09713S and in part by the project SVV 260451.

Notes

  1. http://lod-cloud.net/

  2. https://opendata.cz

  3. https://tarql.github.io/

  4. https://www.w3.org/TR/csv2rdf/

  5. https://www.w3.org/community/rsp/

  6. https://etl.linkedpipes.com

  7. https://etl.linkedpipes.com/documentation/#debug

  8. https://demo.etl.linkedpipes.com

  9. http://rdf4j.org/

  10. https://etl.linkedpipes.com/components/

  11. https://github.com/openlink/virtuoso-opensource/issues/119

  12. https://github.com/openlink/virtuoso-opensource/issues/207

  13. http://vos.openlinksw.com/owiki/wiki/VOS/VirtTipsAndTricksHowToHandleBandwidthLimitExceed

  14. https://www.w3.org/TR/skos-reference/

  15. http://comsode.eu

  16. http://www.statnipokladna.cz/cs/csuis/sprava-ciselniku

  17. http://wwwinfo.mfcr.cz/ares/ares_xml.html.en

  18. http://monitor.statnipokladna.cz/en/2016/zdrojova-data/

  19. https://www.sap.com

  20. https://www.w3.org/TR/vocab-data-cube/

  21. http://www.cuzk.cz/Uvod/Produkty-a-sluzby/RUIAN/RUIAN.aspx (in Czech).

  22. http://www.geotools.org/

  23. https://datagraft.net/

  24. https://www.w3.org/TR/ldp/

  25. http://openbudgets.eu

  26. https://linked.opendata.cz

Author information

Corresponding author

Correspondence to Jakub Klímek.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Klímek, J., Škoda, P. (2017). Speeding up Publication of Linked Data Using Data Chunking in LinkedPipes ETL. In: Panetto, H., et al. On the Move to Meaningful Internet Systems. OTM 2017 Conferences. OTM 2017. Lecture Notes in Computer Science, vol 10574. Springer, Cham. https://doi.org/10.1007/978-3-319-69459-7_10

  • DOI: https://doi.org/10.1007/978-3-319-69459-7_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-69458-0

  • Online ISBN: 978-3-319-69459-7

  • eBook Packages: Computer Science (R0)
