
RDF-Gen: generating RDF triples from big data sources

  • Regular Paper
Knowledge and Information Systems

Abstract

Transforming disparate and heterogeneous data sources that provide large volumes of data at high velocity into a common form allows integrated and enriched views of the data, and thus creates further opportunities to improve the effectiveness and accuracy of data analysis and prediction tasks. This paper presents the RDF-Gen approach for transforming data from archival and streaming sources, provided in various formats, into RDF triples according to a set of ontological specifications. RDF-Gen introduces a generic mechanism that transforms data efficiently (i.e., with high throughput and low latency) even when data velocity exhibits high peaks, offers facilities for discovering associations between data from different sources, and supports the transformation of modular data sets. The paper also presents a parallel implementation of RDF-Gen, together with data transformation workflows whose variants incorporate RDF-Gen instances, adapting to the needs of data sources, application areas and performance requirements. RDF-Gen is experimentally evaluated against the state of the art in both archival and streaming settings; the experimental results demonstrate its efficiency and highlight its key contributions.
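The abstract describes transforming records from archival and streaming sources into RDF triples according to ontological specifications. The sketch below illustrates only the general idea of template-based triple generation; it is not RDF-Gen's actual mechanism, configuration vocabulary or API. The namespace `http://example.org/`, the `Node` class, the `hasSpeed` and `hasTimestamp` properties, the CSV columns and the functions `record_to_triples` and `transform` are all hypothetical.

```python
import csv
import io

# Hypothetical namespace and vocabulary, for illustration only
# (not the datAcron ontology or RDF-Gen's configuration terms).
EX = "http://example.org/"
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
XSD = "http://www.w3.org/2001/XMLSchema#"

# A hypothetical triple template: each (subject, predicate, object) pattern
# contains {placeholders} that are filled from one input record's fields.
TEMPLATE = [
    ("<{ex}node/{id}>", RDF_TYPE, "<{ex}Node>"),
    ("<{ex}node/{id}>", "<{ex}hasSpeed>", '"{speed}"^^<' + XSD + 'double>'),
    ("<{ex}node/{id}>", "<{ex}hasTimestamp>", '"{ts}"^^<' + XSD + 'long>'),
]


def record_to_triples(record):
    """Instantiate every template pattern for one record as an N-Triples line.

    Values are inserted verbatim; real code must escape IRIs and literals.
    """
    values = dict(record, ex=EX)
    for s, p, o in TEMPLATE:
        yield f"{s.format(**values)} {p.format(**values)} {o.format(**values)} ."


def transform(lines):
    """Lazily transform CSV lines (from a file or a stream) into triples."""
    for record in csv.DictReader(lines):
        yield from record_to_triples(record)


if __name__ == "__main__":
    source = io.StringIO("id,speed,ts\n42,12.5,1546300800\n")
    for triple in transform(source):
        print(triple)
```

Because `transform` consumes its input lazily, one record at a time, the same function could in principle be fed either a finite archival file or an unbounded stream of lines. A production transformer would additionally need IRI and literal escaping, error handling for malformed records, and parallelism, none of which this sketch attempts.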


Figs. 1–15


Notes

  1. GeoJSON Specification is available online at https://tools.ietf.org/html/rfc7946.

  2. RDF/XML Specification is available online at https://www.w3.org/TR/rdf-syntax-grammar/.

  3. https://zenodo.org/record/2576152.

  4. https://www.w3.org/TR/r2rml/.

  5. http://graphdb.ontotext.com/documentation/free/loading-data-using-ontorefine.html.

  6. http://vos.openlinksw.com/owiki/wiki/VOS/VirtSponger.

  7. The predefined terms for the configuration file are in the namespace http://www.datacron-project.eu/RDFGen_conf#.

  8. https://flink.apache.org/.

  9. http://www.datacron-project.eu.

  10. https://zenodo.org/record/1167595.

  11. https://github.com/datAcron-project/RDF-Gen/tree/master/configurations/zenodo_dataset_configurations.

  12. https://zenodo.org/record/2576584.

  13. http://ai-group.ds.unipi.gr/datacron_ontology/.


Acknowledgements

This work was supported by EU projects datAcron (Grant Agreement No 687591), VesselAI (Grant Agreement No 957237), and by the Hellenic Foundation for Research and Innovation (H.F.R.I.) under the “First Call for H.F.R.I. Research Projects to support Faculty members and Researchers and the procurement of high-cost research equipment grant” (Project Number: HFRI-FM17-81).

Author information


Correspondence to Georgios M. Santipantakis.


Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Santipantakis, G.M., Kotis, K.I., Glenis, A. et al. RDF-Gen: generating RDF triples from big data sources. Knowl Inf Syst 64, 2985–3015 (2022). https://doi.org/10.1007/s10115-022-01729-x


  • DOI: https://doi.org/10.1007/s10115-022-01729-x
