Transforming data from disparate and heterogeneous sources, which provide large volumes of data at high velocity, into a common form allows integrated and enriched views of the data, and thus provides further opportunities to improve the effectiveness and accuracy of data analysis and prediction tasks. This paper presents the RDF-Gen approach for transforming data from archival and streaming sources, provided in various formats, into RDF triples according to a set of ontological specifications. RDF-Gen introduces a generic mechanism that transforms data efficiently (i.e., with high throughput and low latency), even when the velocity of data exhibits high peaks; it also offers facilities for discovering associations between data from different sources and supports the transformation of modular data sets. The paper further presents a parallel implementation of RDF-Gen, together with data transformation workflows that incorporate RDF-Gen instances in different variations, adjusting to the needs of data sources, application areas and performance requirements. RDF-Gen is experimentally evaluated against state-of-the-art approaches in both archival and streaming settings; the experimental results demonstrate its efficiency and highlight its key contributions.
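To make the idea of turning source records into RDF triples concrete, the following is a minimal illustrative sketch, not the RDF-Gen implementation. It assumes a hypothetical namespace (`http://example.org/`), a hypothetical two-entry template, and hypothetical record fields (`id`, `speed`); none of these names come from the paper. Records are processed one at a time by a generator, so the same function could in principle be fed from a file or from a stream.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical namespace and prefixes, chosen only for illustration.
EX = "http://example.org/"
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "ex": EX,
}

# Hypothetical template: each entry is (subject, predicate, object).
# A token starting with "?" is a variable filled from the input record.
TEMPLATE: List[Tuple[str, str, str]] = [
    ("?id", "rdf:type", "ex:Vessel"),
    ("?id", "ex:hasSpeed", "?speed"),
]


def to_term(token: str, record: Dict[str, str], as_literal: bool) -> str:
    """Render one template token as an N-Triples term."""
    if token.startswith("?"):
        value = str(record[token[1:]])
        if as_literal:
            # Escape backslashes and quotes so the literal stays well formed.
            escaped = value.replace("\\", "\\\\").replace('"', '\\"')
            return f'"{escaped}"'
        # Variables outside the object position are minted as IRIs.
        return f"<{EX}{value}>"
    # Constant tokens are prefixed names, expanded to full IRIs.
    prefix, local = token.split(":", 1)
    return f"<{PREFIXES[prefix]}{local}>"


def generate(records: Iterable[Dict[str, str]],
             template: List[Tuple[str, str, str]] = TEMPLATE) -> Iterator[str]:
    """Yield one N-Triples line per template entry for each record."""
    for record in records:
        for s, p, o in template:
            yield (f"{to_term(s, record, False)} "
                   f"{to_term(p, record, False)} "
                   f"{to_term(o, record, True)} .")


if __name__ == "__main__":
    for line in generate([{"id": "v1", "speed": "12.5"}]):
        print(line)
```

In this sketch, variables in the object position are rendered as plain literals and all other variables as IRIs; a real mapping would also need datatypes, value conversion functions, and links between sources, which are outside the scope of this example.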
The GeoJSON specification is available online at https://tools.ietf.org/html/rfc7946.
The RDF/XML specification is available online at https://www.w3.org/TR/rdf-syntax-grammar/.
The predefined terms for the configuration file are in the namespace http://www.datacron-project.eu/RDFGen_conf#.
This work was supported by EU projects datAcron (Grant Agreement No 687591), VesselAI (Grant Agreement No 957237), and by the Hellenic Foundation for Research and Innovation (H.F.R.I.) under the “First Call for H.F.R.I. Research Projects to support Faculty members and Researchers and the procurement of high-cost research equipment grant” (Project Number: HFRI-FM17-81).
Santipantakis, G.M., Kotis, K.I., Glenis, A. et al. RDF-Gen: generating RDF triples from big data sources. Knowl Inf Syst 64, 2985–3015 (2022). https://doi.org/10.1007/s10115-022-01729-x