Abstract
Knowledge graphs in RDF are often generated from heterogeneous data sources to power services. However, the effort of generating a knowledge graph is unevenly divided between its producers and its consumers. In this paper, I present my research on (i) investigating current approaches to producing and consuming RDF knowledge graphs, and (ii) involving the consumer in a hybrid RDF generation approach that reduces the resources both producers and consumers need to generate RDF. I discuss the shortcomings of the existing approaches to RDF generation from heterogeneous data sources (i.e., materialization and virtualization) and how I will address them: a Systematic Literature Review; an analysis and a set of guidelines that help producers select the right approach for a use case; and a hybrid approach that balances the producer’s and the consumer’s effort in RDF generation. I have already performed a Systematic Literature Review to get an overview of the existing approaches for RDF production from heterogeneous data sources. Its results will be used to establish the set of producer guidelines, to build a benchmark that compares the current materialization and virtualization approaches, and to evaluate the proposed hybrid approach. Thanks to this research, knowledge graph production and consumption will be more balanced and more accessible to smaller companies and individuals. They can then focus on providing better services on top of a knowledge graph instead of being limited by a lack of computing resources to harvest enormous amounts of data from the Web and integrate them into a knowledge graph.
D. Van Assche: supervised by Anastasia Dimou (https://orcid.org/0000-0003-2138-7972) and Ben De Meester (https://orcid.org/0000-0003-0248-0987).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Van Assche, D. (2022). Balancing RDF Generation from Heterogeneous Data Sources. In: Groth, P., et al. The Semantic Web: ESWC 2022 Satellite Events. ESWC 2022. Lecture Notes in Computer Science, vol 13384. Springer, Cham. https://doi.org/10.1007/978-3-031-11609-4_40
DOI: https://doi.org/10.1007/978-3-031-11609-4_40
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-11608-7
Online ISBN: 978-3-031-11609-4
eBook Packages: Computer Science (R0)