Balancing RDF Generation from Heterogeneous Data Sources

  • Conference paper
The Semantic Web: ESWC 2022 Satellite Events (ESWC 2022)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 13384))

Abstract

Knowledge graphs in RDF are often generated from heterogeneous data sources to power services. However, the effort of knowledge graph generation is unevenly distributed between the producers and the consumers of a knowledge graph. In this paper, I present my research on (i) investigating current RDF knowledge graph production and consumption approaches, and (ii) involving the consumer in a hybrid RDF generation approach to reduce the resources that producers and consumers need to generate RDF. I discuss the shortcomings of existing approaches for RDF generation from heterogeneous data sources (i.e., materialization and virtualization) and how I will address them: a Systematic Literature Review; an analysis and a set of guidelines for producers to select the right approach for a use case; and a combined hybrid approach to balance the producer’s and consumer’s effort in RDF generation. I have already performed a Systematic Literature Review to get an overview of the existing approaches for RDF production from heterogeneous data sources. Its results will be used to establish a set of producer guidelines, to build a benchmark for comparing the current materialization and virtualization approaches, and to evaluate the proposed hybrid approach. As a result of this research, knowledge graph production and consumption will become more balanced and more accessible to smaller companies and individuals. This way, they can focus on providing better services on top of a knowledge graph instead of being limited by a lack of computing resources to harvest enormous amounts of data from the Web and integrate it into a knowledge graph.

D. Van Assche: supervised by Anastasia Dimou (ORCID 0000-0003-2138-7972) and Ben De Meester (ORCID 0000-0003-0248-0987).

Author information

Corresponding author

Correspondence to Dylan Van Assche.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Van Assche, D. (2022). Balancing RDF Generation from Heterogeneous Data Sources. In: Groth, P., et al. The Semantic Web: ESWC 2022 Satellite Events. ESWC 2022. Lecture Notes in Computer Science, vol 13384. Springer, Cham. https://doi.org/10.1007/978-3-031-11609-4_40

  • DOI: https://doi.org/10.1007/978-3-031-11609-4_40

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-11608-7

  • Online ISBN: 978-3-031-11609-4

  • eBook Packages: Computer Science (R0)
