DOI: 10.1145/3579051.3579054
Research article

μ-Bench: Real-world Micro Benchmarking for SPARQL Query Processing over Knowledge Graphs

Published: 13 February 2023

Abstract

Real-world SPARQL querying benchmarks, which use real-world RDF datasets and/or SPARQL queries, are key to testing the performance of RDF knowledge graph management systems in realistic settings. In recent years, various real-world SPARQL querying benchmarks have been proposed. Although useful for general-purpose SPARQL benchmarking, they do not support generating microbenchmarks, i.e., small benchmarks customized to user-specified criteria for a specific use case. Such microbenchmarks are important for component-based testing, as they pinpoint the strengths and weaknesses of systems at the micro level. We propose μ-Bench, a microbenchmarking framework for SPARQL query processing over RDF knowledge graphs. The framework uses real-world SPARQL queries, collected from the query logs of public SPARQL endpoints, to generate customized benchmarks according to user-defined criteria. It applies various clustering algorithms to select diverse benchmark queries from a given input query log. We generated several microbenchmarks and used them to evaluate state-of-the-art knowledge graph engines. The results show that specialized microbenchmarking is crucial for identifying the limitations of SPARQL query processing engines and their individual components.
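The clustering-based selection described above can be sketched in plain Python. This is an illustrative sketch, not the authors' implementation: the query log, the feature vectors (triple-pattern count, join count, result size), and all function names are hypothetical, and μ-Bench's actual clustering algorithms and feature set may differ.

```python
# Illustrative sketch: select a diverse microbenchmark from a SPARQL query log
# by clustering per-query feature vectors with a simple k-means and taking the
# query closest to each cluster centroid. Everything here is hypothetical.
import math
import random

def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means; returns the final centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # distinct initial centroids
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda c: dist(centroids[c], p))
            clusters[idx].append(p)
        # keep the old centroid if a cluster goes empty
        centroids = [mean(c) if c else centroids[j]
                     for j, c in enumerate(clusters)]
    return centroids

def select_benchmark(log, k):
    """Pick one representative query per cluster (closest to its centroid)."""
    feats = [q["features"] for q in log]
    centroids = kmeans(feats, k)
    picks = []
    for c in centroids:
        best = min(log, key=lambda q: dist(q["features"], c))
        if best not in picks:
            picks.append(best)
    return picks

# Hypothetical query log: each query described by
# (triple-pattern count, join count, result size).
log = [
    {"query": "SELECT ... simple lookup", "features": (1.0, 0.0, 10.0)},
    {"query": "SELECT ... star join",     "features": (5.0, 4.0, 200.0)},
    {"query": "SELECT ... path query",    "features": (3.0, 2.0, 50.0)},
    {"query": "SELECT ... complex BGP",   "features": (8.0, 7.0, 1000.0)},
]
bench = select_benchmark(log, k=2)
```

In a real setting the feature vectors would be extracted from parsed queries (e.g., from LSQ-style query logs), and the selected representatives would form the microbenchmark fed to the engines under test.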


    Published In

    IJCKG '22: Proceedings of the 11th International Joint Conference on Knowledge Graphs
    October 2022
    134 pages
    ISBN:9781450399876
    DOI:10.1145/3579051

    Publisher

    Association for Computing Machinery

    New York, NY, United States
