DOI: 10.1145/3579051.3579054
Research article

μ-Bench: Real-world Micro Benchmarking for SPARQL Query Processing over Knowledge Graphs

Published: 13 February 2023

Abstract

Real-world SPARQL querying benchmarks, which use real-world RDF datasets and/or SPARQL queries, are key to testing the performance of RDF knowledge graph management systems in realistic settings. In recent years, various real-world SPARQL querying benchmarks have been proposed. Although useful for general-purpose SPARQL benchmarking, they do not support generating microbenchmarks, i.e., small benchmarks customized to user-specified criteria for a specific use case. Such microbenchmarks are important for component-based testing, as they pinpoint the strengths and weaknesses of systems at the micro level. We propose μ-Bench, a microbenchmarking framework for SPARQL query processing over RDF knowledge graphs. The framework uses real-world SPARQL queries, collected from the query logs of public SPARQL endpoints, to generate customized benchmarks according to user-defined criteria. It applies various clustering algorithms to select diverse benchmark queries from a given input query log. We generated several microbenchmarks and used them to evaluate state-of-the-art knowledge graph engines. The results show that specialized microbenchmarking is crucial for identifying the limitations of SPARQL query processing engines and their individual components.
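The clustering-based selection described above can be sketched in plain Python. This is an illustrative sketch, not the authors' implementation: the query log, the feature vectors (triple-pattern count, join count, result size), and all function names are hypothetical, and μ-Bench's actual clustering algorithms and feature set may differ.

```python
# Illustrative sketch: select a diverse microbenchmark from a SPARQL query log
# by clustering per-query feature vectors with a simple k-means and taking the
# query closest to each cluster centroid. Everything here is hypothetical.
import math
import random

def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means; returns the final centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # distinct initial centroids
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda c: dist(centroids[c], p))
            clusters[idx].append(p)
        # keep the old centroid if a cluster goes empty
        centroids = [mean(c) if c else centroids[j]
                     for j, c in enumerate(clusters)]
    return centroids

def select_benchmark(log, k):
    """Pick one representative query per cluster (closest to its centroid)."""
    feats = [q["features"] for q in log]
    centroids = kmeans(feats, k)
    picks = []
    for c in centroids:
        best = min(log, key=lambda q: dist(q["features"], c))
        if best not in picks:
            picks.append(best)
    return picks

# Hypothetical query log: each query described by
# (triple-pattern count, join count, result size).
log = [
    {"query": "SELECT ... simple lookup", "features": (1.0, 0.0, 10.0)},
    {"query": "SELECT ... star join",     "features": (5.0, 4.0, 200.0)},
    {"query": "SELECT ... path query",    "features": (3.0, 2.0, 50.0)},
    {"query": "SELECT ... complex BGP",   "features": (8.0, 7.0, 1000.0)},
]
bench = select_benchmark(log, k=2)
```

In a real setting the feature vectors would be extracted from parsed queries (e.g., from LSQ-style query logs), and the selected representatives would form the microbenchmark fed to the engines under test.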


    Published In

    IJCKG '22: Proceedings of the 11th International Joint Conference on Knowledge Graphs
    October 2022
    134 pages
    ISBN:9781450399876
    DOI:10.1145/3579051

    Publisher

    Association for Computing Machinery

    New York, NY, United States
