
Evaluation of a Representative Selection of SPARQL Query Engines Using Wikidata

  • Conference paper
The Semantic Web (ESWC 2023)

Abstract

In this paper, we present an evaluation of the performance of five representative RDF triplestores (GraphDB, Jena Fuseki, Neptune, RDFox, and Stardog) and one experimental SPARQL query engine, QLever. We compare import, loading, and export times using a complete version of the Wikidata knowledge graph, and we evaluate query performance using 328 queries defined by Wikidata users. To put this evaluation into context with respect to previous evaluations, we also analyze the query performance of these systems on a prominent synthetic benchmark: SP\(^2\)Bench. We observed that most of the systems considered were able to complete almost all of the user-defined queries within the timeout we established. We noticed, however, that the time most systems need to import and export Wikidata may be longer than is acceptable in some industrial and academic projects, where information is represented, enriched, and stored using different representation means.
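The query-performance methodology summarized above (run each query, record its wall-clock execution time, and count it as unfinished if it does not complete before a fixed timeout) can be sketched as a small harness. This is an illustrative sketch only, not the authors' actual tooling: the `run_query` callback, the query names, and the timeout value are placeholders standing in for a call to a specific engine's SPARQL endpoint.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

def benchmark(queries, run_query, timeout_s=60.0):
    """Execute each named query via `run_query` with a wall-clock timeout.

    Returns a dict mapping query name to elapsed seconds, or to None if
    the query did not finish before the timeout (a "timed out" result).
    `run_query` is a hypothetical callback that submits one query string
    to the engine under test and blocks until the result is consumed.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=1) as pool:
        for name, query in queries.items():
            start = time.perf_counter()
            future = pool.submit(run_query, query)
            try:
                future.result(timeout=timeout_s)
                results[name] = time.perf_counter() - start
            except FutureTimeout:
                results[name] = None  # did not complete before the timeout
    return results
```

In the paper's setting, `run_query` would send the query over HTTP to each triplestore's SPARQL endpoint and read the full result set, so that result serialization is included in the measured time.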


References

  1. Aluç, G., Hartig, O., Özsu, M.T., Daudjee, K.: Diversified stress testing of RDF data management systems. In: Mika, P., et al. (eds.) ISWC 2014. LNCS, vol. 8796, pp. 197–212. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-11964-9_13


  2. Amazon AWS: Amazon Neptune Official Website. https://aws.amazon.com/neptune/

  3. Amazon Web Services: Amazon EC2 Instance Types - Memory Optimized. https://aws.amazon.com/ec2/instance-types/#Memory_Optimized. Accessed 12 Dec 2022

  4. Amazon Web Services: Amazon Neptune Pricing. https://aws.amazon.com/neptune/pricing/. Accessed 12 Dec 2022

  5. Angles, R., Aranda, C.B., Hogan, A., Rojas, C., Vrgoč, D.: WDBench: a Wikidata graph query benchmark. In: Sattler, U., et al. (eds.) The Semantic Web – ISWC 2022. LNCS, vol. 13489, pp. 714–731. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-19433-7_41

  6. Apache Jena: Apache Jena Fuseki Documentation. https://jena.apache.org/documentation/fuseki2/

  7. Apache Jena: Apache Jena TDB xloader. https://jena.apache.org/documentation/tdb/tdb-xloader.html. Accessed 12 Dec 2022

  8. Bail, S., et al.: FishMark: a linked data application benchmark. CEUR-WS.org (2012)


  9. Bast, H., Buchhold, B.: QLever GitHub repository. https://github.com/ad-freiburg/qlever

  10. Bizer, C., Schultz, A.: The Berlin SPARQL benchmark. Int. J. Seman. Web Inf. Syst. (IJSWIS) 5(2), 1–24 (2009)


  11. Blazegraph: Blazegraph Official Website. https://blazegraph.com/

  12. Demartini, G., Enchev, I., Wylot, M., Gapany, J., Cudré-Mauroux, P.: BowlognaBench—benchmarking RDF analytics. In: Aberer, K., Damiani, E., Dillon, T. (eds.) SIMPDA 2011. LNBIP, vol. 116, pp. 82–102. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-34044-4_5


  13. Erling, O., et al.: The LDBC social network benchmark: interactive workload. In: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, pp. 619–630 (2015)


  14. Fahl, W., Holzheim, T., Westerinen, A., Lange, C., Decker, S.: Getting and hosting your own copy of Wikidata. In: Proceedings of the 3rd Wikidata Workshop 2022. CEUR-WS.org (2022). https://ceur-ws.org/Vol-3262/paper9.pdf

  15. GitHub: Analysis and supplementary information for the paper, including queries, execution logs, query results and scripts. https://github.com/SINTEF-9012/rdf-triplestore-benchmark. Accessed 13 Mar 2023

  16. Guo, Y., Pan, Z., Heflin, J.: LUBM: a benchmark for OWL knowledge base systems. J. Web Seman. 3(2–3), 158–182 (2005)


  17. Hogan, A., Riveros, C., Rojas, C., Soto, A.: A worst-case optimal join algorithm for SPARQL. In: Ghidini, C., et al. (eds.) ISWC 2019. LNCS, vol. 11778, pp. 258–275. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-30793-6_15


  18. Ma, L., Yang, Y., Qiu, Z., Xie, G., Pan, Y., Liu, S.: Towards a complete OWL ontology benchmark. In: Sure, Y., Domingue, J. (eds.) ESWC 2006. LNCS, vol. 4011, pp. 125–139. Springer, Heidelberg (2006). https://doi.org/10.1007/11762256_12


  19. Morsey, M., Lehmann, J., Auer, S., Ngonga Ngomo, A.-C.: DBpedia SPARQL benchmark – performance assessment with real queries on real data. In: Aroyo, L., et al. (eds.) ISWC 2011. LNCS, vol. 7031, pp. 454–469. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-25073-6_29


  20. Ontotext: GraphDB Official Website. https://graphdb.ontotext.com/

  21. Ontotext: GraphDB Requirements. https://graphdb.ontotext.com/documentation/enterprise/requirements.html. Accessed 12 Dec 2022

  22. OST: RDFox Documentation: Managing Data Stores. https://docs.oxfordsemantic.tech/5.4/data-stores.html#. Accessed 12 Dec 2022

  23. OST: RDFox Documentation: Operations on Data Stores, persist-ds. https://docs.oxfordsemantic.tech/5.4/data-stores.html#persist-ds. Accessed 12 Dec 2022

  24. Oxford Semantic Technologies: RDFox Official Website. https://www.oxfordsemantic.tech/product

  25. Saleem, M., Ali, M.I., Hogan, A., Mehmood, Q., Ngomo, A.-C.N.: LSQ: the linked SPARQL queries dataset. In: Arenas, M., et al. (eds.) ISWC 2015. LNCS, vol. 9367, pp. 261–269. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-25010-6_15


  26. Saleem, M., Mehmood, Q., Ngonga Ngomo, A.-C.: FEASIBLE: a feature-based SPARQL benchmark generation framework. In: Arenas, M., et al. (eds.) ISWC 2015. LNCS, vol. 9366, pp. 52–69. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-25007-6_4


  27. Saleem, M., Szárnyas, G., Conrads, F., Bukhari, S.A.C., Mehmood, Q., Ngonga Ngomo, A.C.: How representative is a SPARQL benchmark? An analysis of RDF triplestore benchmarks. In: The World Wide Web Conference, pp. 1623–1633 (2019)


  28. Schmidt, M., Hornung, T., Lausen, G., Pinkel, C.: SP\(^2\)Bench: a SPARQL performance benchmark. In: 2009 IEEE 25th International Conference on Data Engineering, pp. 222–233. IEEE (2009)


  29. Singh, G., Bhatia, S., Mutharaju, R.: OWL2Bench: a benchmark for OWL 2 reasoners. In: Pan, J.Z., et al. (eds.) ISWC 2020. LNCS, vol. 12507, pp. 81–96. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-62466-8_6


  30. Stardog: Stardog Capacity Planning. https://docs.stardog.com/operating-stardog/server-administration/capacity-planning. Accessed 12 Dec 2022

  31. Stardog: Stardog Official Website. https://www.stardog.com/

  32. Stardog: 7 Steps to Fast SPARQL Queries. https://www.stardog.com/blog/7-steps-to-fast-sparql-queries/ (2017). Accessed 12 Dec 2022

  33. Szárnyas, G., Izsó, B., Ráth, I., Varró, D.: The train benchmark: cross-technology performance evaluation of continuous model queries. Softw. Syst. Model. 17(4), 1365–1393 (2018)


  34. Vrandečić, D., Krötzsch, M.: Wikidata: a free collaborative knowledgebase. Commun. ACM 57(10), 78–85 (2014)


  35. W3C: RDF 1.1 Concepts and Abstract Syntax, W3C Recommendation (2014). https://www.w3.org/TR/rdf11-concepts/. Accessed 12 Dec 2022

  36. W3C: SPARQL 1.1 Query Language, W3C Recommendation (2013). https://www.w3.org/TR/sparql11-query/. Accessed 12 Dec 2022

  37. WDQS Search Team: WDQS Backend Alternatives: The process, details and result. Technical report, Wikimedia Foundation (2022). https://www.wikidata.org/wiki/File:WDQS_Backend_Alternatives_working_paper.pdf

  38. Wikidata: SPARQL query service/queries/examples. https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/queries/examples. Accessed 12 Dec 2022

  39. Wikidata: SPARQL query service/WDQS backend update. https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/WDQS_backend_update. Accessed 12 Dec 2022

  40. Wu, H., Fujiwara, T., Yamamoto, Y., Bolleman, J., Yamaguchi, A.: BioBenchmark toyama 2012: an evaluation of the performance of triple stores on biological data. J. Biomed. Seman. 5(1), 1–11 (2014)



Acknowledgment

The authors would like to thank the anonymous reviewers for their valuable feedback and the companies Ontotext and Oxford Semantic Technologies (OST) for their support during the evaluation. This work has been funded by The Research Council of Norway projects SkyTrack (No 309714), DataBench Norway (No 310134) and SIRIUS Centre (No 237898), and the European Commission projects DataBench (No 780966), VesselAI (No 957237), Iliad (No 101037643), enRichMyData (No 101070284) and Graph-Massivizer (No 101093202).

Author information


Corresponding author

Correspondence to An Ngoc Lam.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Lam, A.N., Elvesæter, B., Martin-Recuerda, F. (2023). Evaluation of a Representative Selection of SPARQL Query Engines Using Wikidata. In: Pesquita, C., et al. The Semantic Web. ESWC 2023. Lecture Notes in Computer Science, vol 13870. Springer, Cham. https://doi.org/10.1007/978-3-031-33455-9_40


  • DOI: https://doi.org/10.1007/978-3-031-33455-9_40

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-33454-2

  • Online ISBN: 978-3-031-33455-9

  • eBook Packages: Computer Science; Computer Science (R0)
