Abstract
In this paper, we present SPARQL QED, a system that generates out-of-the-box datasets for SPARQL queries over linked data. QED distinguishes queries according to the different SPARQL features and creates, for each query, a small but exhaustive dataset comprising linked data and the query answers over this data. These datasets can support the development of applications based on SPARQL query answering in various ways. For instance, they may serve as SPARQL compliance tests or can be used for learning in query-by-example systems. We ensure that the created datasets are diverse, that they cover various practical use cases, and, of course, that the included answer sets are correct. Example tests generated from DBpedia queries and data have revealed bugs in Jena and Virtuoso.
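To illustrate the structure of such a test case (a query, a small dataset, and the expected answers), the following sketch evaluates a single SPARQL triple pattern over a handful of triples and compares the result against the expected answer set. The data, the pattern syntax, and all names are hypothetical, chosen for illustration only; this is not QED's actual format or implementation, and real SPARQL evaluation covers far more features.

```python
# A minimal sketch of a compliance-style test case: a tiny linked-data set,
# a query reduced to one triple pattern (variables start with "?"), and the
# expected solution mappings. All identifiers here are hypothetical.

# Hypothetical triples (subject, predicate, object).
DATA = {
    ("dbr:Dresden", "dbo:country", "dbr:Germany"),
    ("dbr:Leipzig", "dbo:country", "dbr:Germany"),
    ("dbr:Dresden", "rdfs:label", '"Dresden"'),
}

def evaluate(pattern, data):
    """Match one triple pattern against every triple in the data and
    return the set of solution mappings (frozen for set comparison)."""
    solutions = set()
    for triple in data:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if binding.get(term, value) != value:
                    break  # a repeated variable must bind consistently
                binding[term] = value
            elif term != value:
                break  # a constant term must match exactly
        else:
            solutions.add(frozenset(binding.items()))
    return solutions

# The test case pairs the query with the answers expected over DATA;
# an engine under test passes iff its answer set equals EXPECTED.
QUERY = ("?city", "dbo:country", "dbr:Germany")
EXPECTED = {
    frozenset({("?city", "dbr:Dresden")}),
    frozenset({("?city", "dbr:Leipzig")}),
}
assert evaluate(QUERY, DATA) == EXPECTED
```

The essential point the sketch captures is that correctness is checked by exact comparison of answer *sets*, which is why each test case must ship with the precomputed answers over its own small dataset.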
Notes
- 1.
- 2. The 2009 tests actually contain some more such queries, but these consider an empty dataset and are hence rather unrealistic.
- 3. For readability, we generally drop prefix declarations.
- 4.
- 5. We obtained the query from LSQ: http://aksw.github.io/LSQ/.
- 6.
- 7. The term “datasets” usually denotes the sets of data included in the test cases. However, we may also use it for the entire test cases consisting of a query, data, and answers (e.g., in the acronym QED); this is to emphasize that they can be applied for multiple purposes besides correctness testing. The intended meaning should be clear from context.
- 8.
- 9.
- 10. In Jena, the bug has since been fixed (see https://issues.apache.org/jira/projects/JENA/issues/JENA-1633).
- 11. The datasets can be found in the GitHub repository of QED.
- 12.
- 13. The LSQ format also specifies characteristics such as the number of answers to the query, which is only useful if we also consider the DBpedia data that was used when the query set was generated. Alternatively, we could have applied the LSQ framework to generate LSQ-formatted queries for the current DBpedia version.
Acknowledgments
This work is partly supported by the German Research Foundation (DFG) in the Cluster of Excellence “Center for Advancing Electronics Dresden” in CRC 912.
Copyright information
© 2019 Springer Nature Switzerland AG
Cite this paper
Thost, V., Dolby, J. (2019). QED: Out-of-the-Box Datasets for SPARQL Query Evaluation. In: Hitzler, P., et al. (eds.) The Semantic Web. ESWC 2019. Lecture Notes in Computer Science, vol. 11503. Springer, Cham. https://doi.org/10.1007/978-3-030-21348-0_32
Print ISBN: 978-3-030-21347-3
Online ISBN: 978-3-030-21348-0