Abstract
In this paper, we present SPARQL QED, a system that generates out-of-the-box datasets for SPARQL queries over linked data. QED distinguishes queries according to the different SPARQL features and creates, for each query, a small but exhaustive dataset comprising linked data and the query answers over this data. These datasets can support the development of applications based on SPARQL query answering in various ways. For instance, they may serve as SPARQL compliance tests or can be used for learning in query-by-example systems. We ensure that the created datasets are diverse, that they cover various practical use cases, and, of course, that the included answer sets are correct. Example tests generated from DBpedia queries and data have revealed bugs in Jena and Virtuoso.
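To illustrate the structure of such a test case (a query, a small dataset, and the expected answers), the following sketch evaluates a single SPARQL triple pattern over a handful of triples and compares the result against the expected answer set. The data, the pattern syntax, and all names are hypothetical, chosen for illustration only; this is not QED's actual format or implementation, and real SPARQL evaluation covers far more features.

```python
# A minimal sketch of a compliance-style test case: a tiny linked-data set,
# a query reduced to one triple pattern (variables start with "?"), and the
# expected solution mappings. All identifiers here are hypothetical.

# Hypothetical triples (subject, predicate, object).
DATA = {
    ("dbr:Dresden", "dbo:country", "dbr:Germany"),
    ("dbr:Leipzig", "dbo:country", "dbr:Germany"),
    ("dbr:Dresden", "rdfs:label", '"Dresden"'),
}

def evaluate(pattern, data):
    """Match one triple pattern against every triple in the data and
    return the set of solution mappings (frozen for set comparison)."""
    solutions = set()
    for triple in data:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if binding.get(term, value) != value:
                    break  # a repeated variable must bind consistently
                binding[term] = value
            elif term != value:
                break  # a constant term must match exactly
        else:
            solutions.add(frozenset(binding.items()))
    return solutions

# The test case pairs the query with the answers expected over DATA;
# an engine under test passes iff its answer set equals EXPECTED.
QUERY = ("?city", "dbo:country", "dbr:Germany")
EXPECTED = {
    frozenset({("?city", "dbr:Dresden")}),
    frozenset({("?city", "dbr:Leipzig")}),
}
assert evaluate(QUERY, DATA) == EXPECTED
```

The essential point the sketch captures is that correctness is checked by exact comparison of answer *sets*, which is why each test case must ship with the precomputed answers over its own small dataset.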
Notes
- 1.
- 2. The 2009 tests actually contain some more such queries, but these consider an empty dataset and are hence rather unrealistic.
- 3. For readability, we generally drop prefix declarations.
- 4.
- 5. We obtained the query from LSQ: http://aksw.github.io/LSQ/.
- 6.
- 7. The term “datasets” usually denotes the sets of data included in the test cases. However, we may also use it for the entire test cases consisting of a query, data, and answers (e.g., in the acronym QED); this is to emphasize that they can be applied for multiple purposes besides correctness testing. The intended meaning should be clear from context.
- 8.
- 9.
- 10. In Jena, the bug has since been fixed (see https://issues.apache.org/jira/projects/JENA/issues/JENA-1633).
- 11. The datasets can be found in the GitHub repository of QED.
- 12.
- 13. The LSQ format also specifies characteristics such as the number of answers to the query, which is only useful if we also consider the DBpedia data that was used when the query set was generated. Alternatively, we could have applied the LSQ framework to generate LSQ-formatted queries for the current DBpedia version.
Acknowledgments
This work is partly supported by the German Research Foundation (DFG) in the Cluster of Excellence “Center for Advancing Electronics Dresden” in CRC 912.
Copyright information
© 2019 Springer Nature Switzerland AG
Cite this paper
Thost, V., Dolby, J. (2019). QED: Out-of-the-Box Datasets for SPARQL Query Evaluation. In: Hitzler, P., et al. (eds.) The Semantic Web. ESWC 2019. Lecture Notes in Computer Science, vol. 11503. Springer, Cham. https://doi.org/10.1007/978-3-030-21348-0_32
Print ISBN: 978-3-030-21347-3
Online ISBN: 978-3-030-21348-0