
QED: Out-of-the-Box Datasets for SPARQL Query Evaluation

  • Conference paper

Published in: The Semantic Web (ESWC 2019)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11503)

Abstract

In this paper, we present SPARQL QED, a system that generates out-of-the-box datasets for SPARQL queries over linked data. QED categorizes queries by the SPARQL features they use and creates, for each query, a small but exhaustive dataset comprising linked data and the query answers over this data. These datasets can support the development of applications based on SPARQL query answering in various ways: for instance, they may serve as SPARQL compliance tests or as training examples in query-by-example systems. We ensure that the created datasets are diverse, cover various practical use cases, and, of course, contain the correct answer sets. Example tests generated from DBpedia queries and data have revealed bugs in Jena and Virtuoso.
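To make the idea concrete, the following is a minimal, hypothetical sketch (not QED's actual format or code) of what such a test case bundles together: a small dataset, a query, and the correct answers over exactly that data. Triples are plain string tuples, a single basic graph pattern stands in for a full SPARQL query, and the naive matcher stands in for the engine under test.

```python
# Hypothetical sketch of a QED-style test case: a small dataset, a
# query, and the expected answers over exactly that data. Triples are
# plain string tuples; a basic graph pattern with "?"-prefixed
# variables stands in for a full SPARQL query.

data = {
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:alice", "ex:knows", "ex:carol"),
    ("ex:bob",   "ex:knows", "ex:carol"),
}

# Roughly: SELECT ?x WHERE { ?x ex:knows ex:carol }
pattern = ("?x", "ex:knows", "ex:carol")

def evaluate(triples, pattern):
    """Return the set of bindings for the variables in `pattern`."""
    answers = set()
    for triple in triples:
        if all(q.startswith("?") or q == v
               for q, v in zip(pattern, triple)):
            answers.add(tuple(v for q, v in zip(pattern, triple)
                              if q.startswith("?")))
    return answers

# The test case pins down the correct answer set; an engine that
# returns anything else over this data has a bug.
expected = {("ex:alice",), ("ex:bob",)}
assert evaluate(data, pattern) == expected
```

A real QED test case would serialize the data as RDF and run the query through an actual SPARQL engine; the point of the sketch is only the shape of the artifact: query + data + correct answers.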


Notes

  1. https://www.w3.org/2009/sparql/docs/tests/.

  2. The 2009 tests actually contain some more such queries, but these consider an empty dataset and hence are rather unrealistic.

  3. For readability, we generally drop prefix declarations.

  4. https://www.w3.org/TR/sparql11-query/.

  5. We obtained the query from LSQ: http://aksw.github.io/LSQ/.

  6. http://wiki.dbpedia.org/develop/datasets/dbpedia-version-2016-10.

  7. The term "datasets" usually denotes the sets of data included in the test cases. However, we may also use it for the entire test cases consisting of a query, data, and answers (e.g., in the acronym QED); this emphasizes that they can serve multiple purposes besides correctness testing. The intended meaning should be clear from context.

  8. http://aksw.github.io/LSQ/.

  9. http://emina.github.io/kodkod/index.html.

  10. In Jena, the bug has since been fixed (see https://issues.apache.org/jira/projects/JENA/issues/JENA-1633).

  11. The datasets can be found in the GitHub repository of QED.

  12. https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/queries/examples.

  13. The LSQ format also specifies characteristics such as the number of answers to the query, which is only useful if one also considers the DBpedia data over which the query set was generated. Alternatively, we could have applied the LSQ framework to generate LSQ-formatted queries for the current DBpedia version.


Acknowledgments

This work is partly supported by the German Research Foundation (DFG) in the Cluster of Excellence “Center for Advancing Electronics Dresden” in CRC 912.

Author information

Correspondence to Veronika Thost.


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Thost, V., Dolby, J. (2019). QED: Out-of-the-Box Datasets for SPARQL Query Evaluation. In: Hitzler, P., et al. The Semantic Web. ESWC 2019. Lecture Notes in Computer Science, vol 11503. Springer, Cham. https://doi.org/10.1007/978-3-030-21348-0_32


  • DOI: https://doi.org/10.1007/978-3-030-21348-0_32

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-21347-3

  • Online ISBN: 978-3-030-21348-0

  • eBook Packages: Computer Science; Computer Science (R0)
