Cost- and Robustness-Based Query Optimization for Linked Data Fragments

  • Conference paper
Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12506)

Abstract

Client-side SPARQL query processing enables evaluating queries over RDF datasets published on the Web without producing high loads on the data providers’ servers. Triple Pattern Fragment (TPF) servers provide a means to publish highly available RDF data on the Web, and clients that evaluate SPARQL queries over them have been proposed. For clients to devise efficient query plans that minimize both the number of requests submitted to the server and the overall execution time, it is key to accurately estimate join cardinalities in order to place physical join operators appropriately. However, collecting accurate and fine-grained statistics from remote sources is a challenging task, and clients typically rely on the metadata provided by the TPF server. Addressing this shortcoming, we propose CROP, a cost- and robustness-based query optimizer that devises efficient plans by combining both the cost and the robustness of query plans. The idea of robustness is to determine the impact of join cardinality estimation errors on the cost of a query plan and to avoid plans where this impact is very high. In our experimental study, we show that our concept of robustness complements the cost model and improves the efficiency of query plans. Additionally, we show that our approach outperforms existing TPF clients in terms of overall runtime and number of requests.
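
The robustness idea in the abstract can be illustrated with a toy sketch. It is not CROP's cost model or robustness measure, which are defined in the full paper; the page size, the two request-count formulas, the operator labels, and the helper names `plan_cost` and `sensitivity` are all assumptions made for illustration. The sketch compares how the request count of a nested-loop join (one request per intermediate binding) and a symmetric hash join (paging through a whole fragment) reacts when the estimated join cardinality is too low.

```python
import math

PAGE_SIZE = 100  # assumption: number of triples a TPF server returns per page


def plan_cost(operator: str, outer_card: int, inner_count: int) -> int:
    """Toy request count for joining `outer_card` bindings with a triple pattern.

    "NLJ": nested-loop join, assumed to send one request per outer binding.
    "SHJ": symmetric hash join, assumed to page through the entire inner fragment.
    """
    if operator == "NLJ":
        return max(outer_card, 1)
    if operator == "SHJ":
        return max(math.ceil(inner_count / PAGE_SIZE), 1)
    raise ValueError(f"unknown operator: {operator}")


def sensitivity(operator: str, outer_card: int, inner_count: int, error_factor: int) -> float:
    """Factor by which the cost grows if the true outer cardinality is
    `error_factor` times the estimate (hypothetical measure, for illustration)."""
    estimated = plan_cost(operator, outer_card, inner_count)
    actual = plan_cost(operator, outer_card * error_factor, inner_count)
    return actual / estimated


if __name__ == "__main__":
    # Estimated 20 intermediate bindings, inner fragment of 5000 triples.
    for op in ("NLJ", "SHJ"):
        print(op, plan_cost(op, 20, 5000), sensitivity(op, 20, 5000, 10))
```

Under these assumptions the nested-loop join is cheaper for the estimated cardinality (20 requests versus 50), but its cost grows tenfold if the estimate is off by a factor of ten, while the hash join's cost is unaffected. An optimizer that only minimizes estimated cost would pick the first plan; one that also weighs robustness may prefer the second.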

Notes

  1. http://fragments.dbpedia.org/2014/en.

  2. https://github.com/comunica/comunica/tree/master/packages/actor-init-sparql.

  3. https://github.com/maribelacosta/nlde.

  4. \(\llbracket \cdot \rrbracket \) denotes the Iverson bracket, which evaluates to 1 if the enclosed logical proposition is true and to 0 otherwise.

  5. Complex queries do not contain placeholders, resulting in one distinct query each for C1, C2, and C3.

  6. http://www.rdfhdt.org/datasets/.

  7. https://github.com/Lars-H/crop_analysis.

  8. https://github.com/Lars-H/crop.

  9. https://github.com/LinkedDataFragments/Server.js.

Acknowledgement

This work is funded by the German BMBF in QUOCA, FKZ 01IS17042.

Author information

Corresponding author

Correspondence to Lars Heling.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Heling, L., Acosta, M. (2020). Cost- and Robustness-Based Query Optimization for Linked Data Fragments. In: Pan, J.Z., et al. The Semantic Web – ISWC 2020. ISWC 2020. Lecture Notes in Computer Science, vol 12506. Springer, Cham. https://doi.org/10.1007/978-3-030-62419-4_14

  • DOI: https://doi.org/10.1007/978-3-030-62419-4_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-62418-7

  • Online ISBN: 978-3-030-62419-4

  • eBook Packages: Computer Science, Computer Science (R0)
