An Empirical Evaluation of RDF Graph Partitioning Techniques

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11313)

Abstract

With the significant growth of RDF data sources in both number and volume comes the need to improve the scalability of RDF storage and querying solutions. Current implementations employ various RDF graph partitioning techniques. However, choosing the most suitable partitioning for a given RDF graph and application is not a trivial task. To the best of our knowledge, no detailed empirical evaluation of the performance of these techniques exists. In this work, we present an empirical evaluation of RDF graph partitioning techniques applied to real-world RDF data sets and benchmark queries. We evaluate the selected techniques in terms of partitioning time, partitioning imbalance (in partition sizes), and the query runtime performance achieved, using real-world data sets and queries selected with the FEASIBLE benchmark generation framework.
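
To make the imbalance criterion concrete, the following is a minimal Python sketch, not the paper's implementation. It assumes a hypothetical subject-hash partitioner (`subject_hash_partition`) and measures imbalance as the largest partition size divided by the mean partition size. Both the partitioner and this particular metric are illustrative assumptions; the paper may define imbalance differently.

```python
import zlib


def subject_hash_partition(triples, k):
    """Assign each (subject, predicate, object) triple to one of k partitions
    by hashing its subject. Hypothetical helper for illustration only."""
    parts = [[] for _ in range(k)]
    for s, p, o in triples:
        # crc32 is used instead of hash() so assignments are stable across runs.
        index = zlib.crc32(s.encode("utf-8")) % k
        parts[index].append((s, p, o))
    return parts


def imbalance(parts):
    """Largest partition size divided by the mean partition size.
    1.0 means perfectly balanced; larger values mean more skew.
    This metric is an assumption, not necessarily the one used in the paper."""
    sizes = [len(p) for p in parts]
    if not sizes or sum(sizes) == 0:
        return 0.0
    mean = sum(sizes) / len(sizes)
    return max(sizes) / mean


if __name__ == "__main__":
    triples = [
        ("ex:a", "ex:knows", "ex:b"),
        ("ex:a", "ex:name", '"Alice"'),
        ("ex:b", "ex:name", '"Bob"'),
        ("ex:c", "ex:knows", "ex:a"),
    ]
    parts = subject_hash_partition(triples, k=2)
    print([len(p) for p in parts], imbalance(parts))
```

Hashing on the subject keeps all triples of one subject in the same partition, which is why a few very large subjects can skew partition sizes.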

Notes

  1. TCGA: http://tcga.deri.ie/.

  2. UniProt: http://www.uniprot.org/statistics/.

  3. METIS: http://glaros.dtc.umn.edu/gkhome/metis/metis/download.

  4. Please see the T-Test tab of the Excel sheet at goo.gl/fxa4cJ.

Acknowledgements

This work was supported by the H2020 project HOBBIT (no. 688227).

Corresponding author

Correspondence to Adnan Akhter.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Akhter, A., Ngonga Ngomo, A.-C., Saleem, M. (2018). An Empirical Evaluation of RDF Graph Partitioning Techniques. In: Faron Zucker, C., Ghidini, C., Napoli, A., Toussaint, Y. (eds) Knowledge Engineering and Knowledge Management. EKAW 2018. Lecture Notes in Computer Science, vol 11313. Springer, Cham. https://doi.org/10.1007/978-3-030-03667-6_1

  • DOI: https://doi.org/10.1007/978-3-030-03667-6_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-03666-9

  • Online ISBN: 978-3-030-03667-6

  • eBook Packages: Computer Science, Computer Science (R0)
