Abstract
With the significant growth of RDF data sources in both number and volume comes the need to improve the scalability of RDF storage and querying solutions. Current implementations employ various RDF graph partitioning techniques. However, choosing the most suitable partitioning for a given RDF graph and application is not a trivial task. To the best of our knowledge, no detailed empirical evaluation of the performance of these techniques exists. In this work, we present an empirical evaluation of RDF graph partitioning techniques applied to real-world RDF data sets and benchmark queries. We evaluate the selected techniques in terms of partitioning time, partitioning imbalance (in partition sizes), and the query runtime performance they achieve, using real-world data sets and queries selected with the FEASIBLE benchmark generation framework.
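To make two of the evaluation metrics (partitioning time and partitioning imbalance) concrete, here is a minimal Python sketch. It assumes subject-based hash partitioning purely as an example technique, and it assumes imbalance is measured as the largest partition size divided by the mean partition size; the abstract does not give the exact formula, so this definition is an illustration only. The function names `subject_hash_partition` and `imbalance` are hypothetical.

```python
import hashlib
import time

# An RDF triple represented as a (subject, predicate, object) tuple of strings.
Triple = tuple[str, str, str]


def subject_hash_partition(triples: list[Triple], k: int) -> list[list[Triple]]:
    """Hypothetical example technique: assign each triple to one of k
    partitions by hashing its subject, so all triples sharing a subject
    land in the same partition."""
    if k < 1:
        raise ValueError("k must be at least 1")
    partitions: list[list[Triple]] = [[] for _ in range(k)]
    for s, p, o in triples:
        # Use a stable hash (MD5) instead of Python's per-process salted hash().
        digest = hashlib.md5(s.encode("utf-8")).digest()
        idx = int.from_bytes(digest[:8], "big") % k
        partitions[idx].append((s, p, o))
    return partitions


def imbalance(partitions: list[list]) -> float:
    """Assumed imbalance measure: max partition size / mean partition size.
    1.0 means perfectly balanced; larger values mean more skew."""
    sizes = [len(part) for part in partitions]
    mean = sum(sizes) / len(sizes)
    # With no triples at all, treat the partitioning as trivially balanced.
    return max(sizes) / mean if mean else 1.0


if __name__ == "__main__":
    # Synthetic data: 1000 triples over 50 distinct subjects.
    data = [(f"ex:s{i % 50}", "ex:p", f"ex:o{i}") for i in range(1000)]
    start = time.perf_counter()
    parts = subject_hash_partition(data, 4)
    elapsed = time.perf_counter() - start  # partitioning time in seconds
    print([len(part) for part in parts], round(imbalance(parts), 3), elapsed)
```

Hashing on the subject keeps subject-star patterns local to one partition, but skewed subject degrees can inflate the imbalance value; this trade-off between locality and balance is the kind of behaviour the evaluation measures.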
Notes
1. TCGA: http://tcga.deri.ie/.
2. UniProt: http://www.uniprot.org/statistics/.
3.
4. Please see the T-Test tab of the Excel sheet at goo.gl/fxa4cJ.
Acknowledgements
This work was supported by the H2020 project HOBBIT (no. 688227).
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Akhter, A., Ngomo Ngonga, AC., Saleem, M. (2018). An Empirical Evaluation of RDF Graph Partitioning Techniques. In: Faron Zucker, C., Ghidini, C., Napoli, A., Toussaint, Y. (eds) Knowledge Engineering and Knowledge Management. EKAW 2018. Lecture Notes in Computer Science, vol 11313. Springer, Cham. https://doi.org/10.1007/978-3-030-03667-6_1
DOI: https://doi.org/10.1007/978-3-030-03667-6_1
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-03666-9
Online ISBN: 978-3-030-03667-6
eBook Packages: Computer Science; Computer Science (R0)