Towards Prescriptive Analyses of Querying Large Knowledge Graphs

Ragab, Mohamed

doi:10.1007/978-3-031-15743-1_59

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1652)

Included in the following conference series:

European Conference on Advances in Databases and Information Systems

Abstract

Leveraging relational Big Data (BD) processing frameworks to process large-scale RDF graphs has generated great interest in optimizing query performance. Yet modern BD systems are complex data systems whose configurations notably affect performance. Benchmarking different frameworks and configurations provides the community with best practices for better performance. However, most of these benchmarking efforts are classified as descriptive and diagnostic analytics. Moreover, there is no standard for comparing these benchmarks based on quantitative ranking techniques. In this paper, we discuss how our work fills this timely research gap. In particular, we investigate how to enable prescriptive analytics via ranking functions (called “BenchRank”). We present a research plan that builds on state-of-the-art benchmarking efforts in the area of querying large RDF graphs. Finally, we present the research results of the proposed plan.

M. Ragab—Supervised by Riccardo Tommasini, LIRIS Lab, INSA Lyon, France.

Notes

1. The relational schema impacts query joins, partitioning techniques impact data shuffling, and storage formats impact physical execution plans.
2. We omit details about the schema options (ST, VP, PT) and partitioning options (HP, SBP, PBP) due to space limits; they can be found on the project’s GitHub page: https://datasystemsgrouput.github.io/SPARKSQLRDFBenchmarking/.
3. Each configuration C has a rank according to its running time on the queries.
4. Kendall’s index is a common measure for comparing the orderings produced by ranking functions.
5. Conformance and Coherence results [7] are omitted due to space limits.
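
Notes 3 and 4 can be illustrated with a minimal sketch. It assumes that "Kendall's index" refers to the Kendall tau rank correlation coefficient and that there are no ties; the configuration names (C1, C2, C3), the runtimes, and the function names are hypothetical and do not come from the paper.

```python
from itertools import combinations


def rank_by_runtime(runtimes):
    """Map each configuration to its rank for one query (1 = fastest).

    Assumes all runtimes are distinct.
    """
    ordered = sorted(runtimes, key=runtimes.get)
    return {config: position for position, config in enumerate(ordered, start=1)}


def kendall_tau(rank_a, rank_b):
    """Kendall tau-a between two rankings of the same items (no ties assumed)."""
    items = list(rank_a)
    pairs = len(items) * (len(items) - 1) // 2
    if pairs == 0:
        raise ValueError("need at least two items to compare rankings")
    concordant = discordant = 0
    for x, y in combinations(items, 2):
        # Positive product: both rankings order x and y the same way.
        agreement = (rank_a[x] - rank_a[y]) * (rank_b[x] - rank_b[y])
        if agreement > 0:
            concordant += 1
        elif agreement < 0:
            discordant += 1
    return (concordant - discordant) / pairs


# Hypothetical runtimes in seconds for two queries over three configurations.
q1 = {"C1": 12.0, "C2": 7.5, "C3": 9.1}
q2 = {"C1": 30.2, "C2": 18.4, "C3": 16.9}

ranks_q1 = rank_by_runtime(q1)
ranks_q2 = rank_by_runtime(q2)
tau = kendall_tau(ranks_q1, ranks_q2)
print(ranks_q1, ranks_q2, round(tau, 3))
```

A value of 1 means the two orderings agree on every pair of configurations, and -1 means they disagree on every pair. Here the orderings disagree only on C2 versus C3.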

References

1. Abdelaziz, I., Harbi, R., Khayyat, Z., Kalnis, P.: A survey and experimental comparison of distributed SPARQL engines for very large RDF data. VLDB 10(13), 2049–2060 (2017)
2. Akhter, A., Ngomo Ngonga, A.-C., Saleem, M.: An empirical evaluation of RDF graph partitioning techniques. In: Faron Zucker, C., Ghidini, C., Napoli, A., Toussaint, Y. (eds.) EKAW 2018. LNCS (LNAI), vol. 11313, pp. 3–18. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-03667-6_1
3. Arrascue Ayala, V.A.: Relational schemata for distributed SPARQL query processing. In: SBD (2019)
4. Deb, K., Pratap, A., Agarwal, S.: A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans. Evol. Comput. 6(2), 182–197 (2002)
5. Ivanov, T., Pergolesi, M.: The impact of columnar file formats on SQL-on-hadoop engine performance: a study on ORC and parquet. Concurr. Comput. Pract. Exp. 32(5), e5523 (2019)
6. Moaawad, M.R., Mokhtar, H.M.O., Al Feel, H.T.: On-the-fly academic linked data integration. In: Proceedings of the International Conference on Compute and Data Analysis, pp. 114–122 (2017)
7. Ragab, M., Awaysheh, F.M., Tommasini, R.: Bench-ranking: a first step towards prescriptive performance analyses for big data frameworks. In: IEEE Conference on Big Data (2021)
8. Ragab, M., Tommasini, R., et al.: An in-depth investigation of large-scale RDF relational schema optimizations using Spark-SQL. In: DOLAP@EDBT/ICDT (2021)
9. Ragab, M., Tommasini, R., Eyvazov, S., Sakr, S.: Towards making sense of Spark-SQL performance for processing vast distributed RDF datasets. In: SBD (2020)
10. Ragab, M., Tommasini, R., Sakr, S.: Benchmarking Spark-SQL under alliterative RDF relational storage backends. In: QuWeDa@ISWC, pp. 67–82 (2019)
11. Ragab, M., Tommasini, R., Sakr, S.: Comparing schema advancements for distributed RDF querying using SparkSQL. In: ISWC 2020 Demos and Industry Tracks (2020)
12. Sakr, S., Bonifati, A., Voigt, H., et al.: The future is big graphs: a community view on graph processing systems. CACM 64(9), 62–71 (2021)
13. Schätzle, A., Przyjaciel-Zablocki, M., Neu, A., Lausen, G.: Sempala: interactive SPARQL query processing on hadoop. In: Mika, P., et al. (eds.) ISWC 2014. LNCS, vol. 8796, pp. 164–179. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-11964-9_11
14. Schätzle, A., Przyjaciel-Zablocki, M., Skilevic, S., Lausen, G.: S2RDF: RDF querying with SPARQL on spark. VLDB 9(10), 804–815 (2016)
15. Tommasini, R., Ragab, M., Falcetta, A., Valle, E.D., Sakr, S.: A first step towards a streaming linked data life-cycle. In: Pan, J.Z., et al. (eds.) ISWC 2020. LNCS, vol. 12507, pp. 634–650. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-62466-8_39

Author information

Authors and Affiliations

Institute of Computer Science, Tartu University, Tartu, Estonia
Mohamed Ragab

Corresponding author

Correspondence to Mohamed Ragab.

Editor information

Editors and Affiliations

Politecnico di Torino, Turin, Italy
Silvia Chiusano
Politecnico di Torino, Turin, Italy
Tania Cerquitelli
Poznań University of Technology, Poznań, Poland
Robert Wrembel
Norwegian University of Science and Technology, Trondheim, Norway
Kjetil Nørvåg
University of Genoa, Genoa, Italy
Barbara Catania
CNRS, Villeurbanne Cedex, France
Genoveva Vargas-Solar
University of Calabria, Rende, Italy
Ester Zumpano

About this paper

Cite this paper

Ragab, M. (2022). Towards Prescriptive Analyses of Querying Large Knowledge Graphs. In: Chiusano, S., et al. New Trends in Database and Information Systems. ADBIS 2022. Communications in Computer and Information Science, vol 1652. Springer, Cham. https://doi.org/10.1007/978-3-031-15743-1_59

DOI: https://doi.org/10.1007/978-3-031-15743-1_59
Published: 29 August 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-15742-4
Online ISBN: 978-3-031-15743-1
eBook Packages: Computer Science, Computer Science (R0)
