Towards Prescriptive Analyses of Querying Large Knowledge Graphs

  • Conference paper
New Trends in Database and Information Systems (ADBIS 2022)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1652)

Abstract

Leveraging relational Big Data (BD) processing frameworks to process large-scale RDF graphs has generated great interest in optimizing query performance. Yet modern BD systems are complex data systems whose configurations notably affect performance. Benchmarking different frameworks and configurations provides the community with best practices for better performance. However, most of these benchmarking efforts are classified as descriptive and diagnostic analytics. Moreover, there is no standard for comparing these benchmarks based on quantitative ranking techniques. In this paper, we discuss how our work fills this timely research gap. In particular, we investigate how to enable prescriptive analytics via ranking functions (called “BenchRank”). We present a research plan that builds on state-of-the-art benchmarking efforts in the area of querying large RDF graphs. Finally, we present the results of our research under the proposed plan.

M. Ragab—Supervised by Riccardo Tommasini, LIRIS Lab, INSA Lyon, France.

Notes

  1. The relational schema impacts query joins, partitioning techniques impact data shuffling, whilst storage formats impact physical execution plans.

  2. We omit details about schema options (ST, VP, PT) and partitioning options (HP, SBP, PBP) due to space limits; they can be found on the project’s GitHub page: https://datasystemsgrouput.github.io/SPARKSQLRDFBenchmarking/.

  3. Each configuration C has a rank determined by its running time on the queries.

  4. Kendall’s index is a common measure to compare the ordering of ranking functions.

  5. Conformance and Coherence results [7] are omitted due to space limits.
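
Notes 3 and 4 can be illustrated with a minimal sketch: rank configurations by running time for two queries, then compare the two orderings with Kendall's index. This is not the BenchRank function itself. The configuration labels merely combine the schema and partitioning abbreviations from note 2, the runtimes are invented for illustration, and the helper names `rank_by_runtime` and `kendall_tau` are hypothetical.

```python
from itertools import combinations

# Hypothetical running times in seconds per configuration (illustrative only,
# not results from the paper). Labels combine a schema and a partitioning option.
runtimes_q1 = {"ST-HP": 12.0, "VP-SBP": 7.5, "PT-PBP": 9.1, "VP-HP": 15.3}
runtimes_q2 = {"ST-HP": 30.2, "VP-SBP": 11.0, "PT-PBP": 10.4, "VP-HP": 41.7}


def rank_by_runtime(runtimes):
    """Map each configuration to its rank, where 1 is the fastest."""
    ordered = sorted(runtimes, key=runtimes.get)
    return {config: position for position, config in enumerate(ordered, start=1)}


def kendall_tau(rank_a, rank_b):
    """Kendall's tau-a between two rankings of the same items (assumes no ties)."""
    concordant = discordant = 0
    for x, y in combinations(sorted(rank_a), 2):
        # A pair is concordant when both rankings order x and y the same way.
        sign = (rank_a[x] - rank_a[y]) * (rank_b[x] - rank_b[y])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n = len(rank_a)
    return (concordant - discordant) / (n * (n - 1) / 2)


ranks_q1 = rank_by_runtime(runtimes_q1)
ranks_q2 = rank_by_runtime(runtimes_q2)
tau = kendall_tau(ranks_q1, ranks_q2)
print(ranks_q1, ranks_q2, round(tau, 3))
```

A value of 1 means the two queries order the configurations identically, and -1 means the orderings are fully reversed. With the invented numbers above, only one of the six configuration pairs is ordered differently, so the index stays well above zero.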

References

  1. Abdelaziz, I., Harbi, R., Khayyat, Z., Kalnis, P.: A survey and experimental comparison of distributed SPARQL engines for very large RDF data. VLDB 10(13), 2049–2060 (2017)

  2. Akhter, A., Ngomo Ngonga, A.-C., Saleem, M.: An empirical evaluation of RDF graph partitioning techniques. In: Faron Zucker, C., Ghidini, C., Napoli, A., Toussaint, Y. (eds.) EKAW 2018. LNCS (LNAI), vol. 11313, pp. 3–18. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-03667-6_1

  3. Arrascue Ayala, V.A.: Relational schemata for distributed SPARQL query processing. In: SBD (2019)

  4. Deb, K., Pratap, A., Agarwal, S.: A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans. Evol. Comput. 6(2), 182–197 (2002)

  5. Ivanov, T., Pergolesi, M.: The impact of columnar file formats on SQL-on-hadoop engine performance: a study on ORC and parquet. Concurr. Comput. Pract. Exp. 32(5), e5523 (2019)

  6. Moaawad, M.R., Mokhtar, H.M.O., Al Feel, H.T.: On-the-fly academic linked data integration. In: Proceedings of the International Conference on Compute and Data Analysis, pp. 114–122 (2017)

  7. Ragab, M., Awaysheh, F.M., Tommasini, R.: Bench-ranking: a first step towards prescriptive performance analyses for big data frameworks. In: IEEE Conference on Big Data (2021)

  8. Ragab, M., Tommasini, R., et al.: An in-depth investigation of large-scale RDF relational schema optimizations using Spark-SQL. In: DOLAP@EDBT/ICDT (2021)

  9. Ragab, M., Tommasini, R., Eyvazov, S., Sakr, S.: Towards making sense of Spark-SQL performance for processing vast distributed RDF datasets. In: SBD (2020)

  10. Ragab, M., Tommasini, R., Sakr, S.: Benchmarking Spark-SQL under alliterative RDF relational storage backends. In: QuWeDa@ISWC, pp. 67–82 (2019)

  11. Ragab, M., Tommasini, R., Sakr, S.: Comparing schema advancements for distributed RDF querying using SparkSQL. In: ISWC 2020 Demos and Industry Tracks (2020)

  12. Sakr, S., Bonifati, A., Voigt, H., et al.: The future is big graphs: a community view on graph processing systems. CACM 64(9), 62–71 (2021)

  13. Schätzle, A., Przyjaciel-Zablocki, M., Neu, A., Lausen, G.: Sempala: interactive SPARQL query processing on hadoop. In: Mika, P., et al. (eds.) ISWC 2014. LNCS, vol. 8796, pp. 164–179. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-11964-9_11

  14. Schätzle, A., Przyjaciel-Zablocki, M., Skilevic, S., Lausen, G.: S2RDF: RDF querying with SPARQL on spark. VLDB 9(10), 804–815 (2016)

  15. Tommasini, R., Ragab, M., Falcetta, A., Valle, E.D., Sakr, S.: A first step towards a streaming linked data life-cycle. In: Pan, J.Z., et al. (eds.) ISWC 2020. LNCS, vol. 12507, pp. 634–650. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-62466-8_39


Corresponding author

Correspondence to Mohamed Ragab.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Ragab, M. (2022). Towards Prescriptive Analyses of Querying Large Knowledge Graphs. In: Chiusano, S., et al. New Trends in Database and Information Systems. ADBIS 2022. Communications in Computer and Information Science, vol 1652. Springer, Cham. https://doi.org/10.1007/978-3-031-15743-1_59

  • DOI: https://doi.org/10.1007/978-3-031-15743-1_59

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-15742-4

  • Online ISBN: 978-3-031-15743-1

  • eBook Packages: Computer Science, Computer Science (R0)