CONSTRUCT Queries Performance on a Spark-Based Big RDF Triplestore

  • Conference paper
The Semantic Web (ESWC 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13261)

Abstract

Despite their potential, CONSTRUCT queries have gained little traction so far among data practitioners, vendors, and researchers. In this paper, we first exhibit performance bottlenecks of existing triplestores in supporting CONSTRUCT queries over large knowledge graphs. We then describe TESS, a novel Spark-based architecture for big triplestores that we have designed and implemented to overcome these limitations through parallel computing. TESS ensures the ACID properties required for a sound and complete implementation of CONSTRUCT-based forward-chaining rule reasoning.

Notes

  1. Extraction, Transformation, Load.

  2. https://spinrdf.org/spin.html

  3. https://www.w3.org/TR/shacl-af/#rules

  4. https://project-hobbit.eu/challenges/mighty-storage-challenge2018/

  5. https://community.openlinksw.com/t/sparql-query-limiting-results-to-100000-triples/2131

Acknowledgements

This work has been supported by the French National Research Agency through the projects LabEx PERSYVAL Lab (11-LABX-0025-01), DUNE SIDES 3.0 (ANR-16-DUNE-0002-02), P3IA MIAI@Grenoble Alpes (ANR-19-P3IA-0003) and CE23 CQFD (ANR-18-CE23-0003).

Corresponding author

Correspondence to Adam Sanchez-Ayte.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Sanchez-Ayte, A., Jouanot, F., Rousset, MC. (2022). CONSTRUCT Queries Performance on a Spark-Based Big RDF Triplestore. In: Groth, P., et al. The Semantic Web. ESWC 2022. Lecture Notes in Computer Science, vol 13261. Springer, Cham. https://doi.org/10.1007/978-3-031-06981-9_26

  • DOI: https://doi.org/10.1007/978-3-031-06981-9_26

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-06980-2

  • Online ISBN: 978-3-031-06981-9

  • eBook Packages: Computer Science, Computer Science (R0)
