Abstract
Despite their potential, CONSTRUCT queries have so far attracted little attention from data practitioners, vendors and researchers. In this paper, we first exhibit the performance bottlenecks of existing triplestores when evaluating CONSTRUCT queries over large knowledge graphs. We then describe TESS, a novel Spark-based architecture for big triplestores that we designed and implemented to overcome these limitations through parallel computing. TESS guarantees the ACID properties required for a sound and complete implementation of forward-chaining rule reasoning based on CONSTRUCT queries.
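The abstract builds on the idea that a forward-chaining rule can be written as a CONSTRUCT query, whose WHERE clause is the rule body and whose template is the rule head, and that the rules are re-applied until no new triple is derived. The following is a minimal, single-machine Python sketch of that saturation loop, assuming rules restricted to basic graph patterns. It is not TESS's implementation (which runs on Spark), and the names `match`, `construct` and `saturate` are hypothetical. The per-round snapshot only loosely illustrates why isolated, atomic writes matter for soundness and completeness.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
# A rule is (body, head): the body plays the role of a CONSTRUCT query's WHERE
# clause and the head that of its template. Terms starting with "?" are variables.
Rule = Tuple[List[Triple], Triple]


def match(pattern: Triple, triple: Triple, binding: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Extend `binding` so that `pattern` matches `triple`; return None on a clash."""
    extended = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if extended.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return extended


def construct(rule: Rule, graph: Set[Triple]) -> Set[Triple]:
    """Evaluate one rule like a CONSTRUCT query: join body patterns, instantiate head."""
    body, head = rule
    bindings: List[Dict[str, str]] = [{}]
    for pattern in body:  # nested-loop join, one triple pattern at a time
        bindings = [ext for b in bindings for t in graph
                    if (ext := match(pattern, t, b)) is not None]
    # Assumes every head variable also occurs in the body (a "safe" rule).
    return {tuple(b.get(term, term) for term in head) for b in bindings}


def saturate(rules: List[Rule], graph: Set[Triple]) -> Set[Triple]:
    """Naive forward chaining: re-apply all rules until no new triple is derived."""
    closure = set(graph)
    while True:
        # Every rule in a round reads the same snapshot; the new triples are
        # committed together afterwards, so no rule sees a half-written round.
        derived = set().union(*(construct(r, closure) for r in rules))
        new = derived - closure
        if not new:
            return closure
        closure |= new


# Usage: subclass transitivity and type inheritance on a toy graph.
rules: List[Rule] = [
    ([("?a", "rdfs:subClassOf", "?b"), ("?b", "rdfs:subClassOf", "?c")],
     ("?a", "rdfs:subClassOf", "?c")),
    ([("?x", "rdf:type", "?a"), ("?a", "rdfs:subClassOf", "?b")],
     ("?x", "rdf:type", "?b")),
]
graph: Set[Triple] = {
    (":Cardiologist", "rdfs:subClassOf", ":Physician"),
    (":Physician", "rdfs:subClassOf", ":Person"),
    (":alice", "rdf:type", ":Cardiologist"),
}
result = saturate(rules, graph)
```

Here the loop derives three new triples over two rounds, including `(":alice", "rdf:type", ":Person")`, and stops at the third round when nothing new appears. A Spark-based store would replace the nested-loop join with distributed joins over a triples table, but the fixpoint structure stays the same.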
Acknowledgements
This work has been supported by the French National Research Agency (ANR) through the projects LabEx PERSYVAL Lab (11-LABX-0025-01), DUNE SIDES 3.0 (ANR-16-DUNE-0002-02), P3IA MIAI@Grenoble Alpes (ANR-19-P3IA-0003) and CE23 CQFD (ANR-18-CE23-0003).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Sanchez-Ayte, A., Jouanot, F., Rousset, MC. (2022). CONSTRUCT Queries Performance on a Spark-Based Big RDF Triplestore. In: Groth, P., et al. The Semantic Web. ESWC 2022. Lecture Notes in Computer Science, vol 13261. Springer, Cham. https://doi.org/10.1007/978-3-031-06981-9_26
DOI: https://doi.org/10.1007/978-3-031-06981-9_26
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-06980-2
Online ISBN: 978-3-031-06981-9
eBook Packages: Computer Science, Computer Science (R0)