CONSTRUCT Queries Performance on a Spark-Based Big RDF Triplestore

Sanchez-Ayte, Adam; Jouanot, Fabrice; Rousset, Marie-Christine

doi:10.1007/978-3-031-06981-9_26

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13261)

Included in the following conference series:

European Semantic Web Conference

Abstract

Despite their potential, CONSTRUCT queries have gained little traction so far among data practitioners, vendors and researchers. In this paper, we first exhibit the performance bottlenecks of existing triplestores when supporting CONSTRUCT queries over large knowledge graphs. Then, we describe a novel Spark-based architecture for big triplestores, called TESS, that we have designed and implemented to overcome these limitations by using parallel computing. TESS ensures the ACID properties that are required for a sound and complete implementation of CONSTRUCT-based forward-chaining rule reasoning.
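To make the last sentence concrete, the following is a minimal sketch of forward-chaining reasoning in which each rule plays the role of a CONSTRUCT query: it matches a pattern in the graph and emits new triples, and the rules are re-applied until nothing new is derived. It is not the TESS implementation and uses neither Spark nor a SPARQL engine; the names `subclass_rule` and `forward_chain` and the sample triples are assumptions made for illustration only.

```python
from typing import Callable, Iterable, Set, Tuple

Triple = Tuple[str, str, str]
Rule = Callable[[Set[Triple]], Iterable[Triple]]


def subclass_rule(graph: Set[Triple]) -> Iterable[Triple]:
    # Hypothetical rule, mimicking:
    #   CONSTRUCT { ?x rdf:type ?d }
    #   WHERE     { ?x rdf:type ?c . ?c rdfs:subClassOf ?d }
    for (x, p1, c) in graph:
        if p1 != "rdf:type":
            continue
        for (c2, p2, d) in graph:
            if p2 == "rdfs:subClassOf" and c2 == c:
                yield (x, "rdf:type", d)


def forward_chain(graph: Set[Triple], rules: Iterable[Rule]) -> Set[Triple]:
    # Apply all rules repeatedly until a fixpoint is reached.
    rules = list(rules)
    closure = set(graph)
    while True:
        # Every rule in a round reads the same snapshot of the graph;
        # derived triples are collected first and added afterwards.
        derived: Set[Triple] = set()
        for rule in rules:
            derived.update(rule(closure))
        new_triples = derived - closure
        if not new_triples:
            return closure
        closure |= new_triples


if __name__ == "__main__":
    data = {
        ("alice", "rdf:type", "Student"),
        ("Student", "rdfs:subClassOf", "Person"),
        ("Person", "rdfs:subClassOf", "Agent"),
    }
    for triple in sorted(forward_chain(data, [subclass_rule])):
        print(triple)
```

In this sketch each round evaluates the rules against a fixed snapshot and only then adds the derived triples. A store that exposed partially written results to a concurrent rule evaluation could miss or duplicate inferences, which is one informal way to read the abstract's link between ACID properties and sound, complete reasoning.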

Notes

1. Extraction, Transformation, Load.
2. https://spinrdf.org/spin.html
3. https://www.w3.org/TR/shacl-af/#rules
4. https://project-hobbit.eu/challenges/mighty-storage-challenge2018/
5. https://community.openlinksw.com/t/sparql-query-limiting-results-to-100000-triples/2131

Acknowledgements

This work has been supported by the French National Research Agency with projects LabEx PERSYVAL Lab (11-LABX-0025-01), DUNE SIDES 3.0 (ANR-16-DUNE-0002-02), P3IA MIAI@Grenoble Alpes (ANR-19-P3IA-0003) and CE23 CQFD (ANR-18-CE23-0003).

Author information

Authors and Affiliations

Université Grenoble Alpes, Saint-Martin-d’Hères, France
Adam Sanchez-Ayte, Fabrice Jouanot & Marie-Christine Rousset

Corresponding author

Correspondence to Adam Sanchez-Ayte.

Editor information

Editors and Affiliations

University of Amsterdam, Amsterdam, Noord-Holland, The Netherlands
Paul Groth
Universidad Simón Bolívar, Leibniz Information Centre for Science and Technology, Hannover, Niedersachsen, Germany
Maria-Esther Vidal
Institut Polytechnique de Paris "DIG", Télécom ParisTech, Palaiseau, France
Fabian Suchanek
University of Southern California, Marina del Rey, CA, USA
Pedro Szekely
IBM Research - Thomas J. Watson Research, Yorktown Heights, NY, USA
Pavan Kapanipathi
LaSIGE, Fac de Ciencias, Edif C6, Piso 3, Universidade de Lisboa, Lisbon, Portugal
Catia Pesquita
University of Nantes, Nantes, France
Hala Skaf-Molli
Aalto University, Espoo, Finland
Minna Tamper

About this paper

Cite this paper

Sanchez-Ayte, A., Jouanot, F., Rousset, MC. (2022). CONSTRUCT Queries Performance on a Spark-Based Big RDF Triplestore. In: Groth, P., et al. The Semantic Web. ESWC 2022. Lecture Notes in Computer Science, vol 13261. Springer, Cham. https://doi.org/10.1007/978-3-031-06981-9_26

DOI: https://doi.org/10.1007/978-3-031-06981-9_26
Published: 31 May 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-06980-2
Online ISBN: 978-3-031-06981-9
eBook Packages: Computer Science, Computer Science (R0)
