Should We Be Afraid of Querying Billions of Triples in a Graph-Based Centralized System?

Khelil, Abdallah; Mesmoudi, Amin; Galicia, Jorge; Senouci, Mohamed

doi:10.1007/978-3-030-32065-2_18

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 11815)

Included in the following conference series:

International Conference on Model and Data Engineering

Abstract

The data representation facilities offered by RDF (Resource Description Framework) have made it very popular, and it is now considered a standard in several fields (Web, biology, ...). By relaxing the notion of schema, RDF allows flexibility in the representation of data. This popularity has given rise to large datasets and has consequently created a need to process these data efficiently. In this paper, we propose a novel approach, named QDAG (Querying Data as Graphs), for query processing on RDF data. We propose to combine RDF graph exploration with physical fragmentation of triples. Graph exploration makes it possible to exploit the structure of the graph and its semantics, while fragmentation groups the nodes of the graph that have the same properties. Compared to the state of the art (i.e., gStore, RDF-3X, Virtuoso), our approach offers a compromise between efficient query processing and scalability. We conducted an experimental study using real and synthetic datasets to validate our approach with respect to scalability and performance.
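To make the two ingredients named in the abstract more concrete, the following is a minimal Python sketch, not the QDAG implementation. It assumes that "nodes having the same properties" means subjects that emit the same set of predicates, and that exploration follows a path of predicates starting from one fragment. The toy triples and the function names `fragment_by_properties` and `explore` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Toy RDF triples (subject, predicate, object). All data here is invented.
TRIPLES = [
    ("alice", "name", "Alice"),
    ("alice", "worksFor", "acme"),
    ("bob", "name", "Bob"),
    ("bob", "worksFor", "acme"),
    ("acme", "name", "ACME"),
    ("acme", "locatedIn", "paris"),
]


def fragment_by_properties(triples):
    """Group triples into fragments keyed by their subject's predicate set.

    Assumption: two nodes belong to the same fragment when they emit
    exactly the same set of predicates.
    """
    props = defaultdict(set)
    for s, p, _ in triples:
        props[s].add(p)
    fragments = defaultdict(list)
    for s, p, o in triples:
        fragments[frozenset(props[s])].append((s, p, o))
    return dict(fragments)


def explore(fragments, start_predicates, path):
    """Follow a list of predicates from every subject of one fragment.

    Returns (start_subject, reached_node) pairs, ordered by start subject.
    """
    # Adjacency index over all fragments: (subject, predicate) -> objects.
    index = defaultdict(list)
    for fragment in fragments.values():
        for s, p, o in fragment:
            index[(s, p)].append(o)

    start = fragments.get(frozenset(start_predicates), [])
    results = []
    for subject in sorted({s for s, _, _ in start}):
        nodes = [subject]
        for predicate in path:
            nodes = [o for n in nodes for o in index.get((n, predicate), [])]
        results.extend((subject, node) for node in nodes)
    return results


if __name__ == "__main__":
    frags = fragment_by_properties(TRIPLES)
    # Where is the employer of each person-like node located?
    print(explore(frags, {"name", "worksFor"}, ["worksFor", "locatedIn"]))
```

In this sketch, fragmentation narrows the set of starting nodes to those with a matching property set, and exploration then walks edges instead of joining whole triple tables. How QDAG actually stores fragments and evaluates queries is described in the full paper, not in this abstract.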

Notes

1. http://www.freebase.com/.
2. http://wiki.dbpedia.org.
3. http://lodstats.aksw.org/.
4. https://www.lias-lab.fr/publications/32823/RapportderechercheKHELIL.pdf.
5. \(\phi\) is used to denote an empty element.
6. https://github.com/pkumod/gStore.
7. https://github.com/openlink/virtuoso-opensource.
8. https://www.lias-lab.fr/publications/32595/khelil_rdf_processing_report.pdf.

Author information

Authors and Affiliations

LIAS/ISAE-ENSMA, Chasseneuil-du-Poitou, France
Abdallah Khelil, Amin Mesmoudi & Jorge Galicia
University of Poitiers, Poitiers, France
Amin Mesmoudi
University Oran 1, Oran, Algeria
Abdallah Khelil & Mohamed Senouci

Corresponding author

Correspondence to Abdallah Khelil.

Editor information

Editors and Affiliations

UIUC Institute, Zhejiang University, Zhejiang, China
Klaus-Dieter Schewe
INPT-ENSEEIHT/IRIT, Toulouse, France
Neeraj Kumar Singh

About this paper

Cite this paper

Khelil, A., Mesmoudi, A., Galicia, J., Senouci, M. (2019). Should We Be Afraid of Querying Billions of Triples in a Graph-Based Centralized System? In: Schewe, KD., Singh, N. (eds) Model and Data Engineering. MEDI 2019. Lecture Notes in Computer Science, vol 11815. Springer, Cham. https://doi.org/10.1007/978-3-030-32065-2_18

DOI: https://doi.org/10.1007/978-3-030-32065-2_18
Published: 21 October 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-32064-5
Online ISBN: 978-3-030-32065-2
eBook Packages: Computer Science, Computer Science (R0)
