
Should We Be Afraid of Querying Billions of Triples in a Graph-Based Centralized System?

  • Conference paper
Model and Data Engineering (MEDI 2019)

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 11815)

Abstract

The data representation facilities offered by RDF (Resource Description Framework) have made it very popular, and it is now considered a standard in several fields (the Web, biology, ...). By relaxing the notion of schema, RDF offers flexibility in how data are represented. This popularity has given rise to large datasets and, consequently, to the need to process these data efficiently. In this paper, we propose a novel approach, named QDAG (Querying Data as Graphs), for processing queries over RDF data. We propose to combine RDF graph exploration with physical fragmentation of triples. Graph exploration makes it possible to exploit the structure of the graph and its semantics, while fragmentation groups together the nodes of the graph that have the same properties. Compared to the state of the art (i.e., gStore, RDF-3X, Virtuoso), our approach offers a compromise between efficient query processing and scalability. To validate our approach with respect to scalability and performance, we conducted an experimental study using real and synthetic datasets.
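To make the combination of fragmentation and graph exploration more concrete, here is a minimal Python sketch. It assumes, hypothetically, that fragmentation groups subject nodes by the exact set of predicates they carry (similar in spirit to characteristic sets), and that graph exploration follows outgoing edges from the candidates that survive fragment pruning. It only handles chain (path) queries. The function names (`fragment_by_properties`, `candidate_subjects`, `chain_query`) and the toy triples are illustrative assumptions; this is not QDAG's actual implementation or storage layout.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)
Fragments = Dict[FrozenSet[str], Set[str]]  # predicate set -> subjects having exactly that set
Adjacency = Dict[str, Dict[str, Set[str]]]  # subject -> predicate -> objects

# Hypothetical toy data, not taken from the paper's datasets.
EXAMPLE_TRIPLES: List[Triple] = [
    ("alice", "worksFor", "lias"),
    ("alice", "name", "Alice"),
    ("bob", "name", "Bob"),
    ("lias", "locatedIn", "poitiers"),
    ("carol", "worksFor", "inria"),
]


def fragment_by_properties(triples: List[Triple]) -> Fragments:
    """Group subject nodes into fragments keyed by the exact set of predicates they carry."""
    props: Dict[str, Set[str]] = defaultdict(set)
    for s, p, _ in triples:
        props[s].add(p)
    fragments: Fragments = defaultdict(set)
    for s, preds in props.items():
        fragments[frozenset(preds)].add(s)
    return dict(fragments)


def build_adjacency(triples: List[Triple]) -> Adjacency:
    """Index outgoing edges so the graph can be explored node by node."""
    adj: Adjacency = defaultdict(lambda: defaultdict(set))
    for s, p, o in triples:
        adj[s][p].add(o)
    return adj


def candidate_subjects(fragments: Fragments, required: Set[str]) -> Set[str]:
    """Return subjects of every fragment whose predicate set covers all required predicates."""
    req = frozenset(required)
    result: Set[str] = set()
    for preds, subjects in fragments.items():
        if req <= preds:  # fragments lacking a required predicate are skipped wholesale
            result |= subjects
    return result


def chain_query(triples: List[Triple], path: List[str]) -> List[Tuple[str, ...]]:
    """Evaluate ?x0 path[0] ?x1 . ?x1 path[1] ?x2 ... and return sorted bindings."""
    if not path:
        return []
    # Built per call for brevity; a real system would build these once at load time.
    fragments = fragment_by_properties(triples)
    adj = build_adjacency(triples)
    # Fragmentation prunes the start nodes; exploration then follows edges hop by hop.
    bindings: List[Tuple[str, ...]] = [(s,) for s in candidate_subjects(fragments, {path[0]})]
    for pred in path:
        extended: List[Tuple[str, ...]] = []
        for binding in bindings:
            # .get avoids creating empty entries in the defaultdict for dead ends.
            for obj in adj.get(binding[-1], {}).get(pred, ()):
                extended.append(binding + (obj,))
        bindings = extended
    return sorted(bindings)


if __name__ == "__main__":
    print(chain_query(EXAMPLE_TRIPLES, ["worksFor", "locatedIn"]))
    # [('alice', 'lias', 'poitiers')]
```

In this toy run, fragment pruning discards `bob` and `lias` before any edge is followed, and exploration then drops `carol` because `inria` has no `locatedIn` edge. Grouping by exact predicate set means one query predicate may still match several fragments; how QDAG actually chooses and stores its fragments is not reproduced here.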

Author information

Corresponding author: Abdallah Khelil.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Khelil, A., Mesmoudi, A., Galicia, J., Senouci, M. (2019). Should We Be Afraid of Querying Billions of Triples in a Graph-Based Centralized System? In: Schewe, KD., Singh, N. (eds) Model and Data Engineering. MEDI 2019. Lecture Notes in Computer Science, vol 11815. Springer, Cham. https://doi.org/10.1007/978-3-030-32065-2_18

  • DOI: https://doi.org/10.1007/978-3-030-32065-2_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-32064-5

  • Online ISBN: 978-3-030-32065-2

  • eBook Packages: Computer Science, Computer Science (R0)
