Tutorial
DOI: 10.1145/3626246.3654695

Querying Graph Databases at Scale

Published: 09 June 2024

Abstract

This tutorial provides an in-depth overview of recent advances in algorithms and data structures for processing graph database queries. The focus will be on scalable algorithms that have been shown to work over real-world knowledge graphs, and we will present detailed performance comparisons of classical and recent algorithms. The tutorial is divided into four sections. The first section will motivate the use of graph databases for querying knowledge graphs and introduce attendees to graph data models and the landscape of graph query languages. The second section will discuss how to evaluate graph patterns efficiently, introducing worst-case optimal join techniques and comparing them with classical join algorithms. The third section will cover techniques for efficiently evaluating path queries and for constructing compact representations of potentially exponentially large sets of paths. In the final section, we will introduce recent advances in compressed data structures that reduce the high memory requirements of worst-case optimal join algorithms and also provide a template for evaluating path queries in a highly optimised manner.
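As a rough illustration of what separates worst-case optimal joins from classical ones, the Python sketch below evaluates the directed triangle pattern E(a, b), E(b, c), E(a, c) over an edge list in the style of Generic Join. A binary join plan would first compute E(a, b) ⋈ E(b, c), an intermediate result that can grow to the order of |E|² tuples (for example, around a node with many incoming and many outgoing edges), even though the number of triangles is bounded by O(|E|^{3/2}). The sketch instead binds one variable at a time and intersects the candidate sets contributed by every edge pattern that mentions that variable, scanning the smallest set first. The function names (`build_index`, `intersect`, `triangles`) and the use of plain Python hash sets as indexes are assumptions for illustration only; this is not the tutorial's own implementation, and real engines rely on sorted tries or compressed indexes.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[int, int]


def build_index(edges: Iterable[Edge]) -> Dict[int, Set[int]]:
    """Hypothetical index: node -> set of out-neighbours (a one-level trie on E(x, y))."""
    out: Dict[int, Set[int]] = defaultdict(set)
    for src, dst in edges:
        out[src].add(dst)
    return dict(out)


def intersect(*sets: Set[int]) -> Set[int]:
    """Scan the smallest set and probe the others, so the cost of each
    intersection is bounded by the size of its smallest input."""
    smallest, *rest = sorted(sets, key=len)
    return {x for x in smallest if all(x in s for s in rest)}


def triangles(edges: Iterable[Edge]) -> List[Tuple[int, int, int]]:
    """All (a, b, c) such that a->b, b->c and a->c are edges,
    binding variables one at a time in the order a, b, c."""
    out = build_index(edges)
    sources = set(out)  # nodes with at least one outgoing edge
    results: List[Tuple[int, int, int]] = []
    # a appears as a source in both E(a, b) and E(a, c).
    for a in sorted(sources):
        # b must satisfy E(a, b) and be a source for E(b, c).
        for b in sorted(intersect(out[a], sources)):
            # c must satisfy both E(b, c) and E(a, c).
            for c in sorted(intersect(out[b], out[a])):
                results.append((a, b, c))
    return results


if __name__ == "__main__":
    sample = [(1, 2), (2, 3), (1, 3), (3, 4), (2, 4)]
    print(triangles(sample))
```

On the sample graph this reports the triangles (1, 2, 3) and (2, 3, 4). Matches follow homomorphism semantics, as in SPARQL basic graph patterns, so a self-loop can bind two variables to the same node; no intermediate result larger than a single candidate set is ever materialised.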

Cited By

  • FPP-Hunter: Expert-Guided Discovery Of Functional Path Patterns. 2024 IEEE International Conference on Big Data (BigData), pages 4023-4032, 2024. DOI: 10.1109/BigData62323.2024.10825729. Online publication date: 15 December 2024.

Published In

SIGMOD/PODS '24: Companion of the 2024 International Conference on Management of Data
June 2024
694 pages
ISBN:9798400704222
DOI:10.1145/3626246
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. basic graph patterns
  2. graph databases
  3. graph query languages
  4. knowledge graphs
  5. path queries
  6. worst-case optimal joins

Conference

SIGMOD/PODS '24

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Article Metrics

  • Downloads (last 12 months): 140
  • Downloads (last 6 weeks): 20
Reflects downloads up to 01 Mar 2025
