research-article

A Hierarchical Contraction Scheme for Querying Big Graphs

Authors:
Wenfei Fan (University of Edinburgh, Shenzhen Institute of Computing Sciences, & Beihang University, Edinburgh, United Kingdom)
Yuanhao Li (University of Edinburgh, Edinburgh, United Kingdom)
Muyang Liu (University of Edinburgh, Edinburgh, United Kingdom)
Can Lu (Shenzhen Institute of Computing Sciences, Shenzhen, China)

SIGMOD '22: Proceedings of the 2022 International Conference on Management of Data, June 2022, Pages 1726–1740, https://doi.org/10.1145/3514221.3517862

Published: 11 June 2022

ABSTRACT

This paper proposes a scheme for querying big graphs with a single machine. The scheme iteratively contracts regular structures into supernodes and builds a hierarchy of contracted graphs, until the one at the top fits into the memory. For each query class Q in use, supernodes carry synopses S_Q such that queries of Q are answered by using S_Q if possible, and otherwise by drilling down to the next level with decontraction of a bounded size. Moreover, we show how to adapt a variety of existing sequential (single-machine) algorithms to the hierarchy by reusing their logic and data structures. We also provide a bounded incremental algorithm to maintain the contracted graphs in response to updates, such that its cost is determined by the sizes of changes to the input and output only. Using real-life and synthetic graphs, we experimentally verify that with a single machine, the hierarchy is able to compute exact query answers when memory is as small as 7.6% of graphs, speeds up various applications by 9.8 times on average, and is even 120.1 times faster than some parallel graph systems that use 6 machines.
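
The idea of contracting until the top graph fits a budget can be illustrated with a minimal sketch. This is not the paper's algorithm: the star-contraction rule, the names contract_stars, build_hierarchy and reachable, and the node-count budget standing in for memory are all assumptions made for illustration. Synopses and bounded decontraction are omitted. The sketch only covers reachability on an undirected graph, where every supernode is a connected subgraph, so the query can be answered exactly on the top-level graph without drilling down.

```python
from collections import deque


def contract_stars(adj):
    """One contraction round on an undirected graph {node: set(neighbours)}.

    Hypothetical rule: each still-unassigned node becomes a star centre and
    absorbs its still-unassigned neighbours into one supernode.
    Node ids are assumed to be sortable (e.g. integers).
    """
    owner = {}    # node -> id of the supernode that contains it
    members = []  # supernode id -> list of contracted nodes
    for centre in sorted(adj):
        if centre in owner:
            continue
        group = [centre] + [n for n in sorted(adj[centre])
                            if n != centre and n not in owner]
        for n in group:
            owner[n] = len(members)
        members.append(group)
    # Edges between different supernodes survive; internal edges disappear.
    coarse = {sid: set() for sid in range(len(members))}
    for u, neighbours in adj.items():
        for v in neighbours:
            if owner[u] != owner[v]:
                coarse[owner[u]].add(owner[v])
    return owner, coarse


def build_hierarchy(adj, budget):
    """Contract repeatedly until the top graph has at most `budget` nodes."""
    levels = []  # levels[i] maps a node of level i to its supernode at level i+1
    while len(adj) > budget:
        owner, coarse = contract_stars(adj)
        if len(coarse) == len(adj):  # no progress (only isolated nodes left)
            break
        levels.append(owner)
        adj = coarse
    return levels, adj


def reachable(levels, top, u, v):
    """Undirected reachability answered on the top-level graph only."""
    for owner in levels:  # lift both endpoints to the top level
        u, v = owner[u], owner[v]
    seen, queue = {u}, deque([u])
    while queue:  # plain breadth-first search on the small top graph
        x = queue.popleft()
        if x == v:
            return True
        for y in top[x] - seen:
            seen.add(y)
            queue.append(y)
    return False
```

For example, with a six-node path and a separate two-node component, build_hierarchy(graph, budget=3) stops after two rounds, and reachable then runs breadth-first search over three supernodes instead of eight nodes. Queries whose answers depend on the inside of a supernode, such as distances, would need the synopses and drill-down step that this sketch leaves out.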

Supplemental Material

SIGMOD22_moddm128.mp4 (mp4, 38.8 MB)

Index Terms

A Hierarchical Contraction Scheme for Querying Big Graphs
1. Information systems
  1. Data management systems
    1. Database design and models
      1. Graph-based database models
        Hierarchical data models

Published in
SIGMOD '22: Proceedings of the 2022 International Conference on Management of Data
June 2022
2597 pages
ISBN:9781450392495
DOI:10.1145/3514221
General Chair: Zachary Ives, University of Pennsylvania (USA)
Program Chairs: Angela Bonifati, Lyon 1 University (France); Amr El Abbadi, University of California, Santa Barbara (USA)
Copyright © 2022 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
graph algorithms
graph contraction
graph data management
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 785 of 4,003 submissions, 20%
Article Metrics
- Total Citations: 3
- Total Downloads: 479
- Downloads (Last 12 months): 101
- Downloads (Last 6 weeks): 6