ABSTRACT
This paper proposes a scheme for querying big graphs on a single machine. The scheme iteratively contracts regular structures into supernodes and builds a hierarchy of contracted graphs until the one at the top fits into memory. For each query class Q in use, supernodes carry synopses S_Q such that queries of Q are answered using S_Q if possible, and otherwise by drilling down to the next level with decontraction of bounded size. Moreover, we show how to adapt a variety of existing sequential (single-machine) algorithms to the hierarchy by reusing their logic and data structures. We also provide a bounded incremental algorithm that maintains the contracted graphs in response to updates, such that its cost is determined only by the sizes of the changes to the input and output. Using real-life and synthetic graphs, we experimentally verify that, on a single machine, the hierarchy can compute exact query answers when memory is as small as 7.6% of the size of the graphs, speeds up various applications by 9.8 times on average, and is even 120.1 times faster than some parallel graph systems that use 6 machines.
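The contract-then-drill-down idea can be illustrated with a toy sketch. It is not the paper's algorithm: the grouping rule `greedy_groups`, the node-count `budget` standing in for memory size, the single query class (adjacency of two distinct nodes), and the one-bit clique flag used as the synopsis are all assumptions chosen to keep the example short. It only shows the control flow the abstract describes: answer from the synopsis at the top level when possible, and otherwise descend one level at a time.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Hierarchy:
    levels: list   # levels[0] is the input graph, levels[-1] the top
    parents: list  # parents[k][x]: supernode at level k+1 containing x
    clique: list   # clique[k][s]: synopsis flag for supernode s at level k


def greedy_groups(adj):
    """Hypothetical grouping rule: pair each node with one free neighbour."""
    seen, groups = set(), []
    for u in sorted(adj):
        if u in seen:
            continue
        seen.add(u)
        group = {u}
        for v in sorted(adj[u]):
            if v not in seen:
                seen.add(v)
                group.add(v)
                break
        groups.append(group)
    return groups


def build_hierarchy(adj, budget):
    """Contract repeatedly until the top graph has at most `budget` nodes."""
    levels, parents = [adj], []
    members = {v: {v} for v in adj}       # original nodes inside each supernode
    clique = [{v: True for v in adj}]
    while len(levels[-1]) > budget:
        cur = levels[-1]
        groups = greedy_groups(cur)
        if len(groups) == len(cur):       # nothing left to contract
            break
        parent = {x: i for i, g in enumerate(groups) for x in g}
        sup = {i: set() for i in range(len(groups))}
        for x, nbrs in cur.items():
            for y in nbrs:
                if parent[x] != parent[y]:
                    sup[parent[x]].add(parent[y])
        members = {i: set().union(*(members[x] for x in g))
                   for i, g in enumerate(groups)}
        # Synopsis: True if all original members are pairwise adjacent.
        clique.append({i: all(b in adj[a] for a, b in combinations(m, 2))
                       for i, m in members.items()})
        levels.append(sup)
        parents.append(parent)
    return Hierarchy(levels, parents, clique)


def adjacent(h, u, v):
    """Is there an edge between distinct nodes u and v of the input graph?
    Returns (answer, level at which the answer was found)."""
    chain_u, chain_v = [u], [v]
    for parent in h.parents:
        chain_u.append(parent[chain_u[-1]])
        chain_v.append(parent[chain_v[-1]])
    for k in range(len(h.levels) - 1, 0, -1):
        a, b = chain_u[k], chain_v[k]
        if a == b:
            if h.clique[k][a]:            # synopsis settles the query
                return True, k
        elif b not in h.levels[k][a]:     # no superedge, so no edge below
            return False, k
        # Otherwise drill down one level and try again.
    return v in h.levels[0][u], 0
```

In this sketch each drill-down step inspects only the two supernodes on the path from `u` and `v`, which loosely mirrors the bounded decontraction mentioned above. A real system would keep the lower levels on disk and load only the decontracted part.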