DOI: 10.1145/3514221.3517862
research-article

A Hierarchical Contraction Scheme for Querying Big Graphs

Published: 11 June 2022

ABSTRACT

This paper proposes a scheme for querying big graphs with a single machine. The scheme iteratively contracts regular structures into supernodes and builds a hierarchy of contracted graphs until the one at the top fits into memory. For each query class Q in use, supernodes carry synopses S_Q such that queries of Q are answered by using S_Q if possible, and otherwise by drilling down to the next level with decontraction of a bounded size. Moreover, we show how to adapt a variety of existing sequential (single-machine) algorithms to the hierarchy by reusing their logic and data structures. We also provide a bounded incremental algorithm to maintain the contracted graphs in response to updates, such that its cost is determined only by the sizes of the changes to the input and output. Using real-life and synthetic graphs, we experimentally verify that with a single machine, the hierarchy is able to compute exact query answers when memory is as small as 7.6% of the size of the graphs, speeds up various applications by 9.8 times on average, and is even 120.1 times faster than some parallel graph systems that use 6 machines.
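The idea of repeatedly contracting a graph until it fits a memory budget can be illustrated with a minimal sketch. It is not the paper's algorithm: the names `pair_up`, `contract`, `build_hierarchy`, and `reachable` are hypothetical, the "regular structures" are replaced by simple adjacent pairs, the graph is assumed undirected, the only synopsis kept is the node-to-supernode map, and drill-down by decontraction is not shown. Because every contracted group is internally connected, reachability on the top-level graph is exact for the original graph.

```python
from collections import deque

def pair_up(adj):
    """Hypothetical stand-in for structure detection: greedily group adjacent pairs."""
    matched, groups = set(), []
    for node in adj:
        if node in matched:
            continue
        for nbr in adj[node]:
            if nbr not in matched and nbr != node:
                groups.append({node, nbr})
                matched.update((node, nbr))
                break
    return groups

def contract(adj, groups, level):
    """Merge each (internally connected) group into a supernode named (level, i)."""
    parent = {node: node for node in adj}  # ungrouped nodes map to themselves
    for i, group in enumerate(groups):
        for node in group:
            parent[node] = (level, i)
    contracted = {p: set() for p in parent.values()}
    for node, nbrs in adj.items():
        for nbr in nbrs:
            if parent[node] != parent[nbr]:  # drop edges inside a supernode
                contracted[parent[node]].add(parent[nbr])
    return contracted, parent

def build_hierarchy(adj, budget):
    """Contract until the top graph has at most `budget` nodes or nothing is left to merge."""
    levels, parents = [adj], []
    while len(levels[-1]) > budget:
        groups = pair_up(levels[-1])
        if not groups:
            break
        contracted, parent = contract(levels[-1], groups, level=len(levels))
        levels.append(contracted)
        parents.append(parent)
    return levels, parents

def reachable(levels, parents, u, v):
    """Answer reachability on the top-level graph only."""
    for parent in parents:  # lift both endpoints to their top-level supernodes
        u, v = parent[u], parent[v]
    top = levels[-1]
    seen, queue = {u}, deque([u])
    while queue:
        node = queue.popleft()
        if node == v:
            return True
        for nbr in top[node]:
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return False

# A path 1-2-3-4-5-6 plus a separate edge 7-8, contracted to at most 3 nodes.
graph = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4, 6}, 6: {5}, 7: {8}, 8: {7}}
levels, parents = build_hierarchy(graph, budget=3)
print(reachable(levels, parents, 1, 6))  # True
print(reachable(levels, parents, 1, 7))  # False
```

Query classes whose answers depend on the internals of a supernode (distances, for example) would need richer synopses or a decontraction step, which this sketch leaves out.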


Supplemental Material

SIGMOD22_moddm128.mp4 (mp4, 38.8 MB)


    • Published in

      SIGMOD '22: Proceedings of the 2022 International Conference on Management of Data
      June 2022
      2597 pages
      ISBN:9781450392495
      DOI:10.1145/3514221

      Copyright © 2022 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Acceptance Rates

      Overall Acceptance Rate: 785 of 4,003 submissions, 20%
