DOI: 10.1145/3448016.3457331

Efficient Graph Summarization using Weighted LSH at Billion-Scale

Published: 18 June 2021

ABSTRACT

Graph summarization is important to a wide range of large-scale graph analysis applications. A popular family of summarization methods is the group-based approach: nodes of the original graph are merged into supernodes of the summary graph, the original edges are encoded as superedges plus a correction set, and, for lossy summarization, some superedges or correction-set edges are dropped. Several steps in the current state of the art are serious bottlenecks for running time and scalability. In this work, we propose LDME, a correction-set-based graph summarization algorithm that produces compact output representations in a fast and scalable manner. To achieve this, we introduce (1) weighted locality-sensitive hashing to drastically reduce the number of comparisons required to find good node merges, (2) an efficient way to compute the best-quality merges, which produces more compact outputs, and (3) a new sort-based encoding algorithm that is faster and more robust. LDME also exposes tuning settings that trade compression for running time. On high-compression settings, LDME achieves compression equal to or better than the state of the art with up to 53x speedup in running time. On high-speed settings, LDME achieves up to two orders of magnitude speedup with only slightly lower compression.
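The group-based idea described in the abstract can be illustrated with a minimal sketch. It is not LDME itself: plain (unweighted) min-hash stands in for the paper's weighted LSH, and the superedge-versus-correction rule is the usual cost comparison from the correction-set literature, which the abstract does not spell out. All names (`signature`, `candidate_groups`, `encode_pair`) are hypothetical, and node ids are assumed to be non-negative integers smaller than `P`.

```python
import random
from collections import defaultdict

P = 2_147_483_647  # prime modulus for the illustrative hash family (assumption)


def signature(neighbors, hashes):
    """Min-hash signature of a neighbor set: one minimum per hash function."""
    return tuple(min((a * v + b) % P for v in neighbors) for a, b in hashes)


def candidate_groups(adj, k=2, seed=0):
    """Bucket nodes by signature so only nodes sharing a bucket get compared."""
    rng = random.Random(seed)
    hashes = [(rng.randrange(1, P), rng.randrange(P)) for _ in range(k)]
    buckets = defaultdict(list)
    for node, nbrs in adj.items():
        if nbrs:  # isolated nodes have no neighborhood to hash
            buckets[signature(nbrs, hashes)].append(node)
    return [group for group in buckets.values() if len(group) > 1]


def encode_pair(A, B, adj):
    """Encode the edges between two distinct supernodes A and B.

    Returns (superedge, additions, deletions). A superedge costs 1 plus one
    deletion per missing pair; without it, every present edge is an addition.
    The superedge is chosen only when that is strictly cheaper.
    """
    present = [(u, v) for u in A for v in B if v in adj[u]]
    possible = len(A) * len(B)
    if len(present) > (possible + 1) / 2:
        missing = [(u, v) for u in A for v in B if v not in adj[u]]
        return True, [], missing
    return False, present, []


# Complete bipartite graph: nodes 0 and 1 share neighbors, as do 2 and 3.
full = {0: {2, 3}, 1: {2, 3}, 2: {0, 1}, 3: {0, 1}}
groups = candidate_groups(full)           # [[0, 1], [2, 3]]

# Same supernodes with edge (1, 3) missing: 3 of 4 possible edges are present.
near = {0: {2, 3}, 1: {2}, 2: {0, 1}, 3: {0}}
encoded = encode_pair([0, 1], [2, 3], near)  # (True, [], [(1, 3)])
```

In the second example the summary stores one superedge and one deletion (cost 2) instead of three individual edges. Nodes with identical neighbor sets always collide under min-hash; similar but unequal sets collide only with some probability, which is why practical schemes use several hash tables.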



Published in

SIGMOD '21: Proceedings of the 2021 International Conference on Management of Data
June 2021, 2969 pages
ISBN: 9781450383431
DOI: 10.1145/3448016

Copyright © 2021 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Qualifiers: research-article

Overall acceptance rate: 785 of 4,003 submissions (20%)
