research-article

Efficient Graph Summarization using Weighted LSH at Billion-Scale

Authors:
Quinton Yong, University of Victoria, Victoria, BC, Canada
Mahdi Hajiabadi, University of Victoria, Victoria, BC, Canada
Venkatesh Srinivasan, University of Victoria, Victoria, BC, Canada
Alex Thomo, University of Victoria, Victoria, BC, Canada

SIGMOD '21: Proceedings of the 2021 International Conference on Management of Data, June 2021, Pages 2357–2365, https://doi.org/10.1145/3448016.3457331

Published: 18 June 2021

ABSTRACT

Graph summarization is important because large-scale graph analysis has many diverse applications. A popular family of summarization methods is the group-based approach. The general idea is to merge nodes of the original graph into supernodes of the summary graph, encode the original edges as superedges and correction set edges, and, for lossy summarization, drop certain superedges or correction set edges. The current state of the art has several computation steps that are serious bottlenecks in terms of running time and scalability. In this work, we propose LDME, a correction set based graph summarization algorithm that produces compact output representations in a fast and scalable manner. To achieve this, we introduce (1) weighted locality sensitive hashing to drastically reduce the number of comparisons required to find good node merges, (2) an efficient way to compute the best quality merges that produces more compact outputs, and (3) a new sort-based encoding algorithm that is faster and more robust. In addition, our algorithm provides performance tuning settings that allow compression to be traded for running time. On high compression settings, LDME achieves compression equal to or better than the state of the art with up to 53x speedup in running time. On high speed settings, LDME achieves up to two orders of magnitude speedup with only slightly lower compression.
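
To make the superedge and correction set encoding concrete, the following minimal Python sketch encodes a small undirected graph given a fixed grouping of nodes into supernodes. It is an illustration only: the function names `encode` and `decode`, the brute-force loop over supernode pairs, and the cost rule (use a superedge only when one superedge plus its deletions is cheaper than listing the edges individually) are assumptions drawn from the general correction set approach. It is not LDME's sort-based encoding, and it does not include the weighted LSH merging step.

```python
from itertools import combinations, product


def _pairs(groups, a, b):
    """All node pairs that a superedge between supernodes a and b would imply."""
    if a == b:
        return {frozenset(p) for p in combinations(groups[a], 2)}
    return {frozenset(p) for p in product(groups[a], groups[b])}


def encode(edges, groups):
    """Encode an undirected graph under a fixed node grouping (hypothetical helper).

    edges  : iterable of (u, v) pairs, undirected, no self-loops
    groups : dict mapping supernode id -> set of original nodes (disjoint sets)
    Returns (superedges, additions, deletions).
    """
    edge_set = {frozenset(e) for e in edges}
    superedges, additions, deletions = set(), set(), set()
    ids = sorted(groups)
    for i, a in enumerate(ids):
        for b in ids[i:]:
            pairs = _pairs(groups, a, b)
            present = pairs & edge_set
            if not present:
                continue
            missing = pairs - present
            # Assumed cost rule: a superedge costs 1 plus one deletion per
            # missing pair; otherwise each present edge is stored on its own.
            if 1 + len(missing) < len(present):
                superedges.add((a, b))
                deletions |= missing
            else:
                additions |= present
    return superedges, additions, deletions


def decode(groups, superedges, additions, deletions):
    """Rebuild the original edge set from the summary (lossless case)."""
    edges = set(additions)
    for a, b in superedges:
        edges |= _pairs(groups, a, b)
    return edges - deletions


if __name__ == "__main__":
    groups = {"A": {1, 2, 3}, "B": {4, 5}}
    edges = [(1, 4), (1, 5), (2, 4), (2, 5), (3, 4), (1, 2)]
    s, add, dele = encode(edges, groups)
    print(len(s) + len(add) + len(dele), "stored items for", len(edges), "edges")
```

In this toy input, five of the six possible pairs between the two supernodes are edges, so they are stored as one superedge plus one deletion, while the single edge inside supernode A is kept as an addition. Calling `decode` on the result recovers the original edge set.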

Supplemental Material

3448016.3457331.mp4 (mp4, 44.8 MB)
Read me (pdf, 107.9 KB)
Source Code (zip, 48.5 MB)

Published in
SIGMOD '21: Proceedings of the 2021 International Conference on Management of Data
June 2021
2969 pages
ISBN:9781450383431
DOI:10.1145/3448016
General Chairs: Guoliang Li, Tsinghua University (China); Zhanhuai Li, Northwestern Polytechnical University (China)
Program Chairs: Stratos Idreos, Harvard University (USA); Divesh Srivastava, AT&T (USA)
Copyright © 2021 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
graph summarization
jaccard similarity
weighted lsh
Acceptance Rates
Overall Acceptance Rate: 785 of 4,003 submissions, 20%