ABSTRACT
Summarizing graphs is of paramount importance due to the diverse applications of large-scale graph analysis. A popular family of summarization methods is the group-based approach. The general idea is to merge nodes of the original graph into supernodes of the summary graph, encode the original edges as superedges and correction set edges, and, for lossy summarization, drop certain superedges or correction set edges. The current state of the art has several computational steps that are serious bottlenecks in terms of running time and scalability. In this work, we propose LDME, a correction-set-based graph summarization algorithm that produces compact output representations in a fast and scalable manner. To achieve this, we introduce (1) weighted locality sensitive hashing to drastically reduce the number of comparisons required to find good node merges, (2) an efficient way to compute the best quality merges that produces more compact outputs, and (3) a new sort-based encoding algorithm that is faster and more robust. In addition, our algorithm provides performance tuning settings that allow trading compression for running time. On high compression settings, LDME achieves compression equal to or better than the state of the art with up to 53x speedup in running time. On high speed settings, LDME achieves up to two orders of magnitude speedup with only slightly lower compression.
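The group-based idea described in the abstract can be illustrated with a minimal Python sketch. This is not LDME and not its sort-based encoder; it only shows, for a node grouping that is already given, how original edges can be encoded as superedges plus a correction set. The function names `encode` and `decode`, the toy graph, and the cost rule (add a superedge only when that needs fewer stored entries than listing the edges directly) are assumptions made for illustration, with undirected edges and no self-loops.

```python
from itertools import combinations, product


def node_pairs(groups, a, b):
    """All candidate node pairs covered by a superedge (a, b)."""
    if a == b:
        return {frozenset(p) for p in combinations(groups[a], 2)}
    return {frozenset(p) for p in product(groups[a], groups[b])}


def encode(edges, groups):
    """Encode an undirected graph as superedges plus correction sets.

    edges  : iterable of (u, v) pairs, u != v (assumed: no self-loops)
    groups : dict mapping supernode id -> set of original nodes
    Returns (superedges, additions, deletions).
    """
    edge_set = {frozenset(e) for e in edges}
    owner = {v: s for s, members in groups.items() for v in members}

    # Bucket every original edge by the supernode pair it connects.
    between = {}
    for e in edge_set:
        u, v = tuple(e)
        key = tuple(sorted((owner[u], owner[v])))
        between.setdefault(key, set()).add(e)

    superedges, additions, deletions = set(), set(), set()
    for (a, b), actual in between.items():
        missing = node_pairs(groups, a, b) - actual
        # Assumed cost rule: one superedge plus the missing pairs,
        # versus storing each actual edge as an addition.
        if 1 + len(missing) < len(actual):
            superedges.add((a, b))
            deletions |= missing
        else:
            additions |= actual
    return superedges, additions, deletions


def decode(groups, superedges, additions, deletions):
    """Rebuild the original edge set from the lossless encoding."""
    edges = set(additions)
    for a, b in superedges:
        edges |= node_pairs(groups, a, b)
    return edges - deletions


# Toy example: supernodes A = {1, 2} and B = {3, 4, 5}.
groups = {"A": {1, 2}, "B": {3, 4, 5}}
edges = [(1, 3), (1, 4), (1, 5), (2, 3), (2, 4), (3, 4)]
superedges, additions, deletions = encode(edges, groups)
restored = decode(groups, superedges, additions, deletions)
```

In this toy graph, five of the six possible A-B pairs are edges, so the sketch stores one superedge and one deletion instead of five edges, while the single edge inside B is cheaper to keep as an addition. Dropping entries from these sets would turn the lossless encoding into a lossy one.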
Index Terms
- Efficient Graph Summarization using Weighted LSH at Billion-Scale