An effective graph summarization and compression technique for a large-scaled graph

  • Published:
The Journal of Supercomputing

Abstract

Graphs are widely used in various applications, and their sizes keep growing over time. Reducing graph size is necessary both to lower main-memory requirements and to save disk storage space. To this end, graph summarization and graph compression approaches have been studied extensively. Graph summarization aggregates nodes with similar structural properties so that the graph can be represented with reduced main-memory requirements, whereas graph compression applies various encoding techniques so that the resulting graph requires less storage space on disk. Given the usefulness of both paradigms, we propose to obtain the best of both worlds by combining summarization and compression. Accordingly, we present a greedy algorithm that substantially reduces the size of a large graph by applying both compression and summarization. We also propose a novel cost model for calculating the compression ratio that accounts for both the compression and summarization strategies. In every iteration, the algorithm uses the proposed cost model to decide whether to perform one or both of them. Through comprehensive experiments on real-world datasets, we show that our proposed algorithm achieves a compression ratio up to 16% better than applying summarization approaches alone.
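
The abstract does not spell out the authors' cost model, so the following is only a minimal sketch of the summarization half of such a scheme. It assumes the minimum-description-length cost of Navlakha et al. (2008) rather than the paper's own model: the edges between each pair of supernodes are encoded either as one superedge plus negative corrections for the missing edges, or as positive corrections only, whichever is cheaper. The greedy loop keeps applying the merge that lowers the total cost the most. The function names (`pair_cost`, `summary_cost`, `greedy_summarize`, `compression_ratio`) and the ratio definition (summary cost divided by the number of edges) are hypothetical, and the encoding (compression) side of the combined model is not modeled here.

```python
from itertools import combinations


def pair_cost(a, b, adj):
    """MDL cost of the edges between supernodes a and b (Navlakha et al., 2008):
    one superedge plus negative corrections, or positive corrections only."""
    if a == b:
        pi = len(a) * (len(a) - 1) // 2          # possible edges inside one supernode
        e = sum(1 for u, v in combinations(a, 2) if v in adj[u])
    else:
        pi = len(a) * len(b)                      # possible edges across two supernodes
        e = sum(1 for u in a for v in b if v in adj[u])
    return min(1 + pi - e, e)


def summary_cost(groups, adj):
    """Total cost: sum of pair costs over all unordered supernode pairs (incl. self-pairs)."""
    return sum(pair_cost(groups[i], groups[j], adj)
               for i in range(len(groups)) for j in range(i, len(groups)))


def compression_ratio(cost, adj):
    """Hypothetical ratio: summary cost relative to the edge count of the input graph."""
    num_edges = sum(len(nbrs) for nbrs in adj.values()) // 2
    return cost / num_edges


def greedy_summarize(adj):
    """Start from singleton supernodes; repeatedly apply the cost-minimizing merge."""
    groups = [frozenset([v]) for v in sorted(adj)]
    cost = summary_cost(groups, adj)
    while True:
        best = None
        for i, j in combinations(range(len(groups)), 2):
            trial = [g for k, g in enumerate(groups) if k not in (i, j)]
            trial.append(groups[i] | groups[j])
            c = summary_cost(trial, adj)
            if c < cost and (best is None or c < best[0]):
                best = (c, trial)
        if best is None:                          # no merge reduces the cost: stop
            return groups, cost
        cost, groups = best


if __name__ == "__main__":
    # Toy complete bipartite graph K_{2,3}, stored as symmetric adjacency sets.
    edges = [(a, b) for a in ("a1", "a2") for b in ("b1", "b2", "b3")]
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    groups, cost = greedy_summarize(adj)
    # Two supernodes joined by a single superedge: cost 1, ratio 1/6.
    print(sorted(sorted(g) for g in groups), cost, compression_ratio(cost, adj))
```

Recomputing the full cost for every candidate merge is only practical for toy graphs; scalable summarizers restrict the candidate pairs (for example, to nodes within two hops) and update costs incrementally.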

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7

Notes

  1. Source: https://techcrunch.com/2017/06/27/facebook-2-billion-users/ (last accessed on 09/22/2017).

Acknowledgements

This work was supported by the National Research Foundation of Korea (NRF) Grant funded by the Korea government (MEST) (No. 2015R1A2A2A01008209).

Author information

Correspondence to Young-Koo Lee.

About this article

Cite this article

Seo, H., Park, K., Han, Y. et al. An effective graph summarization and compression technique for a large-scaled graph. J Supercomput 76, 7906–7920 (2020). https://doi.org/10.1007/s11227-018-2245-5

  • DOI: https://doi.org/10.1007/s11227-018-2245-5
