Set-based approximate approach for lossless graph summarization

Khan, Kifayat Ullah; Nawaz, Waqas; Lee, Young-Koo

doi:10.1007/s00607-015-0454-9

Published: 28 April 2015

Computing, volume 97, pages 1185–1207 (2015)

Kifayat Ullah Khan¹,
Waqas Nawaz¹ &
Young-Koo Lee¹

Abstract

Graph summarization is a valuable approach for analyzing various real-life phenomena, such as communities, influential nodes, and information flow, in a big graph. To summarize a graph, nodes having similar neighbors are merged into super nodes and their corresponding edges are compressed into super edges. Existing methods find similar nodes either by ordering nodes or by performing pairwise similarity computations. Compression-by-node-ordering approaches are scalable but achieve less compression than their pairwise counterparts, which perform exhaustive similarity computations. In this paper, we propose a novel set-based summarization approach that directly summarizes naturally occurring sets of similar nodes in a graph. Our approach is scalable because we avoid explicit similarity computations with non-similar nodes and merge sets of nodes in each iteration. We also achieve a good compression ratio because each set consists of highly similar nodes. To locate sets of similar nodes, we first find candidate sets by using locality sensitive hashing. However, the member nodes of every candidate set have varying similarities with each other. Therefore, we propose a heuristic based on similarity among the degrees of candidate nodes, and a parameter-free pruning technique, to effectively identify the subset of highly similar nodes among the candidates. Experiments on real-world graphs show that our approach requires less execution time than pairwise graph summarization, by an order of magnitude on graphs containing nodes with highly diverse neighborhoods, and produces summaries of similar accuracy. We also observe scalability comparable to that of the compression-by-node-ordering method, while providing a better compression ratio.
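The candidate-generation step described in the abstract can be illustrated with a generic min-hash banding scheme over neighbor sets. This is only a minimal sketch under stated assumptions, not the authors' algorithm: the function names (`jaccard`, `minhash_signature`, `candidate_sets`), the parameters (`num_hashes`, `bands`, `seed`), and the use of integer node ids are all hypothetical choices made for illustration, and the degree-based heuristic and pruning step of the paper are not reproduced here.

```python
import random
from collections import defaultdict


def jaccard(a, b):
    """Jaccard similarity of two neighbor sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def minhash_signature(neighbors, hash_params, prime):
    """One min-hash value per (a, b) pair, taken over the node's neighbor set."""
    return tuple(min((a * v + b) % prime for v in neighbors) for a, b in hash_params)


def candidate_sets(adjacency, num_hashes=8, bands=4, seed=0):
    """Group nodes whose neighbor sets collide in at least one LSH band.

    adjacency maps an integer node id to the set of its neighbors' ids
    (an assumed input format). Returns a set of frozensets, each holding
    two or more nodes that are likely, but not guaranteed, to be similar.
    """
    rng = random.Random(seed)
    prime = 2_147_483_647  # large prime modulus for the hash family
    hash_params = [(rng.randrange(1, prime), rng.randrange(0, prime))
                   for _ in range(num_hashes)]
    rows = num_hashes // bands
    buckets = defaultdict(set)
    for node, neighbors in adjacency.items():
        if not neighbors:
            continue  # isolated nodes have no neighborhood to hash
        sig = minhash_signature(neighbors, hash_params, prime)
        for band in range(bands):
            key = (band, sig[band * rows:(band + 1) * rows])
            buckets[key].add(node)
    return {frozenset(group) for group in buckets.values() if len(group) > 1}


# Toy graph: nodes 0 and 1 have identical neighborhoods, as do 2, 3 and 4.
adjacency = {
    0: {2, 3, 4}, 1: {2, 3, 4},
    2: {0, 1}, 3: {0, 1}, 4: {0, 1},
    5: {6}, 6: {5},
}
groups = candidate_sets(adjacency)
```

Nodes with identical neighbor sets always share a bucket, while nodes with only partly overlapping neighborhoods may or may not collide. That is why members of a candidate set can have varying similarities, and why a further filtering step such as the one the abstract describes is needed before merging.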

Notes

http://snap.stanford.edu/data/index.html. Last accessed on 10/28/2014.

Acknowledgments

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MEST) (No. 2014R1A2A1A05043734).

Author information

Authors and Affiliations

Department of Computer Engineering, Kyung Hee University, Seocheon-dong, Giheung-gu, Yongin-si, Gyeonggi-do, 446-701, Republic of Korea
Kifayat Ullah Khan, Waqas Nawaz & Young-Koo Lee

Corresponding author

Correspondence to Young-Koo Lee.

About this article

Cite this article

Khan, K.U., Nawaz, W. & Lee, YK. Set-based approximate approach for lossless graph summarization. Computing 97, 1185–1207 (2015). https://doi.org/10.1007/s00607-015-0454-9

Received: 29 October 2014
Accepted: 07 April 2015
Published: 28 April 2015
Issue Date: December 2015
DOI: https://doi.org/10.1007/s00607-015-0454-9
