Set-based unified approach for summarization of a multi-attributed graph

Khan, Kifayat Ullah; Nawaz, Waqas; Lee, Young-Koo

doi:10.1007/s11280-016-0388-y


Published: 30 April 2016

Volume 20, pages 543–570 (2017)

World Wide Web

Kifayat Ullah Khan, Waqas Nawaz & Young-Koo Lee


Abstract

A graph that records the attributes of each vertex together with its interactions is a rich source of real-world knowledge. However, this knowledge is hard to extract because present-day graphs either do not fit in main memory or cannot be processed efficiently. It is therefore useful to create a meaningful summary graph that is compact yet preserves the intrinsic properties of the underlying graph. In this paper, we propose a summarization approach for a big graph in which each node carries multiple attributes. The main intuition behind our approach is a real-life observation: “friends of friends have many common friends and also have similar likes and preferences”. We use this observation to identify sets of nodes that have common neighborhoods and similar attributes, and we summarize each such set. Existing aggregation-based summarization methods use a pairwise heuristic to find similar pairs of nodes for compression. Although pairwise similarity computations can check both neighborhood and attribute similarity, they are impractical for summarizing a big graph. We therefore propose a set-based approach for efficient summarization. To identify each set, we adopt Locality Sensitive Hashing (LSH), which restricts similarity computations to candidate similar nodes only. Because existing LSH techniques consider only neighborhood similarity in a graph, we propose a Unified LSH approach that considers attribute and neighborhood similarity simultaneously. Further, using the Minimum Description Length (MDL) principle, we present a new technique that performs lossless summarization of each set by creating a super node or adding a new virtual node to the summary graph. We evaluate our approach against state-of-the-art methods on synthetic and publicly available real-world graphs and observe better results in terms of execution time, compression ratio, and the number of corrections needed to reconstruct the original graph.
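
The candidate-generation idea in the abstract can be illustrated with a small sketch. The code below is not the authors' Unified LSH. It is a generic MinHash banding scheme, under the assumption that each node's neighbor ids and attribute values are merged into one token set so that a single signature reflects both kinds of similarity. The function names, the `n:`/`a:` token prefixes, and the `bands` and `rows` parameters are hypothetical choices made for illustration only.

```python
import hashlib
from collections import defaultdict


def _hash(seed: int, token: str) -> int:
    """Deterministic 64-bit hash of a token under a given seed."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(tokens, num_hashes):
    """MinHash signature: the minimum hash of the token set for each seed."""
    return tuple(min(_hash(seed, t) for t in tokens) for seed in range(num_hashes))


def node_tokens(node, neighbors, attributes):
    """Assumed 'unified' feature set: neighbor ids plus attribute key/value pairs."""
    tokens = {f"n:{v}" for v in neighbors[node]}
    tokens |= {f"a:{key}={val}" for key, val in attributes[node].items()}
    return tokens


def candidate_sets(neighbors, attributes, bands=4, rows=2):
    """Group nodes whose signatures agree on at least one band of `rows` values."""
    buckets = defaultdict(set)
    for node in neighbors:
        tokens = node_tokens(node, neighbors, attributes)
        if not tokens:
            continue  # a node with no neighbors and no attributes has nothing to hash
        signature = minhash_signature(tokens, bands * rows)
        for b in range(bands):
            band = signature[b * rows:(b + 1) * rows]
            buckets[(b, band)].add(node)
    # Only buckets holding two or more nodes yield a candidate set.
    return {frozenset(group) for group in buckets.values() if len(group) > 1}


if __name__ == "__main__":
    # Toy graph (made-up data): "a" and "b" share all neighbors and attributes.
    neighbors = {"a": {"x", "y"}, "b": {"x", "y"}, "x": {"a", "b"}, "y": {"a", "b"}}
    attributes = {
        "a": {"city": "Seoul"},
        "b": {"city": "Seoul"},
        "x": {"city": "Paris"},
        "y": {"city": "Rome"},
    }
    for group in candidate_sets(neighbors, attributes):
        print(sorted(group))
```

Nodes with identical token sets always fall into the same bucket, while less similar nodes collide with a probability that grows with the Jaccard similarity of their token sets. Exact similarity checks and any MDL-based merging would then run only inside each candidate set rather than over all node pairs.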


Notes

Total number of minutes spent on Facebook each month: 640 Million. http://www.statisticbrain.com/facebook-statistics/. Last accessed on 03/07/2016


Acknowledgment

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MEST) (No. 2015R1A2A2A01008209).

Author information

Waqas Nawaz
Present address: Institute of Information Systems, Innopolis University, Universitetskaya St. 1, Innopolis, Tatarstan Republic, 420500, Russia

Authors and Affiliations

Department of Computer Science and Engineering, Kyung Hee University, Seocheon-dong, Giheung-gu, Yongin-si, Gyeonggi-do, 446-701, Republic of Korea
Kifayat Ullah Khan, Waqas Nawaz & Young-Koo Lee


Corresponding author

Correspondence to Young-Koo Lee.


About this article

Cite this article

Khan, K.U., Nawaz, W. & Lee, YK. Set-based unified approach for summarization of a multi-attributed graph. World Wide Web 20, 543–570 (2017). https://doi.org/10.1007/s11280-016-0388-y


Received: 18 August 2015
Revised: 06 March 2016
Accepted: 18 March 2016
Published: 30 April 2016
Issue Date: May 2017
DOI: https://doi.org/10.1007/s11280-016-0388-y
