Set-based unified approach for summarization of a multi-attributed graph

World Wide Web

Abstract

Real-world graphs that record the attributes of each vertex and the interactions among vertices are a valuable source of information. However, this knowledge is hard to extract because many present-day graphs either do not fit in main memory or cannot be processed efficiently. It is therefore useful to create a summary graph that is compact yet preserves the intrinsic properties of the underlying graph. In this paper, we propose a summarization approach for a big graph in which each node carries multiple attributes. The main intuition behind our approach comes from a real-life observation: “friends of friends have many common friends and also have similar likes and preferences”. We use this phenomenon to identify sets of nodes that share a common neighborhood and similar attributes, and we summarize those sets. Existing aggregation-based summarization methods use a pairwise heuristic to find similar pairs of nodes for compression. Although pairwise similarity computations can check both neighborhood and attribute similarity, they are impractical for summarizing a big graph. We therefore propose a set-based approach for efficient summarization. To identify each set, we adopt Locality Sensitive Hashing (LSH), which restricts similarity computations to candidate similar nodes only. Because existing LSH techniques consider only neighborhood similarity in a graph, we propose a Unified LSH approach that considers attribute and neighborhood similarity simultaneously. Further, using the Minimum Description Length (MDL) principle, we present a new technique that performs lossless summarization of each set by creating a super node or adding a new virtual node to the summary graph. We evaluate our approach against state-of-the-art methods on synthetic and publicly available real-world graphs and observe better results in terms of execution time, compression ratio, and the number of corrections needed to reconstruct the original graph.
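
The idea of hashing nodes into candidate sets on both neighborhood and attributes can be illustrated with a minimal Python sketch. This is not the authors' Unified LSH: it assumes plain MinHash with banding over a single token set that mixes neighbor ids and attribute values, and every name in it (`unified_tokens`, `minhash_signature`, `candidate_sets`, the `bands` and `rows` parameters) is hypothetical.

```python
import hashlib
from collections import defaultdict


def _hash(token: str, seed: int) -> int:
    """Deterministic 64-bit hash of a token under a given seed."""
    digest = hashlib.blake2b(f"{seed}:{token}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def unified_tokens(node, neighbors, attributes):
    """Assumption: one token set mixes neighbor ids and attribute values."""
    return {f"n:{v}" for v in neighbors[node]} | {f"a:{a}" for a in attributes[node]}


def minhash_signature(tokens, num_hashes):
    """Keep the minimum hash per seed; similar sets tend to share minima."""
    return tuple(min(_hash(t, seed) for t in tokens) for seed in range(num_hashes))


def candidate_sets(neighbors, attributes, bands=4, rows=2):
    """Group nodes whose signatures agree on at least one band of `rows` values."""
    buckets = defaultdict(set)
    for node in neighbors:
        tokens = unified_tokens(node, neighbors, attributes)
        if not tokens:
            continue  # nothing to hash for an isolated, attribute-free node
        sig = minhash_signature(tokens, bands * rows)
        for b in range(bands):
            buckets[(b, sig[b * rows:(b + 1) * rows])].add(node)
    # Only buckets holding more than one node yield a candidate set.
    return {frozenset(group) for group in buckets.values() if len(group) > 1}


# Toy graph: "a" and "b" share neighbors and attributes, as do "x" and "y".
neighbors = {"a": {"x", "y"}, "b": {"x", "y"}, "x": {"a", "b"}, "y": {"a", "b"}, "z": set()}
attributes = {"a": {"music"}, "b": {"music"}, "x": {"sports"}, "y": {"sports"}, "z": {"chess"}}
groups = candidate_sets(neighbors, attributes)
```

Nodes with identical token sets always receive identical signatures, so they land in the same bucket in every band. Similarity would then be computed only inside each candidate set rather than over all node pairs, which is the saving a set-based approach aims for.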

Notes

  1. Total number of minutes spent on Facebook each month: 640 Million. http://www.statisticbrain.com/facebook-statistics/. Last accessed on 03/07/2016

Acknowledgment

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MEST) (No. 2015R1A2A2A01008209).

Author information

Corresponding author

Correspondence to Young-Koo Lee.

About this article

Cite this article

Khan, K.U., Nawaz, W. & Lee, YK. Set-based unified approach for summarization of a multi-attributed graph. World Wide Web 20, 543–570 (2017). https://doi.org/10.1007/s11280-016-0388-y

  • DOI: https://doi.org/10.1007/s11280-016-0388-y
