GUISE: a uniform sampler for constructing frequency histogram of graphlets

  • Regular Paper
  • Knowledge and Information Systems

Abstract

Graphlet frequency distribution (GFD) has recently become popular for characterizing large networks. However, computing the GFD of a network requires the exact count of embedded graphlets in that network, which is computationally expensive. As a result, it is practically infeasible to compute the GFD for even a moderately large network. In this paper, we propose Guise, which uses a Markov Chain Monte Carlo sampling method to construct an approximate GFD of a large network. Our experiments on networks with millions of nodes show that Guise obtains the GFD with a very low error rate within a few minutes, whereas the exhaustive counting-based approach takes several days.
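
The abstract names Markov Chain Monte Carlo but does not describe the chain itself. As a rough illustration only, the Python sketch below shows one standard way an MCMC walk can be made to sample states uniformly: a Metropolis-Hastings random walk that accepts a proposed move from x to y with probability min(1, d(x)/d(y)), where d is the number of neighbors of a state. The toy state graph `toy_states` and the function name `mh_uniform_walk` are hypothetical. In a graphlet sampler the states would be graphlet embeddings, and how neighboring embeddings are defined is an assumption left out of this sketch.

```python
import random
from collections import Counter

def mh_uniform_walk(neighbors, start, steps, rng):
    """Metropolis-Hastings walk whose stationary distribution is uniform over states."""
    counts = Counter()
    current = start
    for _ in range(steps):
        # Propose a neighbor of the current state uniformly at random.
        proposal = rng.choice(neighbors[current])
        # Accept with probability min(1, d(current) / d(proposal)).
        accept = min(1.0, len(neighbors[current]) / len(neighbors[proposal]))
        if rng.random() < accept:
            current = proposal
        # A rejected proposal counts as another visit to the current state.
        counts[current] += 1
    return counts

# Hypothetical toy state graph with unequal degrees (assumption, not from the paper).
toy_states = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0]}
steps = 200_000
visits = mh_uniform_walk(toy_states, start=0, steps=steps, rng=random.Random(0))
freq = {state: visits[state] / steps for state in toy_states}
```

A plain random walk on this toy graph would visit state 0 about 3/8 of the time because it has the most neighbors. The acceptance step removes that degree bias, so each of the four states should be visited roughly a quarter of the time.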

Notes

  1. In [1], the term “graphlet” has also been used for describing wavelet decomposition of graphs; our work is not related to this definition.

  2. GraphCrunch2 is a parallel library that uses all the available cores in a computer.

  3. A triangle is a size-3 graphlet which is shown as \(g_2\) in Fig. 1; scalable algorithms are available for counting triangles in graphs with millions of vertices and edges.

  4. Guise is an anagram of the capitalized letters in UnIform Sampling of GraphlEts.

  5. We sometimes use graphlet to mean a specific embedding of a graphlet, if it is clear from the context of the discussion.

  6. There is no order among the graphlets, so a line chart is probably not the most appropriate visual representation of a GFD; however, we found that visual comparison of two GFDs is easier using line charts.

  7. In GFD, graphlet counts are compared on a logarithmic scale; since \(\log 0\) is undefined, we initialize each graphlet count to 1.

  8. This is required only from a theoretical standpoint; in our experiments, we do not allocate any self-loop probability unless needed.

  9. http://snap.stanford.edu/data/index.html and http://www-personal.umich.edu/~mejn/netdata.
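
Note 7 can be illustrated with a small Python sketch. It assumes, for illustration, that each GFD entry is the base-10 logarithm of a graphlet type's share of the total count; the function name `log_gfd`, the choice of base, and the sample counts are hypothetical. Starting every counter at 1 keeps the logarithm defined for graphlet types that were never observed.

```python
import math

def log_gfd(observed_counts):
    """Return log10 of each graphlet type's normalized count."""
    # Start every counter at 1 (Note 7) so the log is defined for unseen types.
    counts = [1 + c for c in observed_counts]
    total = sum(counts)
    return [math.log10(c / total) for c in counts]

# Hypothetical observed counts for three graphlet types; the second was never seen.
gfd = log_gfd([9, 0, 89])
```

The unseen type receives a finite, strongly negative entry instead of an undefined one, so two GFDs can still be compared entry by entry.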

References

  1. Azari Soufiani H, Airoldi EM (2012) Graphlet decomposition of a weighted network. ArXiv e-prints

  2. Barabási AL, Albert R (1999) Emergence of scaling in random networks. Science 286(5439):509–512

  3. Baumes J, Goldberg M, Magdon-Ismail M, Wallace W (2004) Discovering hidden groups in communication networks. In: Proceedings of the 2nd NSF/NIJ symposium on intelligence and security informatics

  4. Becchetti L, Boldi P, Castillo C, Gionis A (2008) Efficient semi-streaming algorithms for local triangle counting in massive graphs. In: Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data mining, KDD’08, pp 16–24. ACM, New York, NY, USA

  5. Becchetti L, Boldi P, Castillo C, Gionis A (2010) Efficient algorithms for large-scale local triangle counting. ACM Trans Knowl Discov Data 4(3):13-1–13-28

  6. Borgatti SP, Mehra A, Brass DJ, Labianca G (2009) Network analysis in the social sciences. Science 323:892–895

  7. Chen J, Hsu W, Lee ML, Ng SK (2006) NeMoFinder: dissecting genome-wide protein–protein interactions with meso-scale network motifs. In: Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining, KDD’06, pp 106–115

  8. Chung FRK (1997) Spectral graph theory. American Mathematical Society, Providence, RI

  9. Coleman JS (1988) Social capital in the creation of human capital. Am J Sociol 94:S95–S120

  10. Duke R, Lefmann H, Rödl V (1995) A fast approximation algorithm for computing the frequencies of subgraphs in a given graph. SIAM J Comput 24(3):598–620

  11. Eberle W, Holder L (2009) Graph-based approaches to insider threat detection. In: Proceedings of the 5th annual workshop on cyber security and information intelligence research: cyber security and information intelligence challenges and strategies

  12. Eckmann JP, Moses E (2002) Curvature of co-links uncovers hidden thematic layers in the world wide web. Proc Natl Acad Sci USA 99(9):5825–5829

  13. Erdős P, Rényi A (1959) On random graphs. Publicationes Mathematicae (Debrecen), vol 6, pp 290–297

  14. Erdős P, Rényi A (1960) On the evolution of random graphs. In: Publication of The Mathematical Institute of The Hungarian Academy of Sciences, pp 17–61

  15. Faloutsos M, Faloutsos P, Faloutsos C (1999) On power-law relationships of the internet topology. In: Proceedings of the conference on applications, technologies, architectures, and protocols for computer communication, SIGCOMM’99, pp 251–262

  16. Foucault Welles B, Van Devender A, Contractor N (2010) Is a friend a friend?: Investigating the structure of friendship networks in virtual worlds. In: CHI’10 extended abstracts on human factors in computing systems, CHI EA’10, pp 4027–4032

  17. Grochow JA, Kellis M (2007) Network motif discovery using subgraph enumeration and symmetry-breaking. In: Proceedings of the 11th annual international conference on research in computational molecular biology, RECOMB’07, pp 92–106

  18. Guruswami V (2000) Rapidly mixing Markov chains: a comparison of techniques. A survey

  19. Hasan MA, Zaki MJ (2011) A survey of link prediction in social networks. In: Aggarwal CC (ed) Social network data analytics, Springer, Science+Business Media, LLC, p 243. ISBN 978-1-4419-8461-6

  20. Kashani Z, Ahrabian H, Elahi E, Nowzari-Dalini A, Ansari E, Asadi S, Mohammadi S, Schreiber F, Masoudi-Nejad A (2009) Kavosh: a new algorithm for finding network motifs. BMC Bioinform 10(1):318

  21. Kashtan N, Itzkovitz S, Milo R, Alon U (2004) Efficient sampling algorithm for estimating subgraph concentrations and detecting network motifs. Bioinformatics 20(11):1746–1758

  22. Kuchaiev O, Stevanović A, Hayes W, Pržulj N (2011) GraphCrunch 2: software tool for network modeling, alignment and clustering. BMC Bioinform 12(1):24

  23. Leskovec J, Kleinberg J, Faloutsos C (2005) Graphs over time: densification laws, shrinking diameters and possible explanations. In: Proceedings of the eleventh ACM SIGKDD international conference on knowledge discovery in data mining, KDD’05, pp 177–187

  24. Lussier J, Bank J (2011) Local structure and evolution for cascade prediction. Stanford University Technical report

  25. Montenegro R, Tetali P (2006) Mathematical aspects of mixing times in Markov chains. Found Trends Theor Comput Sci 1:237–354

  26. Milenkovic T, Pržulj N (2008) Uncovering biological network function via graphlet degree signatures. Cancer Inform 6:257–273

  27. Motwani R, Raghavan P (1995) Randomized algorithms. Cambridge University Press, Cambridge, MA

  28. Omidi S, Schreiber F, Masoudi-Nejad A (2009) MODA: an efficient algorithm for network motif discovery in biological networks. Genes Genet Syst 84(5):385–395

  29. Pržulj N (2010) Biological network comparison using graphlet degree distribution. Bioinformatics 26(6):853–854

  30. Pržulj N, Corneil DG, Jurisica I (2004) Modeling interactome: scale-free or geometric? Bioinformatics 20(18):3508–3515

  31. Pržulj N, Corneil DG, Jurisica I (2006) Efficient estimation of graphlet frequency distributions in protein-protein interaction networks. Bioinformatics 22(8):974–980

  32. Schreiber F, Schwöbbermeyer H (2005) Frequency concepts and pattern detection for the analysis of motifs in networks. Trans Comput Syst Biol 3:89–104

  33. Shervashidze N, Vishwanathan SVN, Petri TH, Mehlhorn K, Borgwardt KM (2009) Efficient graphlet kernels for large graph comparison. In: van Dyk D, Welling M (eds) Proceedings of the twelfth international conference on artificial intelligence and statistics (AISTATS), JMLR: workshop and conference proceedings, vol 5, pp 488–495. CSAIL

  34. Tyson JJ, Novak B (2010) Functional motifs in biochemical reaction networks. Annu Rev Phys Chem 61:219–240

  35. Vacic V, Iakoucheva LM, Lonardi S, Radivojac P (2010) Graphlet kernels for prediction of functional residues in protein structures. J Comput Biol 17:55–72

  36. Wasserman S, Faust K (1994) Social network analysis: methods and applications. Cambridge University Press, Cambridge

  37. Watts DJ, Strogatz SH (1998) Collective dynamics of ‘small-world’ networks. Nature 393:440–442

  38. Wernicke S, Rasche F (2006) FANMOD: a tool for fast network motif detection. Bioinformatics 22(9):1152–1153

  39. Zegura EW, Calvert KL, Donahoo MJ (1997) A quantitative comparison of graph-based models for internet topology. IEEE/ACM Trans Netw 5(6):770–783

Author information

Corresponding author

Correspondence to Mohammad Al Hasan.

Additional information

This research is supported by Mohammad Hasan’s NSF CAREER Award (IIS-1149851).

Mahmudur Rahman and Mansurul Alam Bhuiyan contributed equally to this research.

About this article

Cite this article

Rahman, M., Bhuiyan, M.A., Rahman, M. et al. GUISE: a uniform sampler for constructing frequency histogram of graphlets. Knowl Inf Syst 38, 511–536 (2014). https://doi.org/10.1007/s10115-013-0673-3

  • DOI: https://doi.org/10.1007/s10115-013-0673-3
