GUISE: a uniform sampler for constructing frequency histogram of graphlets

Rahman, Mahmudur; Bhuiyan, Mansurul Alam; Rahman, Mahmuda; Hasan, Mohammad Al

doi:10.1007/s10115-013-0673-3

Regular Paper
Published: 18 August 2013

Volume 38, pages 511–536 (2014)
Knowledge and Information Systems

Abstract

Graphlet frequency distribution (GFD) has recently become popular for characterizing large networks. However, computing the GFD of a network requires the exact count of the graphlets embedded in that network, which is computationally expensive. As a result, it is practically infeasible to compute the GFD of even a moderately large network. In this paper, we propose Guise, which uses a Markov chain Monte Carlo sampling method to construct the approximate GFD of a large network. Our experiments on networks with millions of nodes show that Guise obtains the GFD with a very low error rate within a few minutes, whereas the exhaustive counting-based approach takes several days.
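
The abstract does not describe the chain itself, so the following is only a minimal sketch of how uniform sampling by Markov chain Monte Carlo can work on graphlet embeddings. It assumes a Metropolis-Hastings walk over connected 3-vertex induced subgraphs in which a move swaps one vertex for another; this neighborhood, the toy graph `adj`, and the helper names `is_connected`, `neighbors` and `mh_sample` are illustrative assumptions, not the construction used in the paper.

```python
import random


def is_connected(adj, nodes):
    """Return True if the subgraph induced by `nodes` is connected."""
    nodes = set(nodes)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v in nodes and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == nodes


def neighbors(adj, state):
    """Connected vertex sets reachable from `state` by swapping one vertex."""
    result = set()
    for u in state:
        rest = state - {u}
        # Only vertices adjacent to the remaining ones can keep it connected.
        candidates = set().union(*(adj[w] for w in rest)) - state
        for v in candidates:
            new_state = frozenset(rest | {v})
            if is_connected(adj, new_state):
                result.add(new_state)
    # Sort for reproducibility under a fixed random seed.
    return sorted(result, key=sorted)


def mh_sample(adj, start, steps, rng):
    """Metropolis-Hastings walk whose target distribution is uniform."""
    state = frozenset(start)
    nbrs = neighbors(adj, state)
    visited = []
    for _ in range(steps):
        if nbrs:
            proposal = rng.choice(nbrs)
            prop_nbrs = neighbors(adj, proposal)
            # Accept with min(1, deg(current) / deg(proposal)); this removes
            # the bias of a plain random walk toward high-degree states.
            if rng.random() < min(1.0, len(nbrs) / len(prop_nbrs)):
                state, nbrs = proposal, prop_nbrs
        visited.append(state)
    return visited


# Toy graph (hypothetical): a triangle 0-1-2 with a tail 2-3-4.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4}, 4: {3}}
samples = mh_sample(adj, {0, 1, 2}, 2000, random.Random(0))

# Classify each sampled embedding by its number of induced edges:
# 3 edges is a triangle, 2 edges is a path on three vertices.
tally = {"triangle": 0, "path": 0}
for s in samples:
    edges = sum(1 for u in s for v in adj[u] if v in s) // 2
    tally["triangle" if edges == 3 else "path"] += 1
print(tally)
```

Because the swap move is symmetric, the acceptance ratio depends only on the two neighbor counts. The relative frequencies in `tally` then approximate the share of each graphlet type among all embeddings, without enumerating them.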

Notes

In [1], the term “graphlet” has also been used for describing wavelet decomposition of graphs; our work is not related to this definition.
GraphCrunch2 is a parallel library that uses all the available cores in a computer.
A triangle is a size-3 graphlet which is shown as \(g_2\) in Fig. 1; scalable algorithms are available for counting triangles in graphs with millions of vertices and edges.
Guise is an anagram of the bold letters in UnIform Sampling of GraphlEts.
We sometimes use graphlet to mean a specific embedding of a graphlet, if it is clear from the context of the discussion.
There is no order among the graphlets, so a line chart is probably not the most appropriate visual representation of a GFD; however, we found that visual comparison of two GFDs is easier using line charts.
In GFD, graphlet counts are compared on a logarithmic scale; since \(\log 0\) is undefined, we initialize the graphlet count with 1.
This is required only from a theoretical standpoint; in our experiment, we do not allocate any self-loop probability, unless needed.
http://snap.stanford.edu/data/index.html and http://www-personal.umich.edu/~mejn/netdata.
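
The note above on initializing graphlet counts with 1 can be illustrated with a short sketch. It assumes that each GFD entry is the base-10 logarithm of a graphlet's share of the total count; the base, the normalization and the function name `gfd` are assumptions for illustration and are not confirmed by the text shown here.

```python
import math


def gfd(raw_counts):
    """Log-scale relative frequencies from raw graphlet counts (a sketch)."""
    # Start every counter at 1 so that the logarithm is always defined,
    # even for graphlet types that were never observed.
    counts = [1 + c for c in raw_counts]
    total = sum(counts)
    # Assumed normalization: share of the total, on a base-10 log scale.
    return [math.log10(c / total) for c in counts]


# Hypothetical counts for three graphlet types; the second was never seen.
print(gfd([120, 0, 35]))
```

Without the initial 1, the unseen graphlet type would require \(\log 0\); with it, that entry is simply the smallest value in the distribution.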

References

Azari Soufiani H, Airoldi EM (2012) Graphlet decomposition of a weighted network. ArXiv e-prints
Barabási AL, Albert R (1999) Emergence of scaling in random networks. Science 286(5439):509–512
Baumes J, Goldberg M, Magdon-Ismail M, Wallace W (2004) Discovering hidden groups in communication networks. In: Proceedings of the 2nd NSF/NIJ symposium on intelligence and security informatics
Becchetti L, Boldi P, Castillo C, Gionis A (2008) Efficient semi-streaming algorithms for local triangle counting in massive graphs. In: Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data mining, KDD’08, pp 16–24. ACM, New York, NY, USA
Becchetti L, Boldi P, Castillo C, Gionis A (2010) Efficient algorithms for large-scale local triangle counting. ACM Trans Knowl Discov Data 4(3):13-1–13-28
Borgatti SP, Mehra A, Brass DJ, Labianca G (2009) Network analysis in the social sciences. Science 323:892–895
Chen J, Hsu W, Lee ML, Ng SK (2006) NeMoFinder: dissecting genome-wide protein–protein interactions with meso-scale network motifs. In: Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining, KDD’06, pp 106–115
Chung FRK (1997) Spectral graph theory. American Mathematical Society, Providence, RI
Coleman JS (1988) Social capital in the creation of human capital. Am J Sociol 94:S95–S120
Duke R, Lefmann H, Rödl V (1995) A fast approximation algorithm for computing the frequencies of subgraphs in a given graph. SIAM J Comput 24(3):598–620
Eberle W, Holder L (2009) Graph-based approaches to insider threat detection. In: Proceedings of the 5th annual workshop on cyber security and information intelligence research: cyber security and information intelligence challenges and strategies
Eckmann JP, Moses E (2002) Curvature of co-links uncovers hidden thematic layers in the world wide web. Proc Natl Acad Sci USA 99(9):5825–5829
Erdős P, Rényi A (1959) On random graphs. Publicationes Mathematicae (Debrecen), vol 6, pp 290–297
Erdős P, Rényi A (1960) On the evolution of random graphs. In: Publication of The Mathematical Institute of The Hungarian Academy of Sciences, pp 17–61
Faloutsos M, Faloutsos P, Faloutsos C (1999) On power-law relationships of the internet topology. In: Proceedings of the conference on applications, technologies, architectures, and protocols for computer communication, SIGCOMM’99, pp 251–262
Foucault Welles B, Van Devender A, Contractor N (2010) Is a friend a friend?: Investigating the structure of friendship networks in virtual worlds. In: CHI’10 extended abstracts on human factors in computing systems, CHI EA’10, pp 4027–4032
Grochow JA, Kellis M (2007) Network motif discovery using subgraph enumeration and symmetry-breaking. In: Proceedings of the 11th annual international conference on research in computational molecular biology, RECOMB’07, pp 92–106
Guruswami V (2000) Rapidly mixing Markov chains: a comparison of techniques. A Survey
Hasan MA, Zaki MJ (2011) A survey of link prediction in social networks. In: Aggarwal CC (ed) Social network data analytics, Springer, Science+Business Media, LLC, p 243. ISBN 978-1-4419-8461-6
Kashani Z, Ahrabian H, Elahi E, Nowzari-Dalini A, Ansari E, Asadi S, Mohammadi S, Schreiber F, Masoudi-Nejad A (2009) Kavosh: a new algorithm for finding network motifs. BMC Bioinform 10(1):318
Kashtan N, Itzkovitz S, Milo R, Alon U (2004) Efficient sampling algorithm for estimating subgraph concentrations and detecting network motifs. Bioinformatics 20(11):1746–1758
Kuchaiev O, Stevanović A, Hayes W, Pržulj N (2011) GraphCrunch 2: software tool for network modeling, alignment and clustering. BMC Bioinform 12(1):24
Leskovec J, Kleinberg J, Faloutsos C (2005) Graphs over time: densification laws, shrinking diameters and possible explanations. In: Proceedings of the eleventh ACM SIGKDD international conference on knowledge discovery in data mining, KDD’05, pp 177–187
Lussier J, Bank J (2011) Local structure and evolution for cascade prediction. Stanford University Technical report
Montenegro R, Tetali P (2006) Mathematical aspects of mixing times in Markov chains. Found Trends Theor Comput Sci 1:237–354
Milenkovic T, Pržulj N (2008) Uncovering biological network function via graphlet degree signatures. Cancer Inform 6:257–273
Motwani R, Raghavan P (1995) Randomized algorithms. Cambridge University Press, Cambridge, MA
Omidi S, Schreiber F, Masoudi-Nejad A (2009) MODA: an efficient algorithm for network motif discovery in biological networks. Genes Genet Syst 84(5):385–395
Pržulj N (2010) Biological network comparison using graphlet degree distribution. Bioinformatics 26(6):853–854
Pržulj N, Corneil DG, Jurisica I (2004) Modeling interactome: scale-free or geometric? Bioinformatics 20(18):3508–3515
Pržulj N, Corneil DG, Jurisica I (2006) Efficient estimation of graphlet frequency distributions in protein-protein interaction networks. Bioinformatics 22(8):974–980
Schreiber F, Schwöbbermeyer H (2005) Frequency concepts and pattern detection for the analysis of motifs in networks. Trans Comput Syst Biol 3:89–104
Shervashidze N, Vishwanathan SVN, Petri TH, Mehlhorn K, Borgwardt KM (2009) Efficient graphlet kernels for large graph comparison. In: van Dyk D, Welling M (eds) Proceedings of the twelfth international conference on artificial intelligence and statistics (AISTATS), JMLR: workshop and conference proceedings, vol 5, pp 488–495. CSAIL
Tyson JJ, Novak B (2010) Functional motifs in biochemical reaction networks. Annu Rev Phys Chem 61:219–240
Vacic V, Iakoucheva LM, Lonardi S, Radivojac P (2010) Graphlet kernels for prediction of functional residues in protein structures. J Comput Biol 17:55–72
Wasserman S, Faust K (1994) Social network analysis: methods and applications. Cambridge University Press, Cambridge
Watts DJ, Strogatz SH (1998) Collective dynamics of ‘small-world’ networks. Nature 393:440–442
Wernicke S, Rasche F (2006) FANMOD: a tool for fast network motif detection. Bioinformatics 22(9):1152–1153
Zegura EW, Calvert KL, Donahoo MJ (1997) A quantitative comparison of graph-based models for internet topology. IEEE/ACM Trans Netw 5(6):770–783

Author information

Authors and Affiliations

Department of Computer Science, Indiana University–Purdue University, Indianapolis, IN, USA
Mahmudur Rahman, Mansurul Alam Bhuiyan & Mohammad Al Hasan
Department of Computer Science, Syracuse University, Syracuse, NY, USA
Mahmuda Rahman

Corresponding author

Correspondence to Mohammad Al Hasan.

Additional information

This research is supported by Mohammad Hasan’s NSF CAREER Award (IIS-1149851).

Mahmudur Rahman and Mansurul Alam Bhuiyan contributed equally to this research.

About this article

Cite this article

Rahman, M., Bhuiyan, M.A., Rahman, M. et al. GUISE: a uniform sampler for constructing frequency histogram of graphlets. Knowl Inf Syst 38, 511–536 (2014). https://doi.org/10.1007/s10115-013-0673-3

Received: 16 March 2013
Revised: 25 May 2013
Accepted: 14 July 2013
Published: 18 August 2013
Issue Date: March 2014
DOI: https://doi.org/10.1007/s10115-013-0673-3
