Skip to main content
Log in

Fast analytical methods for finding significant labeled graph motifs

  • Published:
Data Mining and Knowledge Discovery Aims and scope Submit manuscript

Abstract

Network motif discovery is the problem of finding subgraphs of a network that occur more frequently than expected, according to some reasonable null hypothesis. Such subgraphs may indicate small scale interaction features in genomic interaction networks or intriguing relationships involving actors or a relationship among airlines. When nodes are labeled, they can carry information such as the genomic entity under study or the dominant genre of an actor. For that reason, labeled subgraphs convey information beyond structure and could therefore enjoy more applications. To identify statistically significant motifs in a given network, we propose an analytical method (i.e. simulation-free) that extends the works of Picard et al. (J Comput Biol 15(1):1–20, 2008) and Schbath et al. (J Bioinform Syst Biol 2009(1):616234, 2009) to label-dependent scale-free graph models. We provide an analytical expression of the mean and variance of the count under the Expected Degree Distribution random graph model. Our model deals with both induced and non-induced motifs. We have tested our methodology on a wide set of graphs ranging from protein–protein interaction networks to movie networks. The analytical model is a fast (usually faster by orders of magnitude) alternative to simulation. This advantage increases as graphs grow in size.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6

Similar content being viewed by others

References

  • Adamic LA, Glance N (2005) The political blogosphere and the 2004 U.S. election: divided they blog. In: Proceedings of the 3rd international workshop on link discovery, ACM, New York, pp 36–43

  • Ahmed NK, Neville J, Rossi RA, Duffield NG, Willke TL (2017) Graphlet decomposition: framework, algorithms, and applications. Knowl Inf Syst 50(3):689–722

    Article  Google Scholar 

  • Ashburner M, Ball CA, Blake JA (2000) Gene ontology: tool for the unification of biology. Nat Genet 25(1):25–29

    Article  Google Scholar 

  • Barabasi AL, Albert R (1999) Emergence of scaling in random networks. Science 286(5439):509–512

    Article  MathSciNet  MATH  Google Scholar 

  • Batagelj V, Mrvar M, Zavesnik M (2002) Network analysis of dictionaries. In: Language technologies, pp 135–142

  • Bindea G, Mlecnik B, Hackl H (2009) ClueGO: a cytoscape plug-in to decipher functionally grouped gene ontology and pathway annotation networks. Bioinformatics 25(8):1091–1093

    Article  Google Scholar 

  • Chen J, Yuan B (2006) Detecting functional modules in the yeast protein–protein interaction network. Bioinformatics 22(18):2283–2290

    Article  Google Scholar 

  • Chen J, Hsu W, Lee ML, Ng S (2006) NeMoFinder: dissecting genome-wide protein–protein interactions with meso-scale network motifs. In: Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining, pp 106–115

  • Chung F, Lu L (2002) The average distances in random graphs with given expected degrees. Proc Natl Acad Sci 99(25):15879–15882

    Article  MathSciNet  MATH  Google Scholar 

  • Daudin JJ, Picard F, Robin S (2008) A mixture model for random graphs. Stat Comput 18(2):173–183

    Article  MathSciNet  Google Scholar 

  • Davis M, Liu W, Miller P, Hunter RF, Kee F (2014) Agwan: a generative model for labelled, weighted graphs. In: New frontiers in mining complex patterns: second international workshop, NFMCP 2013, pp 181–200

  • De Domenico M, Omodei E, Arenas A (2016) Quantifying the diaspora of knowledge in the last century. Appl Netw Sci 1:15

    Article  Google Scholar 

  • Durak N, Pinar A, Kolda TG, Seshadhri C (2012) Degree relations of triangles in real-world networks and graph models. In: Proceedings of the 21st ACM international conference on Information and knowledge management (CIKM’12), pp 1712–1716

  • Erdös P, Rényi A (1959) On random graphs. Publ Math 6:290–297

    Google Scholar 

  • Johnson NL, Kotz S, Kemp AW (1992) Univariate discrete distributions, 2nd edn. Wiley, New York

    MATH  Google Scholar 

  • Kim M, Leskovec J (2011) Modeling social networks with node attributes using the multiplicative attribute graph model. In: Proceedings of the twenty-seventh conference on uncertainty in artificial intelligence, pp 400–409

  • Knuth DE (1993) The Stanford GraphBase: a platform for combinatorial computing. ACM Press, New York

    MATH  Google Scholar 

  • Ley M (2002) The DBLP computer science bibliography: evolution, research issues, perspectives. In: Proceedings of the international symposium on string processing and information retrieval, vol 2476, pp. 1–10

  • Maere S, Heymans K, Kuiper M (2005) BiNGO: a cytoscape plugin to assess overrepresentation of gene ontology categories in biological networks. Bioinformatics 21(16):3448–3449

    Article  Google Scholar 

  • Meira LAA, Maximo VR, Fazenda AL, Conceicao AFD (2014) Acc-Motif: Accelerated Network Motif Detection. IEEE/ACM Trans Comput Biol Bioinform 11(5):853–862

    Article  Google Scholar 

  • Milo R, Shen-Orr S, Itzkovitz S et al (2002) Network motifs: simple building blocks of complex networks. Science 298(5594):824–827

    Article  Google Scholar 

  • Milo R, Kashtan N, Itzkovitz S (2004) On the uniform generation of random graphs with prescibed degree sequences. arXiv:cond-mat/0312028

  • Newman MEJ, Strogatz SH, Watts DJ (2001) Random graphs with arbitrary degree distributions and their applications. Phys Rev E 64:026118

    Article  Google Scholar 

  • Nowicki K, Snijders T (2001) Estimation and prediction for stochastic block structures. J Am Stat Assoc 96:1077–1087

    Article  MATH  Google Scholar 

  • Opsahl T (2011) Why anchorage is not (that) important: binary ties and sample selection. http://toreopsahl.com/2011/08/12

  • Park J, Newman M (2003) The origin of degree correlations in the internet and other networks. Phys Rev E 68:026112

    Article  Google Scholar 

  • Park J, Newman MEJ (2004) Statistical mechanics of networks. Phys Rev E 70(6):066117

    Article  MathSciNet  Google Scholar 

  • Pfeiffer III JJ, Moreno S, La Fond T, Neville J, Gallagher B (2014) Attributed graph models: modeling network structure with correlated attributes. In: Proceedings of the 23rd international conference on world wide web, pp 831–842

  • Picard F, Daudin JJ, Koskas M (2008) Assessing the exceptionality of network motifs. J Comput Biol 15(1):1–20

    Article  MathSciNet  Google Scholar 

  • Prasad TSK, Goel R, Kandasamy K, Keerthikumar S (2009) Human protein reference database—2009 update. Nucleic Acids Res 37(1):D767–D772

    Article  Google Scholar 

  • Prill R, Iglesias PA, Levchenko A (2005) Dynamic properties of network motifs contribute to biological network organization. PLoS Biol 3(11):e343

    Article  Google Scholar 

  • Ribeiro P, Silva F (2014) G-Tries: a data structure for storing and finding subgraphs. Data Min Knowl Discov 28(2):337–377

    Article  MathSciNet  MATH  Google Scholar 

  • Ruepp A, Zollner A, Maier D, Albermann K, Hani J, Mokrejs M, Tetko I, Gldener U, Mannhaupt G, Mnsterktter M, Mewes HW (2004) The FunCat, a functional annotation scheme for systematic classification of proteins from whole genomes. Nucleic Acids Res 32(18):5539–5545

    Article  Google Scholar 

  • Schbath S, Lacroix V, Sagot MF (2009) Assessing the exceptionality of coloured motifs in networks. J Bioinform Syst Biol 2009(1):616234

    Article  Google Scholar 

  • Seshadhri C, Kolda TG, Pinar A (2012) Community structure and scale-free collections of Erdos–Renyi graphs. Phys Rev E 85(5):056109

    Article  Google Scholar 

  • Shen-Orr SS, Milo R, Mangan S (2002) Network motifs in the transcriptional regulation network of Escherichia coli. Nat. Genet. 31:64–68

    Article  Google Scholar 

  • Sinha A, Shen Z, Song Y, Ma H, Eide D, Hsu B, Wang K (2015) An overview of Microsoft Academic Service (MAS) and applications. In: Proceedings of the 24th international conference on world wide web (WWW 15 Companion), pp 243–246

  • Squartini T, Garlaschelli D (2011) Analytical maximum-likelihood method to detect patterns in real networks. New J Phys 13(8):083001

    Article  Google Scholar 

  • Varshney LR, Chen BL, Paniagua E (2011) Structural properties of the Caenorhabditis elegans neuronal network. PLoS Comput Biol 7(2):e1001066

    Article  Google Scholar 

  • von Mering C, Krause R, Snel B, Cornell M, Oliver SG, Fields S, Bork P (2002) Comparative assessment of large-scale data sets of protein–protein interactions. Nature 417:399–403

    Article  Google Scholar 

  • Wernicke S (2006) Efficient detection of network motifs. IEEE/ACM Trans Comput Biol Bioinform 3(4):347–359

    Article  Google Scholar 

Download references

Acknowledgements

The authors would like to thank Simone Severini for insightful discussions.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Alfredo Pulvirenti.

Additional information

Responsible editor: Srinivasan Parthasarathy.

This work has been partially supported by the U.S. National Science Foundation and National Institutes of Health under Grants NSF: MCB-1158273, IOS-1339362, MCB-1412232, MCB-1355462, IOS-0922738, MCB-0929338, and NIH: 2R01GM032877-25A1. This work has been also partially supported by the Italian MIUR Projects: PRISMA—CUP E61H12000140005 and CLARA—CUP E64G14000190008.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Micale, G., Giugno, R., Ferro, A. et al. Fast analytical methods for finding significant labeled graph motifs. Data Min Knowl Disc 32, 504–531 (2018). https://doi.org/10.1007/s10618-017-0544-8

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10618-017-0544-8

Keywords

Navigation