Fast analytical methods for finding significant labeled graph motifs

Micale, Giovanni; Giugno, Rosalba; Ferro, Alfredo; Mongiovì, Misael; Shasha, Dennis; Pulvirenti, Alfredo

doi:10.1007/s10618-017-0544-8

Published: 02 November 2017

Volume 32, pages 504–531, (2018)

Data Mining and Knowledge Discovery

Giovanni Micale¹,
Rosalba Giugno²,
Alfredo Ferro⁵,
Misael Mongiovì³,
Dennis Shasha⁴ &
…
Alfredo Pulvirenti ORCID: orcid.org/0000-0002-9764-0295⁵


Abstract

Network motif discovery is the problem of finding subgraphs of a network that occur more frequently than expected under a reasonable null hypothesis. Such subgraphs may indicate small-scale interaction features in genomic interaction networks, or notable relationships among actors or among airlines. When nodes are labeled, they can carry information such as the genomic entity under study or the dominant genre of an actor. Labeled subgraphs therefore convey information beyond structure and may find wider application. To identify statistically significant motifs in a given network, we propose an analytical (i.e., simulation-free) method that extends the work of Picard et al. (J Comput Biol 15(1):1–20, 2008) and Schbath et al. (J Bioinform Syst Biol 2009(1):616234, 2009) to label-dependent scale-free graph models. We provide analytical expressions for the mean and variance of the motif count under the Expected Degree Distribution random graph model. Our model handles both induced and non-induced motifs. We have tested our methodology on a wide set of graphs, ranging from protein–protein interaction networks to movie networks. The analytical model is a fast alternative to simulation, usually faster by orders of magnitude, and this advantage increases as graphs grow in size.
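
As a rough illustration of what a simulation-free expected count can look like, the sketch below computes the expected number of non-induced triangles in closed form and turns an observed count into a z-score. It is not the authors' derivation. It assumes a Chung-Lu style expected-degree model with fixed node weights, where the edge probability is min(1, w_i * w_j / sum of weights) and edges are independent; it ignores node labels, and it takes the variance as an input rather than deriving it. All function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def edge_prob(w_i, w_j, total_w):
    """Edge probability under an assumed Chung-Lu style model, capped at 1."""
    return min(1.0, w_i * w_j / total_w)


def expected_triangles(weights):
    """Expected number of non-induced triangles.

    Edges are assumed independent, so the probability that a given node
    triple forms a triangle is the product of its three edge probabilities.
    Summing over all triples gives the mean count without any simulation.
    """
    total_w = sum(weights)
    expected = 0.0
    for i, j, k in combinations(range(len(weights)), 3):
        expected += (
            edge_prob(weights[i], weights[j], total_w)
            * edge_prob(weights[j], weights[k], total_w)
            * edge_prob(weights[i], weights[k], total_w)
        )
    return expected


def z_score(observed, mean, variance):
    """Standardized deviation of an observed motif count from its mean."""
    return (observed - mean) / sqrt(variance)


if __name__ == "__main__":
    # Four nodes, each with expected-degree weight 2 (toy input).
    print(expected_triangles([2, 2, 2, 2]))
    # Variance is supplied by hand here purely for illustration.
    print(z_score(10, 4, 9))
```

The brute-force loop over triples is cubic in the number of nodes and is only meant to make the idea concrete; a closed-form expression in terms of degree moments is what makes an analytical approach fast on large graphs.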


References

Adamic LA, Glance N (2005) The political blogosphere and the 2004 U.S. election: divided they blog. In: Proceedings of the 3rd international workshop on link discovery, ACM, New York, pp 36–43
Ahmed NK, Neville J, Rossi RA, Duffield NG, Willke TL (2017) Graphlet decomposition: framework, algorithms, and applications. Knowl Inf Syst 50(3):689–722
Ashburner M, Ball CA, Blake JA (2000) Gene ontology: tool for the unification of biology. Nat Genet 25(1):25–29
Barabasi AL, Albert R (1999) Emergence of scaling in random networks. Science 286(5439):509–512
Batagelj V, Mrvar M, Zavesnik M (2002) Network analysis of dictionaries. In: Language technologies, pp 135–142
Bindea G, Mlecnik B, Hackl H (2009) ClueGO: a cytoscape plug-in to decipher functionally grouped gene ontology and pathway annotation networks. Bioinformatics 25(8):1091–1093
Chen J, Yuan B (2006) Detecting functional modules in the yeast protein–protein interaction network. Bioinformatics 22(18):2283–2290
Chen J, Hsu W, Lee ML, Ng S (2006) NeMoFinder: dissecting genome-wide protein–protein interactions with meso-scale network motifs. In: Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining, pp 106–115
Chung F, Lu L (2002) The average distances in random graphs with given expected degrees. Proc Natl Acad Sci 99(25):15879–15882
Daudin JJ, Picard F, Robin S (2008) A mixture model for random graphs. Stat Comput 18(2):173–183
Davis M, Liu W, Miller P, Hunter RF, Kee F (2014) Agwan: a generative model for labelled, weighted graphs. In: New frontiers in mining complex patterns: second international workshop, NFMCP 2013, pp 181–200
De Domenico M, Omodei E, Arenas A (2016) Quantifying the diaspora of knowledge in the last century. Appl Netw Sci 1:15
Durak N, Pinar A, Kolda TG, Seshadhri C (2012) Degree relations of triangles in real-world networks and graph models. In: Proceedings of the 21st ACM international conference on Information and knowledge management (CIKM’12), pp 1712–1716
Erdös P, Rényi A (1959) On random graphs. Publ Math 6:290–297
Johnson NL, Kotz S, Kemp AW (1992) Univariate discrete distributions, 2nd edn. Wiley, New York
Kim M, Leskovec J (2011) Modeling social networks with node attributes using the multiplicative attribute graph model. In: Proceedings of the twenty-seventh conference on uncertainty in artificial intelligence, pp 400–409
Knuth DE (1993) The Stanford GraphBase: a platform for combinatorial computing. ACM Press, New York
Ley M (2002) The DBLP computer science bibliography: evolution, research issues, perspectives. In: Proceedings of the international symposium on string processing and information retrieval, vol 2476, pp 1–10
Maere S, Heymans K, Kuiper M (2005) BiNGO: a cytoscape plugin to assess overrepresentation of gene ontology categories in biological networks. Bioinformatics 21(16):3448–3449
Meira LAA, Maximo VR, Fazenda AL, Conceicao AFD (2014) Acc-Motif: Accelerated Network Motif Detection. IEEE/ACM Trans Comput Biol Bioinform 11(5):853–862
Milo R, Shen-Orr S, Itzkovitz S et al (2002) Network motifs: simple building blocks of complex networks. Science 298(5594):824–827
Milo R, Kashtan N, Itzkovitz S (2004) On the uniform generation of random graphs with prescribed degree sequences. arXiv:cond-mat/0312028
Newman MEJ, Strogatz SH, Watts DJ (2001) Random graphs with arbitrary degree distributions and their applications. Phys Rev E 64:026118
Nowicki K, Snijders T (2001) Estimation and prediction for stochastic block structures. J Am Stat Assoc 96:1077–1087
Opsahl T (2011) Why anchorage is not (that) important: binary ties and sample selection. http://toreopsahl.com/2011/08/12
Park J, Newman M (2003) The origin of degree correlations in the internet and other networks. Phys Rev E 68:026112
Park J, Newman MEJ (2004) Statistical mechanics of networks. Phys Rev E 70(6):066117
Pfeiffer III JJ, Moreno S, La Fond T, Neville J, Gallagher B (2014) Attributed graph models: modeling network structure with correlated attributes. In: Proceedings of the 23rd international conference on world wide web, pp 831–842
Picard F, Daudin JJ, Koskas M (2008) Assessing the exceptionality of network motifs. J Comput Biol 15(1):1–20
Prasad TSK, Goel R, Kandasamy K, Keerthikumar S (2009) Human protein reference database—2009 update. Nucleic Acids Res 37(1):D767–D772
Prill R, Iglesias PA, Levchenko A (2005) Dynamic properties of network motifs contribute to biological network organization. PLoS Biol 3(11):e343
Ribeiro P, Silva F (2014) G-Tries: a data structure for storing and finding subgraphs. Data Min Knowl Discov 28(2):337–377
Ruepp A, Zollner A, Maier D, Albermann K, Hani J, Mokrejs M, Tetko I, Güldener U, Mannhaupt G, Münsterkötter M, Mewes HW (2004) The FunCat, a functional annotation scheme for systematic classification of proteins from whole genomes. Nucleic Acids Res 32(18):5539–5545
Schbath S, Lacroix V, Sagot MF (2009) Assessing the exceptionality of coloured motifs in networks. J Bioinform Syst Biol 2009(1):616234
Seshadhri C, Kolda TG, Pinar A (2012) Community structure and scale-free collections of Erdos–Renyi graphs. Phys Rev E 85(5):056109
Shen-Orr SS, Milo R, Mangan S (2002) Network motifs in the transcriptional regulation network of Escherichia coli. Nat Genet 31:64–68
Sinha A, Shen Z, Song Y, Ma H, Eide D, Hsu B, Wang K (2015) An overview of Microsoft Academic Service (MAS) and applications. In: Proceedings of the 24th international conference on world wide web (WWW 15 Companion), pp 243–246
Squartini T, Garlaschelli D (2011) Analytical maximum-likelihood method to detect patterns in real networks. New J Phys 13(8):083001
Varshney LR, Chen BL, Paniagua E (2011) Structural properties of the Caenorhabditis elegans neuronal network. PLoS Comput Biol 7(2):e1001066
von Mering C, Krause R, Snel B, Cornell M, Oliver SG, Fields S, Bork P (2002) Comparative assessment of large-scale data sets of protein–protein interactions. Nature 417:399–403
Wernicke S (2006) Efficient detection of network motifs. IEEE/ACM Trans Comput Biol Bioinform 3(4):347–359


Acknowledgements

The authors would like to thank Simone Severini for insightful discussions.

Author information

Authors and Affiliations

Department of Mathematics and Computer Science, University of Catania, Catania, 95125, Italy
Giovanni Micale
Department of Computer Science, University of Verona, Verona, Italy
Rosalba Giugno
CNR, Institute of Cognitive Sciences and Technologies, Catania, Italy
Misael Mongiovì
Courant Institute of Mathematical Science, New York University, New York, NY, USA
Dennis Shasha
Department of Clinical and Experimental Medicine, University of Catania, Catania, 95125, Italy
Alfredo Ferro & Alfredo Pulvirenti


Corresponding author

Correspondence to Alfredo Pulvirenti.

Additional information

Responsible editor: Srinivasan Parthasarathy.

This work has been partially supported by the U.S. National Science Foundation and National Institutes of Health under Grants NSF: MCB-1158273, IOS-1339362, MCB-1412232, MCB-1355462, IOS-0922738, MCB-0929338, and NIH: 2R01GM032877-25A1. This work has also been partially supported by the Italian MIUR Projects: PRISMA (CUP E61H12000140005) and CLARA (CUP E64G14000190008).


About this article

Cite this article

Micale, G., Giugno, R., Ferro, A. et al. Fast analytical methods for finding significant labeled graph motifs. Data Min Knowl Disc 32, 504–531 (2018). https://doi.org/10.1007/s10618-017-0544-8


Received: 12 October 2016
Accepted: 22 October 2017
Published: 02 November 2017
Issue Date: March 2018
DOI: https://doi.org/10.1007/s10618-017-0544-8
