Skip to main content

gLabTrie: A Data Structure for Motif Discovery with Constraints

  • Chapter
  • First Online:
Graph Data Management

Abstract

Motif discovery is the problem of finding subgraphs of a network that appear surprisingly often. Each such subgraph may indicate a small-scale interaction feature in applications ranging from a genomic interaction network, a significant relationship involving rock musicians, or any other application that can be represented as a network. We look at the problem of constrained search for motifs based on labels (e.g. gene ontology, musician type to continue our example from above). This chapter presents a brief review of the state of the art in motif finding and then extends the gTrie data structure from Ribeiro and Silva (Data Min Knowl Discov 28(2):337–377, 2014b) to support labels. Experiments validate the usefulness of our structure for small subgraphs, showing that we recoup the cost of the index after only a handful of queries.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info
Hardcover Book
USD 54.99
Price excludes VAT (USA)
  • Durable hardcover edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  • Adami C, Qian J, Rupp M, Hintze A (2011) Information content of colored motifs in complex networks. Artif Life 17(4):375–390

    Article  Google Scholar 

  • Adamic LA, Glance N (2005) The political blogosphere and the 2004 u.s. election: divided they blog. In: Proceedings of the 3rd international workshop on link discovery, LinkKDD ’05. ACM, New York, pp 36–43

    Chapter  Google Scholar 

  • Adelson RM (1966) Compound Poisson distributions. Oper Res Q 17(1):73–75

    Article  Google Scholar 

  • Alon U (2007) Network motifs: theory and experimental approaches. Nat Rev Genet 8(6):450–461

    Article  Google Scholar 

  • Ashburner M, Ball C, Blake J, Botstein D, Butler H, Cherry J, Davis A, Dolinski K, Dwight S, Eppig J, Harris M, Hill D, Issel-Tarver L, Kasarskis A, Lewis S, Matese J, Richardson J, Ringwald M, Rubin G, Sherlock G (2000) Gene ontology: tool for the unification of biology. Nat Genet 25(1):25–29

    Article  Google Scholar 

  • Barabási AL, Albert R (1999) Emergence of scaling in random networks. Science 286(5439):509–512

    Article  MathSciNet  Google Scholar 

  • Batagelj V, Mrvar A, Zaversnik M (2002) Network analysis of dictionaries. In: Language technologies, pp 135–142

    Google Scholar 

  • Bindea G, Mlecnik B, Hackl H, Charoentong P, Tosolini M, Kirilovsky A, Fridman WH, Pages F, Trajanoski Z, Galon J (2009) ClueGO: a cytoscape plug-in to decipher functionally grouped gene ontology and pathway annotation networks. Bioinformatics 25(8):1091–1093

    Article  Google Scholar 

  • Dimitropoulos X, Krioukov D, Huffaker B, Claffy K, Riley G (2005) Inferring AS relationships: dead end or lively beginning? In: Nikoletseas SE (ed) Experimental and efficient algorithms. Springer, Berlin, pp 113–125

    Chapter  Google Scholar 

  • Dimitropoulos XA, Krioukov DV, Riley GF, Claffy KC (2006) Revealing the autonomous system taxonomy: the machine learning approach. CoRR abs/cs/0604015

    Google Scholar 

  • Grochow JA, Kellis M (2007) Network motif discovery using subgraph enumeration and symmetry-breaking. In: Speed T, Huang H (eds) Research in computational molecular biology. Springer, Berlin, pp 92–106

    Chapter  Google Scholar 

  • Kashtan N, Alon U (2005) Spontaneous evolution of modularity and network motifs. Proc Natl Acad Sci 102(39):13773–13778

    Article  Google Scholar 

  • Keshava Prasad TS, Goel R, Kandasamy K, Keerthikumar S, Kumar S, Mathivanan S, Telikicherla D, Raju R, Shafreen B, Venugopal A, Balakrishnan L, Marimuthu A, Banerjee S, Somanathan DS, Sebastian A, Rani S, Ray S, Harrys Kishore CJ, Kanth S, Ahmed M, Kashyap MK, Mohmood R, Ramachandra YL, Krishna V, Rahiman BA, Mohan S, Ranganathan P, Ramabadran S, Chaerkady R, Pandey A (2009) Human protein reference database–2009 update. Nucleic Acids Res 37(Database issue):D767–772

    Article  Google Scholar 

  • Kurata H, Maeda K, Onaka T, Takata T (2014) BioFNet: biological functional network database for analysis and synthesis of biological systems. Brief Bioinform 15(5):699–709

    Article  Google Scholar 

  • Ley M (2002) The DBLP computer science bibliography: evolution, research issues, perspectives. In: Laender AHF, Oliveira AL (eds) String processing and information retrieval. Springer, Berlin, pp 1–10

    Google Scholar 

  • Maere S, Heymans K, Kuiper M (2005) BiNGO: a cytoscape plugin to assess overrepresentation of gene ontology categories in biological networks. Bioinformatics 21(16):3448–3449

    Article  Google Scholar 

  • McKay BD (1981) Practical graph isomorphism. Congressus numerantium 30:45–87

    MathSciNet  MATH  Google Scholar 

  • Milo R, Shen-Orr S, Itzkovitz S, Kashtan N, Chklovskii D, Alon U (2002) Network motifs: simple building blocks of complex networks. Science 298(5594):824–827

    Article  Google Scholar 

  • Milo R, Kashtan N, Itzkovitz S, Newman MEJ, Alon U (2003) On the uniform generation of random graphs with prescribed degree sequences. eprint arXiv:cond-mat/0312028

    Google Scholar 

  • Newman MEJ, Strogatz SH, Watts DJ (2001) Random graphs with arbitrary degree distributions and their applications. Phys Rev E 64:026118

    Article  Google Scholar 

  • Opsahl T (2011) Why anchorage is not (that) important: binary ties and sample selection. https://toreopsahl.com/2011/08/12/

  • Picard F, Daudin JJ, Koskas M, Schbath S, Robin S (2008) Assessing the exceptionality of network motifs. J Comput Biol 15(1):1–20

    Article  MathSciNet  Google Scholar 

  • Prill RJ, Iglesias PA, Levchenko A (2005) Dynamic properties of network motifs contribute to biological network organization. PLOS Biol 3(11):e343

    Article  Google Scholar 

  • Ribeiro P, Silva F (2012) Querying subgraph sets with g-tries. In: Proceedings of the 2Nd ACM SIGMOD workshop on databases and social networks, DBSocial ’12. ACM, New York, pp 25–30

    Chapter  Google Scholar 

  • Ribeiro P, Silva F (2014a) Discovering colored network motifs. In: Contucci P, Menezes R, Omicini A, Poncela-Casasnovas J (eds) Complex networks V. Springer International Publishing, Cham, pp 107–118

    Chapter  Google Scholar 

  • Ribeiro P, Silva F (2014b) G-Tries: a data structure for storing and finding subgraphs. Data Min Knowl Discov 28(2):337–377

    Article  MathSciNet  Google Scholar 

  • Schbath S, Lacroix V, Sagot MF (2009) Assessing the exceptionality of coloured motifs in networks. EURASIP J Bioinform Syst Biol 2009:3:1–3:9

    Article  Google Scholar 

  • Shen-Orr SS, Milo R, Mangan S, Alon U (2002) Network motifs in the transcriptional regulation network of Escherichia coli. Nat Genet 31(1):64–68

    Article  Google Scholar 

  • SolĂ© RV, Pastor-Satorras R, Smith E, Kepler TB (2002) A model of large-scale proteome evolution. Adv Complex Syst 05(01):43–54

    Article  Google Scholar 

  • Wernicke S (2006) Efficient detection of network motifs. IEEE/ACM Trans Comput Biol Bioinform 3(4):347–359

    Article  Google Scholar 

  • Yan X, Han J (2002) gSpan: graph-based substructure pattern mining. In: Proceedings - 2002 IEEE international conference on data mining. ICDM 2002, pp 721–724

    Google Scholar 

Download references

Acknowledgements

Shasha’s work has been partially supported by an INRIA International Chair and the U.S. National Science Foundation under grants MCB-1412232, IOS-1339362, MCB-1355462, MCB-1158273, IOS-0922738, and MCB-0929339. This support is greatly appreciated.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Dennis Shasha .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this chapter

Check for updates. Verify currency and authenticity via CrossMark

Cite this chapter

MongiovĂ­, M., Micale, G., Ferro, A., Giugno, R., Pulvirenti, A., Shasha, D. (2018). gLabTrie: A Data Structure for Motif Discovery with Constraints. In: Fletcher, G., Hidders, J., Larriba-Pey, J. (eds) Graph Data Management. Data-Centric Systems and Applications. Springer, Cham. https://doi.org/10.1007/978-3-319-96193-4_3

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-96193-4_3

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-96192-7

  • Online ISBN: 978-3-319-96193-4

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics