Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3683)

Abstract

This paper discusses a general algorithm for the discovery of motif combinations. From a large number of input motifs, discovered by any single motif discovery tool, our algorithm discovers sets of motifs that occur together in sequences from a positive data set. Generality is achieved by working on occurrence sets of the motifs. The output of the algorithm is a Pareto front of composite motifs with respect to both support and significance. We have used our method to discover composite motifs for the AlkB family of homologues. Some of the returned motifs confirm previously known conserved patterns, while other sets of strongly conserved patterns may characterize subfamilies of AlkB.
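
The idea of working on occurrence sets and reporting a Pareto front can be illustrated with a minimal sketch. Everything in it beyond the abstract is an assumption: the function names are hypothetical, the significance measure is taken to be a hypergeometric tail probability against a background that includes negative sequences, and combinations are enumerated exhaustively up to a small size. The paper's own scoring and search strategy may differ.

```python
from itertools import combinations
from math import comb


def hypergeom_tail(k, n_pos, n_hits, n_total):
    """P(X >= k) when n_hits sequences are drawn at random from n_total,
    of which n_pos are positive (assumed significance measure)."""
    upper = min(n_pos, n_hits)
    favourable = sum(
        comb(n_pos, i) * comb(n_total - n_pos, n_hits - i)
        for i in range(k, upper + 1)
    )
    return favourable / comb(n_total, n_hits)


def composite_motifs(occurrences, positives, n_total, max_size=2):
    """Yield (motif_names, support, p_value) for every motif combination.

    A composite motif occurs in a sequence only if all of its member
    motifs do, so its occurrence set is the intersection of theirs.
    """
    for size in range(1, max_size + 1):
        for names in combinations(sorted(occurrences), size):
            hits = set.intersection(*(occurrences[n] for n in names))
            if not hits:
                continue
            support = len(hits & positives)
            p_value = hypergeom_tail(support, len(positives), len(hits), n_total)
            yield names, support, p_value


def pareto_front(candidates):
    """Keep candidates that no other candidate beats on both criteria
    (higher support, lower p-value)."""
    candidates = list(candidates)
    front = []
    for c in candidates:
        dominated = any(
            o[1] >= c[1] and o[2] <= c[2] and (o[1] > c[1] or o[2] < c[2])
            for o in candidates
        )
        if not dominated:
            front.append(c)
    return front


# Toy data (invented for illustration): sequence ids s1..s10, s1..s5 positive.
positives = {"s1", "s2", "s3", "s4", "s5"}
occurrences = {
    "A": {"s1", "s2", "s3", "s4", "s5", "s6", "s7"},
    "B": {"s1", "s2", "s3", "s4", "s8"},
    "C": {"s1", "s2", "s3", "s9", "s10"},
}
front = pareto_front(composite_motifs(occurrences, positives, n_total=10))
for names, support, p_value in front:
    print(names, support, round(p_value, 4))
```

Because the sketch only ever touches sets of sequence identifiers, it does not depend on how the individual motifs are represented, which is the sense in which operating on occurrence sets gives generality. On the toy data the front contains two entries: the single motif A, which has the highest support, and the pair (A, B), which gives up one supporting sequence in exchange for a lower p-value.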

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sandve, G.K., Drabløs, F. (2005). Generalized Composite Motif Discovery. In: Khosla, R., Howlett, R.J., Jain, L.C. (eds) Knowledge-Based Intelligent Information and Engineering Systems. KES 2005. Lecture Notes in Computer Science, vol 3683. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11553939_108

  • DOI: https://doi.org/10.1007/11553939_108

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-28896-1

  • Online ISBN: 978-3-540-31990-0

  • eBook Packages: Computer Science (R0)
