Detecting Motifs in a Large Data Set: Applying Probabilistic Insights to Motif Finding

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 5462)

Abstract

We give a probabilistic algorithm for Consensus Sequence, an NP-complete subproblem of motif recognition, which can be described as follows: given a set of l-length sequences, determine whether there exists a sequence that has Hamming distance at most d from every sequence in the set. We demonstrate that the distance between a randomly selected majority sequence and a consensus sequence decreases as the size of the data set increases. Applying our probabilistic paradigms and insights to motif recognition, we develop pMCL-WMR, a program capable of detecting motifs in large synthetic and real genomic data sets. Our results show that detecting motifs becomes easier and more efficient as the number of sequences in the data set increases, a surprising and counter-intuitive fact that has significant impact on this deeply investigated area.
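
To make the problem statement concrete, the following is a minimal Python sketch of the objects the abstract names: Hamming distance, a majority sequence (the most frequent symbol in each column, with ties broken at random), and the Consensus Sequence decision check. It is an illustration only, not the paper's probabilistic algorithm and not pMCL-WMR. The function names, the DNA alphabet default, and the exhaustive search are assumptions made for the example; the exhaustive search takes time exponential in l and is practical only for tiny inputs.

```python
import random
from collections import Counter
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def majority_sequence(seqs, rng=None):
    """Pick the most frequent symbol in each column.

    Ties are broken uniformly at random (an assumption about what
    "randomly selected majority sequence" means in the abstract).
    """
    rng = rng or random.Random()
    result = []
    for column in zip(*seqs):
        counts = Counter(column)
        top = max(counts.values())
        winners = sorted(sym for sym, c in counts.items() if c == top)
        result.append(rng.choice(winners))
    return "".join(result)


def is_consensus(candidate, seqs, d):
    """True if candidate is within Hamming distance d of every sequence."""
    return all(hamming(candidate, s) <= d for s in seqs)


def brute_force_consensus(seqs, d, alphabet="ACGT"):
    """Exhaustive search over all |alphabet|^l candidates (illustration only)."""
    length = len(seqs[0])
    for symbols in product(alphabet, repeat=length):
        candidate = "".join(symbols)
        if is_consensus(candidate, seqs, d):
            return candidate
    return None


if __name__ == "__main__":
    data = ["ACGT", "ACGA", "ACCT"]  # hypothetical toy input
    m = majority_sequence(data, random.Random(0))
    print(m, is_consensus(m, data, d=1))
```

On this toy input the majority sequence has no ties and is itself within distance 1 of every input sequence, which is the kind of closeness between majority and consensus sequences that the abstract says improves as the data set grows.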

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Boucher, C., Brown, D.G. (2009). Detecting Motifs in a Large Data Set: Applying Probabilistic Insights to Motif Finding. In: Rajasekaran, S. (eds) Bioinformatics and Computational Biology. BICoB 2009. Lecture Notes in Computer Science, vol 5462. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00727-9_15

  • DOI: https://doi.org/10.1007/978-3-642-00727-9_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-00726-2

  • Online ISBN: 978-3-642-00727-9

  • eBook Packages: Computer Science, Computer Science (R0)