A Compact Mathematical Programming Formulation for DNA Motif Finding

  • Conference paper
Combinatorial Pattern Matching (CPM 2006)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4009)

Abstract

In the motif finding problem one seeks a set of mutually similar subsequences within a collection of biological sequences. This is an important and widely-studied problem, as such shared motifs in DNA often correspond to regulatory elements. We study a combinatorial framework where the goal is to find subsequences of a given length such that the sum of their pairwise distances is minimized. We describe a novel integer linear program for the problem, which uses the fact that distances between subsequences come from a limited set of possibilities. We show how to tighten its linear programming relaxation by adding an exponential set of constraints and give an efficient separation algorithm that can find violated constraints, thereby showing that the tightened linear program can still be solved in polynomial time. We apply our approach to find optimal solutions for the motif finding problem and show that it is effective in practice in uncovering known transcription factor binding sites.
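The sum-of-pairwise-distances objective described in the abstract can be illustrated with a minimal sketch. It assumes Hamming distance between equal-length words and exactly one chosen window per input sequence; neither detail is stated in the abstract, and the function names and toy sequences are hypothetical. The search is plain exhaustive enumeration, not the integer linear program from the paper, and is only practical on very small inputs. Under the Hamming assumption, the distance between two words of length l is always an integer between 0 and l, which is one way distances could come from a limited set of possibilities.

```python
from itertools import combinations, product


def hamming(a, b):
    """Number of positions at which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))


def windows(seq, length):
    """All contiguous subsequences of the given length, left to right."""
    return [seq[i:i + length] for i in range(len(seq) - length + 1)]


def sum_of_pairs(words):
    """Objective: sum of distances over all unordered pairs of chosen words."""
    return sum(hamming(a, b) for a, b in combinations(words, 2))


def brute_force_motif(sequences, length):
    """Try every choice of one window per sequence; keep the cheapest.

    Exhaustive search for illustration only (assumption: one window per
    sequence). The number of choices grows exponentially with the number
    of sequences.
    """
    best_cost, best_words = None, None
    for choice in product(*(windows(s, length) for s in sequences)):
        cost = sum_of_pairs(choice)
        if best_cost is None or cost < best_cost:
            best_cost, best_words = cost, list(choice)
    return best_cost, best_words


# Toy input, invented for this example.
sequences = ["ACGTACGT", "TTACGTTT", "GGGACGAA"]
cost, words = brute_force_motif(sequences, 4)
print(cost, words)
```

Here the third sequence contains no window identical to one shared by the first two, so the optimum has a nonzero cost.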


Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kingsford, C., Zaslavsky, E., Singh, M. (2006). A Compact Mathematical Programming Formulation for DNA Motif Finding. In: Lewenstein, M., Valiente, G. (eds) Combinatorial Pattern Matching. CPM 2006. Lecture Notes in Computer Science, vol 4009. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11780441_22

  • DOI: https://doi.org/10.1007/11780441_22

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-35455-0

  • Online ISBN: 978-3-540-35461-1

  • eBook Packages: Computer Science, Computer Science (R0)
