Composition Alignment

  • Conference paper
Algorithms in Bioinformatics (WABI 2003)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 2812)

Abstract

In this paper, we develop a new approach for analyzing DNA sequences in order to detect regions with similar nucleotide composition. Our algorithm, which we call composition alignment or, more whimsically, scrambled alignment, employs the mechanisms of string matching and string comparison yet avoids the overdependence of those methods on position-by-position matching. In composition alignment, we extend the matching concept to composition matching. Two strings have a composition match if their lengths are equal and they have the same nucleotide content.
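The composition match definition above can be illustrated with a minimal sketch. The function name `is_composition_match` is hypothetical and is not taken from the paper; it simply checks the two stated conditions, equal length and identical nucleotide counts.

```python
from collections import Counter


def is_composition_match(s: str, t: str) -> bool:
    """Return True if s and t have equal length and the same character counts.

    Hypothetical helper for illustration; the name is not from the paper.
    """
    # Equal length is checked first so unequal strings are rejected cheaply.
    return len(s) == len(t) and Counter(s) == Counter(t)


# "ACGT" and "TGCA" contain one of each nucleotide, so they composition-match
# even though no position agrees.
print(is_composition_match("ACGT", "TGCA"))
# "AACG" and "ACGG" have equal length but different counts of A and G.
print(is_composition_match("AACG", "ACGG"))
```

Under this definition an ordinary position-by-position match of a single character is the special case of a composition match of length 1.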

We define the composition alignment problem and give a dynamic programming solution. We explore several composition match weighting functions and show that composition alignment with one class of these can be computed in O(nm) time, the same as for standard alignment. We discuss statistical properties of composition alignment scores and demonstrate the ability of the algorithm to detect regions of similar composition in eukaryotic promoter sequences in the absence of detectable similarity through standard alignment.
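To make the dynamic programming idea concrete, here is a naive sketch of a global alignment that allows composition matches between equal-length substrings. It is not the O(nm) algorithm described in the paper: it tries every substring length up to `max_len` at each cell, so it runs in O(nm·L) time. The weighting function (a composition match of length k scores k), the mismatch and gap penalties, and all names are assumptions chosen for illustration.

```python
def composition_alignment_score(s, t, max_len=None, mismatch=-1, gap=-1):
    """Naive global composition alignment score (illustrative sketch only).

    Assumptions not taken from the paper: a composition match of length k
    scores k, and mismatches and gaps each cost a fixed penalty.
    """
    n, m = len(s), len(t)
    if max_len is None:
        max_len = min(n, m)
    # S[i][j] = best score aligning the prefixes s[:i] and t[:j].
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = i * gap
    for j in range(1, m + 1):
        S[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best = max(S[i - 1][j] + gap,
                       S[i][j - 1] + gap,
                       S[i - 1][j - 1] + mismatch)
            # Grow the two suffixes s[i-k:i] and t[j-k:j] one character at a
            # time, tracking how many characters have unequal counts.
            diff = {}
            nonzero = 0
            for k in range(1, min(i, j, max_len) + 1):
                for ch, delta in ((s[i - k], 1), (t[j - k], -1)):
                    old = diff.get(ch, 0)
                    new = old + delta
                    diff[ch] = new
                    if old == 0 and new != 0:
                        nonzero += 1
                    elif old != 0 and new == 0:
                        nonzero -= 1
                if nonzero == 0:  # the two suffixes composition-match
                    best = max(best, S[i - k][j - k] + k)
            S[i][j] = best
    return S[n][m]


# With composition matches allowed, the scrambled pair scores as a full match.
print(composition_alignment_score("ACGT", "TGCA"))
# Restricting to length-1 matches reduces this to standard alignment scoring.
print(composition_alignment_score("ACGT", "TGCA", max_len=1))
```

Setting `max_len=1` recovers ordinary match/mismatch/gap alignment, which gives a convenient baseline for comparing the two scoring schemes on the same pair of sequences.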

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Benson, G. (2003). Composition Alignment. In: Benson, G., Page, R.D.M. (eds) Algorithms in Bioinformatics. WABI 2003. Lecture Notes in Computer Science, vol 2812. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39763-2_32

  • DOI: https://doi.org/10.1007/978-3-540-39763-2_32

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-20076-5

  • Online ISBN: 978-3-540-39763-2

  • eBook Packages: Springer Book Archive
