Algorithms for Finding Maximal-Scoring Segment Sets

  • Conference paper
Algorithms in Bioinformatics (WABI 2004)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 3240)

Abstract

We examine the problem of finding maximal-scoring sets of disjoint regions in a sequence of scores. The problem arises in DNA and protein segmentation, and in post-processing of sequence alignments. Our key result states a simple recursive relationship between maximal-scoring segment sets. The statement leads to an algorithm that finds such a k-set of segments in a sequence of length n in O(nk) time. We describe linear-time algorithms for finding optimal segment sets using different criteria for choosing k, as well as an algorithm for finding an optimal set of k segments in O(n log n) time, independently of k. We apply our methods to the identification of non-coding RNA genes in thermophiles.
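To make the objective concrete, here is a minimal sketch of a standard dynamic program for the same kind of problem. It is not the recursive method described in the abstract. It assumes that a k-set may contain at most k non-empty, disjoint, contiguous segments and that a segment's score is the sum of its entries. The first function runs in O(nk) time and returns only the optimal total score, without reconstructing the segments. The second function assumes a fixed per-segment penalty `lam` as one possible criterion for choosing k, solved in linear time. The paper's actual criteria may differ, and both function names are hypothetical.

```python
import math
from typing import Sequence


def best_k_segments(scores: Sequence[float], k: int) -> float:
    """Best total score of at most k disjoint contiguous segments, in O(n*k) time.

    Illustrative textbook DP (an assumption here, not the paper's recursive algorithm).
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    # closed[j]: best total using at most j segments within the prefix seen so far.
    closed = [0.0] * (k + 1)
    # open_[j]: best total using at most j segments, the last one ending at the current position.
    open_ = [-math.inf] * (k + 1)
    for x in scores:
        # Iterate j downwards so closed[j - 1] still refers to the previous position.
        for j in range(k, 0, -1):
            # Either extend the segment that ended at the previous position,
            # or start a new segment here after at most j - 1 finished ones.
            open_[j] = max(open_[j], closed[j - 1]) + x
            closed[j] = max(closed[j], open_[j])
    return closed[k]


def best_penalized_segments(scores: Sequence[float], lam: float) -> float:
    """Best value of (sum of segment scores) - lam * (number of segments), in O(n) time.

    The per-segment penalty lam is a hypothetical criterion for choosing k.
    The returned value already has the penalties subtracted.
    """
    closed = 0.0       # best value over the prefix seen so far
    open_ = -math.inf  # best value with a segment ending at the current position
    for x in scores:
        # Extend the current segment, or pay lam to open a new one here.
        open_ = max(open_, closed - lam) + x
        closed = max(closed, open_)
    return closed


if __name__ == "__main__":
    scores = [2, -5, 3, -1, 4, -10, 1]
    print(best_k_segments(scores, 2))          # 8.0: segments [2] and [3, -1, 4]
    print(best_penalized_segments(scores, 2))  # 4.0
```

Setting `lam = 0` recovers the unrestricted optimum (the sum of all positive scores), and increasing `lam` trades total score for fewer segments, which is one simple way to let the data choose k.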

Work supported by NSERC grant 250391-02.

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Csűrös, M. (2004). Algorithms for Finding Maximal-Scoring Segment Sets. In: Jonassen, I., Kim, J. (eds) Algorithms in Bioinformatics. WABI 2004. Lecture Notes in Computer Science, vol 3240. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30219-3_6

  • DOI: https://doi.org/10.1007/978-3-540-30219-3_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-23018-2

  • Online ISBN: 978-3-540-30219-3

  • eBook Packages: Springer Book Archive
