Algorithms for Finding Maximal-Scoring Segment Sets

Csűrös, Miklós

doi:10.1007/978-3-540-30219-3_6

Miklós Csűrös

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 3240)

Included in the following conference series:

International Workshop on Algorithms in Bioinformatics

Abstract

We examine the problem of finding maximal-scoring sets of disjoint regions in a sequence of scores. The problem arises in DNA and protein segmentation, and in post-processing of sequence alignments. Our key result states a simple recursive relationship between maximal-scoring segment sets. The statement leads to an algorithm that finds such a k-set of segments in a sequence of length n in O(nk) time. We describe linear-time algorithms for finding optimal segment sets using different criteria for choosing k, as well as an algorithm for finding an optimal set of k segments in O(n log n) time, independently of k. We apply our methods to the identification of non-coding RNA genes in thermophiles.
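
To make the problem statement concrete, the following is a minimal sketch of a textbook dynamic program that computes the best total score of at most k disjoint contiguous segments in O(nk) time. It is not the recursive relationship derived in the paper, which is not reproduced in this abstract; the function name max_k_segment_score and the sample scores are hypothetical and chosen only for illustration.

```python
def max_k_segment_score(scores, k):
    """Best total score of at most k disjoint contiguous segments.

    Textbook O(n*k) dynamic program, shown for illustration only;
    it is an assumption of this sketch, not the paper's algorithm.
    """
    neg_inf = float("-inf")
    # best[j]: best total over the prefix seen so far, using at most j segments.
    best = [0] * (k + 1)
    # end[j]: best total using at most j segments, the last of which
    # ends exactly at the current position.
    end = [neg_inf] * (k + 1)

    for x in scores:
        # Iterate j downward so best[j - 1] still refers to the prefix
        # that excludes the current element x.
        for j in range(k, 0, -1):
            # Either extend the segment ending at the previous position,
            # or start a new segment at x after at most j - 1 earlier ones.
            end[j] = max(end[j], best[j - 1]) + x
            best[j] = max(best[j], end[j])
    return best[k]


sample = [3, -4, 5, -1, 2, -6, 4]
print(max_k_segment_score(sample, 1))  # single best segment: [5, -1, 2]
print(max_k_segment_score(sample, 2))  # [5, -1, 2] and [4]
```

The sketch returns only the score; recovering the segments themselves would require keeping back-pointers, which is omitted here for brevity.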

Work supported by NSERC grant 250391-02.

Author information

Authors and Affiliations

Département d’informatique et de recherche opérationnelle, Université de Montréal, C.P. 6128 succ. Centre-Ville, Montréal, Québec, H3C 3J7, Canada
Miklós Csűrös

Editor information

Editors and Affiliations

Department of Informatics and Computational Biology Unit, HIB, University of Bergen, 5020, Bergen, Norway
Inge Jonassen
Department of Biology, Penn Center for Bioinformatics, Penn Genomics Institute, 415 S. University Ave., Philadelphia, PA 19104, USA
Junhyong Kim

Cite this paper

Csűrös, M. (2004). Algorithms for Finding Maximal-Scoring Segment Sets. In: Jonassen, I., Kim, J. (eds) Algorithms in Bioinformatics. WABI 2004. Lecture Notes in Computer Science, vol 3240. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30219-3_6

DOI: https://doi.org/10.1007/978-3-540-30219-3_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23018-2
Online ISBN: 978-3-540-30219-3
eBook Packages: Springer Book Archive
