Abstract
Recently, Naor and Brutlag (1993) proposed a new compact representation for suboptimal alignments. The kernel of that representation is a minimal directed acyclic graph (DAG) containing all suboptimal alignments. In this paper, we propose a method that computes such a DAG in space linear in the graph size. Let F be the area of the region of the dynamic-programming matrix bounded by the suboptimal alignments, and let W be the maximum width of that region. For two sequences of lengths M and N, we show that the worst-case running time is O(MN + F log log W). To exploit the computed DAG, we employ a variant of the Aho-Corasick pattern-matching machine (Aho and Corasick, 1975) to locate all occurrences of specified patterns, and then find a path in the DAG that maximizes the sum of the scores of the non-overlapping patterns occurring in it. An example illustrates the utility of this approach.
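To make the notion of a DAG "containing all suboptimal alignments" concrete, here is a minimal sketch, assuming global alignment with a linear gap penalty and illustrative scores (match +1, mismatch -1, gap -2). It uses a standard characterization rather than the paper's linear-space method: an edge (u, v) of the dynamic-programming grid lies on some alignment scoring at least opt - delta exactly when forward(u) + w(u, v) + backward(v) >= opt - delta. The function name `suboptimal_dag` and the parameter `delta` are hypothetical, and this sketch stores both score matrices in O(MN) space.

```python
from typing import List, Tuple

Cell = Tuple[int, int]
Edge = Tuple[Cell, Cell]


def suboptimal_dag(a: str, b: str, delta: int, match: int = 1,
                   mismatch: int = -1, gap: int = -2) -> Tuple[int, List[Edge]]:
    """Return (opt, edges), where edges are all grid edges lying on some
    global alignment of a and b whose score is at least opt - delta.
    Illustrative only: uses O(MN) space, not the paper's linear-space method."""
    m, n = len(a), len(b)
    neg = float("-inf")
    steps = ((1, 1), (1, 0), (0, 1))  # diagonal, gap in b, gap in a

    def w(i: int, j: int, di: int, dj: int) -> int:
        # Score of the edge entering cell (i, j) from (i - di, j - dj).
        if di and dj:
            return match if a[i - 1] == b[j - 1] else mismatch
        return gap

    # Forward pass: S[i][j] = best score of a path from (0, 0) to (i, j).
    S = [[neg] * (n + 1) for _ in range(m + 1)]
    S[0][0] = 0
    for i in range(m + 1):
        for j in range(n + 1):
            for di, dj in steps:
                pi, pj = i - di, j - dj
                if pi >= 0 and pj >= 0:
                    S[i][j] = max(S[i][j], S[pi][pj] + w(i, j, di, dj))

    # Backward pass: R[i][j] = best score of a path from (i, j) to (m, n).
    R = [[neg] * (n + 1) for _ in range(m + 1)]
    R[m][n] = 0
    for i in range(m, -1, -1):
        for j in range(n, -1, -1):
            for di, dj in steps:
                ni, nj = i + di, j + dj
                if ni <= m and nj <= n:
                    R[i][j] = max(R[i][j], w(ni, nj, di, dj) + R[ni][nj])

    opt = S[m][n]
    # Keep every edge that some (opt - delta)-suboptimal path passes through.
    edges: List[Edge] = []
    for i in range(m + 1):
        for j in range(n + 1):
            for di, dj in steps:
                ni, nj = i + di, j + dj
                if ni <= m and nj <= n:
                    if S[i][j] + w(ni, nj, di, dj) + R[ni][nj] >= opt - delta:
                        edges.append(((i, j), (ni, nj)))
    return opt, edges


if __name__ == "__main__":
    opt, edges = suboptimal_dag("AC", "AC", delta=0)
    print(opt, edges)
```

With delta = 0 only the edges of optimal alignments survive; increasing delta enlarges the retained region, whose area plays the role of F in the running-time bound above.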
This work was supported by grant R01 LM05110 from the National Library of Medicine.
References
Aho, A. V. and Corasick, M. J. (1975) Efficient string matching: an aid to bibliographic search. Comm. ACM, 18, 333–340.
Altschul, S. F. and Lipman, D. J. (1989) Trees, stars, and multiple biological sequence alignment. SIAM J. Appl. Math., 49, 197–209.
Carrillo, H. and Lipman, D. J. (1988) The multiple sequence alignment problem in biology. SIAM J. Appl. Math., 48, 1073–1082.
Chao, K.-M., Hardison, R. C. and Miller, W. (1993) Locating well-conserved regions within a pairwise alignment. CABIOS, 9, 387–396.
Gumucio, D. L., Shelton, D. A., Bailey, W. J., Slightom, J. L., and Goodman, M. (1993) Phylogenetic footprinting reveals unexpected complexity in trans factor binding upstream from the ε-globin gene. Proc. Natl. Acad. Sci. USA, 90, 6018–6022.
Hardison, R. C., Chao, K.-M., Adamkiewicz, M., Price, D., Jackson, J., Zeigler, T., Stojanovic, N., and Miller, W. (1993) Positive and negative regulatory elements of the rabbit embryonic ε-globin gene revealed by an improved multiple alignment program and functional analysis. DNA Sequence, 4, 163–176.
Hirschberg, D. S. (1975) A linear space algorithm for computing maximal common subsequences. Comm. ACM, 18, 341–343.
Kececioglu, J. D. (1989) Notes on a multiple sequence alignment cost bound of Carrillo and Lipman. Manuscript.
Lawrence, C. B., Goldman, D. A., and Hood, R. T. (1986) Optimized homology searches of the gene and protein sequence data banks. Bull. Math. Biol., 48, 569–583.
Myers, E. W. and Miller, W. (1988) Optimal alignments in linear space. CABIOS, 4, 11–17.
Myers, E. W. and Miller, W. (1989) Approximate matching of regular expressions. Bull. Math. Biol., 51, 5–37.
Naor, D. and Brutlag, D. (1993) On suboptimal alignments of biological sequences. In Proceedings of the 4th Symposium on Combinatorial Pattern Matching, Lecture Notes in Computer Science, 684, 179–196.
Saqi, M. and Sternberg, M. (1991) A simple method to generate non-trivial alternative alignments of protein sequences. J. Mol. Biol., 219, 727–732.
Tagle, D. A., Koop, B. F., Goodman, M., Slightom, J., Hess, D. L. and Jones, R. T. (1988) Embryonic ε and γ globin genes of a prosimian primate (Galago crassicaudatus): Nucleotide and amino acid sequences, developmental regulation and phylogenetic footprints. J. Mol. Biol., 203, 7469–7480.
Vingron, M. and Argos, P. (1990) Determination of reliable regions in protein sequence alignment. Protein Engineering, 3, 565–569.
Waterman, M. and Byers, T. (1985) A dynamic programming algorithm to find all solutions in a neighborhood of the optimum. Math. Biosciences, 77, 179–185.
Zuker, M. (1991) Suboptimal sequence alignment in molecular biology: alignment with error analysis. J. Mol. Biol., 221, 403–420.
© 1994 Springer-Verlag Berlin Heidelberg
About this paper
Chao, K.M. (1994). Computing all suboptimal alignments in linear space. In: Crochemore, M., Gusfield, D. (eds) Combinatorial Pattern Matching. CPM 1994. Lecture Notes in Computer Science, vol 807. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-58094-8_3
DOI: https://doi.org/10.1007/3-540-58094-8_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-58094-2
Online ISBN: 978-3-540-48450-9
eBook Packages: Springer Book Archive