Abstract
Affine gap penalties are generally considered appropriate for aligning DNA and protein sequences. (“Affine” means that a gap of length k is penalized by α + kβ, i.e., it costs α to open a gap plus β for each symbol in the gap.) For certain applications, such as aligning a cDNA sequence with a genomic DNA sequence, it may be adequate to use restricted affine gap penalties, which charge a constant penalty for long gaps. As it turns out, several techniques developed for the approximate string matching problem can be used to obtain efficient algorithms for computing an optimal alignment with restricted affine gap penalties. In particular, efficient algorithms can be derived from the suffix automaton with failure transitions and from the diagonalwise monotonicity of the cost tables. To speed up the computation, the q-gram paradigm can be used to locate the interval of the longer sequence that should be aligned with the shorter sequence. We have implemented these methods in C on Sun workstations running SunOS Unix. Preliminary experiments show that these approaches are very promising for aligning a cDNA sequence with a genomic DNA sequence.
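To make the penalty model concrete, the following is a minimal sketch in Python. It is not the paper's implementation, which was written in C, and it does not use any of the fast techniques named in the abstract. It assumes that a restricted affine penalty has the form min(α + kβ, cap), which is one reading of "a constant penalty for long gaps". The names `restricted_affine_gap`, `align_cost`, and `cap`, and all numeric defaults, are hypothetical. The alignment routine is the plain cubic-time dynamic program for an arbitrary gap function, included only to show how the capped penalty behaves.

```python
def restricted_affine_gap(k, alpha=4, beta=1, cap=10):
    """Cost of a gap of length k: alpha + k*beta, but never more than cap.

    The capped form and the default values are illustrative assumptions.
    """
    if k <= 0:
        return 0
    return min(alpha + k * beta, cap)


def align_cost(a, b, mismatch=3, gap=restricted_affine_gap):
    """Minimum global alignment cost of a and b under a general gap function.

    Straightforward O(m*n*(m+n)) dynamic program, not a fast algorithm.
    """
    m, n = len(a), len(b)
    inf = float("inf")
    # D[i][j] = minimum cost of aligning a[:i] with b[:j]
    D = [[inf] * (n + 1) for _ in range(m + 1)]
    D[0][0] = 0
    for i in range(m + 1):
        for j in range(n + 1):
            if i == 0 and j == 0:
                continue
            best = inf
            if i > 0 and j > 0:
                # Align a[i-1] with b[j-1]: free on a match, else a mismatch cost.
                sub = 0 if a[i - 1] == b[j - 1] else mismatch
                best = D[i - 1][j - 1] + sub
            # End with a gap of length k that skips k symbols of a.
            for k in range(1, i + 1):
                best = min(best, D[i - k][j] + gap(k))
            # End with a gap of length k that skips k symbols of b.
            for k in range(1, j + 1):
                best = min(best, D[i][j - k] + gap(k))
            D[i][j] = best
    return D[m][n]


if __name__ == "__main__":
    # A short gap is charged the affine cost, a long gap the constant cap.
    print(restricted_affine_gap(1), restricted_affine_gap(100))
    # A 50-symbol insert in the longer sequence costs only the cap.
    print(align_cost("AC" + "T" * 50 + "GT", "ACGT"))
```

Under these assumed parameters, a 50-symbol gap costs the same as a 6-symbol gap. A long unmatched stretch in the genomic sequence is therefore no more expensive than a moderately long one, which is the property that suits cDNA-to-genomic alignment.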
This work was supported in part by grant R01 LM05110 from the National Library of Medicine, National Institutes of Health, USA, and grant NSC86-2213-E-126-002 from the National Science Council, Taiwan.
Copyright information
© 1997 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chao, KM. (1997). Fast algorithms for aligning sequences with restricted affine gap penalties. In: Jiang, T., Lee, D.T. (eds) Computing and Combinatorics. COCOON 1997. Lecture Notes in Computer Science, vol 1276. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0045093
DOI: https://doi.org/10.1007/BFb0045093
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-63357-0
Online ISBN: 978-3-540-69522-6
eBook Packages: Springer Book Archive