Aligning DNA Sequences to Minimize the Change in Protein

Journal of Combinatorial Optimization

Abstract

We study an alignment model for coding DNA sequences, recently proposed by J. Hein, that takes both DNA and protein information into account and attempts to minimize the total amount of evolution at both the DNA and protein levels. Assuming that the gap penalty function is affine, we design a quadratic-time dynamic programming algorithm for the model. Although the algorithm theoretically resolves an open question of Hein, it is impractical because of the large constant factor hidden in its quadratic running time. We therefore consider a mild simplification of Hein's model, named Context-free Codon Alignment, and present a much more efficient algorithm for the simplified model. The algorithms have been implemented and tested on both real and simulated sequences, and they produce almost identical alignments in most cases. Furthermore, we extend our model and design a heuristic algorithm to handle frame-shift errors and overlapping frames in coding regions.
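To illustrate what a quadratic-time dynamic program with an affine gap penalty looks like, here is a minimal sketch of the classical three-state recurrence in the style of Gotoh, applied to nucleotides only. It is not Hein's combined DNA/protein model and not the algorithm of this article: it ignores codons and protein-level cost entirely. The function name `affine_gap_cost` and the cost values (`mismatch`, `gap_open`, `gap_extend`) are assumptions chosen for illustration, with a gap of length k costing `gap_open + k * gap_extend`.

```python
from math import inf


def affine_gap_cost(a, b, mismatch=1.0, gap_open=2.0, gap_extend=0.5):
    """Minimum global alignment cost of strings a and b.

    Illustrative cost model (an assumption, not taken from the article):
    a match costs 0, a mismatch costs `mismatch`, and a gap of length k
    costs gap_open + k * gap_extend.
    """
    n, m = len(a), len(b)
    # M[i][j]: best cost of aligning a[:i], b[:j] ending with a[i-1] paired with b[j-1]
    # X[i][j]: best cost ending with a[i-1] paired with a gap
    # Y[i][j]: best cost ending with b[j-1] paired with a gap
    M = [[inf] * (m + 1) for _ in range(n + 1)]
    X = [[inf] * (m + 1) for _ in range(n + 1)]
    Y = [[inf] * (m + 1) for _ in range(n + 1)]

    M[0][0] = 0.0
    for i in range(1, n + 1):
        X[i][0] = gap_open + i * gap_extend
    for j in range(1, m + 1):
        Y[0][j] = gap_open + j * gap_extend

    new_gap = gap_open + gap_extend  # cost of the first position of a new gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0.0 if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = min(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1]) + sub
            # Either extend an existing gap in b, or open a new one.
            X[i][j] = min(X[i - 1][j] + gap_extend,
                          M[i - 1][j] + new_gap,
                          Y[i - 1][j] + new_gap)
            # Either extend an existing gap in a, or open a new one.
            Y[i][j] = min(Y[i][j - 1] + gap_extend,
                          M[i][j - 1] + new_gap,
                          X[i][j - 1] + new_gap)

    return min(M[n][m], X[n][m], Y[n][m])
```

Each of the three tables has (n+1)(m+1) cells and every cell is filled in constant time, which is where the quadratic bound comes from. For example, `affine_gap_cost("ACGTTT", "ACG")` charges one gap of length three rather than three separate gaps.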

References

  • M. Dayhoff et al., Atlas of Protein Sequence and Structure, vol. 5, no. 3, pp. 345–352, Nat. Biomed. Res. Found., Washington, D.C., 1978.

  • O. Gotoh, “An improved algorithm for matching biological sequences,” J. Mol. Biol., vol. 162, pp. 705–708, 1981.

  • D. Gusfield, Algorithms on Strings, Trees, and Sequences, Cambridge University Press, 1997.

  • J. Hein, “An algorithm combining DNA and protein alignment,” Journal of Theoretical Biology, vol. 167, pp. 169–174, 1994.

  • J. Hein and J. Støvlbæk, “Genomic alignment,” J. Mol. Evol., vol. 38, pp. 310–316, 1994.

  • J. Hein and J. Støvlbæk, “Combined DNA and protein alignment,” Methods in Enzymology, vol. 266, pp. 402–418, 1996.

  • Y. Hua, “An improved algorithm for combined DNA and protein alignment,” M. Eng. Thesis, Department of Computer and Electrical Engineering, McMaster University, 1997.

  • S. Needleman and C. Wunsch, “A general method applicable to the search for similarities in the amino acid sequences of two proteins,” J. Mol. Biol., vol. 48, pp. 443–453, 1970.

  • C. Pedersen, “Computational analysis of biological sequences,” Manuscript, 1997.

  • C. Pedersen, R. Lyngsø, and J. Hein, “Comparison of coding DNA,” in Proc. 9th Combinatorial Pattern Matching Conf., LNCS 1448, Springer, 1998, pp. 153–173.

  • D. Sankoff, “Matching sequences under deletion/insertion constraints,” Proc. Nat. Acad. Sci., vol. 69, no. 1, pp. 4–6, 1972.

  • D. Sankoff, R. Cedergren, and G. Lapalme, “Frequency of insertion-deletion, transversion, and transition in the evolution of 5S ribosomal RNA,” J. Mol. Evol., vol. 7, pp. 133–149, 1976.

  • P. Sellers, “On the theory and computation of evolutionary distances,” SIAM J. Appl. Math., vol. 26, pp. 787–793, 1974.

  • D. States et al., Methods: A Companion to Methods in Enzymology, 1991, vol. 3, pp. 66–70.

  • B. Wu, “Context-free codon alignment,” M.Sc. Thesis, Department of Computer Science and Systems, McMaster University, Hamilton, Ontario, Canada, 1998.

Cite this article

Hua, Y., Jiang, T. & Wu, B. Aligning DNA Sequences to Minimize the Change in Protein. Journal of Combinatorial Optimization 3, 227–245 (1999). https://doi.org/10.1023/A:1009889710983

  • DOI: https://doi.org/10.1023/A:1009889710983
