
Modelling-Alignment for Non-random Sequences

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3339)

Abstract

Populations of biased, non-random sequences may cause standard alignment algorithms to yield false-positive matches and false-negative misses. A standard significance test based on the shuffling of sequences is a partial solution, applicable to populations that can be described by simple models. Masking out intervals of low information content throws information away. We describe a new and general method, modelling-alignment: population models are incorporated into the alignment process, which can (and should) lead to changes in the rank-order of matches between a query sequence and a collection of sequences, compared to results from standard algorithms. The new method is general and places very few conditions on the nature of the models that can be used with it. We apply modelling-alignment to local alignment, global alignment, optimal alignment, and the relatedness problem.
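
As an illustration of the shuffling-based significance test mentioned above (not the modelling-alignment method itself, whose details are not given in this abstract), the following is a minimal Python sketch. The scoring scheme (match +1, mismatch -1, linear gap -1), the function names `global_score` and `shuffle_p_value`, and the add-one correction are all assumptions made for the example. Uniform shuffling preserves only letter composition, which is one way to see why such a test suits only populations described by simple models.

```python
import random


def global_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment score with linear gap costs.

    The scoring parameters are illustrative assumptions, not values
    taken from the paper.
    """
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


def shuffle_p_value(query, target, trials=200, seed=0):
    """Empirical p-value from uniformly shuffling the target sequence.

    Counts how often a shuffled target scores at least as well as the
    real one. Uniform shuffling keeps letter frequencies only, so any
    higher-order structure in the population is ignored.
    """
    rng = random.Random(seed)
    observed = global_score(query, target)
    letters = list(target)
    hits = 0
    for _ in range(trials):
        rng.shuffle(letters)
        if global_score(query, "".join(letters)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (trials + 1)
```

For a single-letter sequence such as `"AAAA"`, every shuffle is identical to the original, so `shuffle_p_value("AAAA", "AAAA")` is 1.0: the test cannot distinguish a genuine match from a compositional bias in that case.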




Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Powell, D.R., Allison, L., Dix, T.I. (2004). Modelling-Alignment for Non-random Sequences. In: Webb, G.I., Yu, X. (eds) AI 2004: Advances in Artificial Intelligence. AI 2004. Lecture Notes in Computer Science, vol 3339. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30549-1_19


  • DOI: https://doi.org/10.1007/978-3-540-30549-1_19

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-24059-4

  • Online ISBN: 978-3-540-30549-1

  • eBook Packages: Computer Science (R0)
