Modelling-Alignment for Non-random Sequences

Powell, David R.; Allison, Lloyd; Dix, Trevor I.

doi:10.1007/978-3-540-30549-1_19

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3339)

Abstract

Populations of biased, non-random sequences may cause standard alignment algorithms to yield false-positive matches and false-negative misses. A standard significance test based on the shuffling of sequences is a partial solution, applicable to populations that can be described by simple models. Masking out intervals of low information content throws information away. We describe a new and general method, modelling-alignment: population models are incorporated into the alignment process, which can (and should) lead to changes in the rank-order of matches between a query sequence and a collection of sequences, compared to results from standard algorithms. The new method is general and places very few conditions on the nature of the models that can be used with it. We apply modelling-alignment to local alignment, global alignment, optimal alignment, and the relatedness problem.
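
For readers unfamiliar with the shuffling-based significance test that the abstract mentions as a partial solution, the following is a minimal sketch in Python. It is not the authors' method and it does not implement modelling-alignment. The function names, the scoring parameters (match = 1, mismatch = -1, gap = -1) and the number of trials are assumptions chosen for illustration. The score is a plain Needleman-Wunsch global alignment score, and the empirical p-value is the fraction of shuffled targets that score at least as well as the real one.

```python
import random


def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment score with linear gap costs.

    The scoring values are illustrative assumptions, not taken from the paper.
    """
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def shuffle_p_value(query, target, trials=200, seed=0):
    """Empirical p-value from uniformly shuffling the target sequence.

    A uniform shuffle preserves only the letter composition of the target,
    so it corresponds to a very simple (zero-order) population model.
    """
    rng = random.Random(seed)
    observed = global_alignment_score(query, target)
    letters = list(target)
    hits = 0
    for _ in range(trials):
        rng.shuffle(letters)
        if global_alignment_score(query, "".join(letters)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (trials + 1)


if __name__ == "__main__":
    score, p = shuffle_p_value("AAAAAAAA", "AAAAAAAA")
    print(score, p)
```

With a single-letter sequence every shuffle is identical to the original, so the test reports a p-value of 1.0 even though the two sequences match exactly. This is one simple way to see how a biased composition can dominate an alignment score, and why a uniform shuffle can only account for populations described by simple models.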

Author information

Authors and Affiliations

School of Computer Science and Software Engineering, Monash University, 3800, Australia
David R. Powell, Lloyd Allison & Trevor I. Dix
Victorian Bioinformatics Consortium
David R. Powell & Trevor I. Dix

Editor information

Editors and Affiliations

Faculty of Information Technology, Monash University, VIC 3800, Australia
Geoffrey I. Webb
Science, Engineering and Technology Portfolio, Royal Melbourne Institute of Technology, VIC 3001, Melbourne, Australia
Xinghuo Yu

Cite this paper

Powell, D.R., Allison, L., Dix, T.I. (2004). Modelling-Alignment for Non-random Sequences. In: Webb, G.I., Yu, X. (eds) AI 2004: Advances in Artificial Intelligence. AI 2004. Lecture Notes in Computer Science, vol 3339. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30549-1_19

DOI: https://doi.org/10.1007/978-3-540-30549-1_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24059-4
Online ISBN: 978-3-540-30549-1
eBook Packages: Computer Science, Computer Science (R0)
