
Pair HMM Based Gap Statistics for Re-evaluation of Indels in Alignments with Affine Gap Penalties

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 6293)

Abstract

Although computationally aligning sequences is a crucial step in the vast majority of comparative genomics studies, our understanding of alignment biases still needs to be improved. To infer true structural or homologous regions, computational alignments need further evaluation. It has been shown that the accuracy of aligned positions can drop substantially, in particular around gaps. Here we focus on re-evaluating score-based alignments with affine gap penalty costs. We exploit their relationship with pair hidden Markov models and develop efficient algorithms to identify gaps that are significant in terms of length and multiplicity. We evaluate our statistics against the well-established structural alignments from SABmark and find that indel reliability increases substantially with indel significance, in particular in worst-case twilight zone alignments. This indicates that our statistics can reliably complement other methods, which mostly focus on the reliability of match positions.
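
The abstract links affine gap penalties to pair hidden Markov models. The following is a minimal sketch of why that link yields gap statistics. It assumes the textbook three-state pair HMM (one match state and two gap states), in which an open gap is extended with a hypothetical probability epsilon. Under this model, gap lengths are geometric, so the probability that a gap reaches length L or more is epsilon^(L-1). The multiplicity function adds a further simplifying assumption: n independent gaps, combined through a binomial tail. This is an illustration of the general idea, not the authors' algorithm, and the function names are hypothetical.

```python
from math import comb


def gap_length_pvalue(length: int, epsilon: float) -> float:
    """Probability that a gap has length >= `length`.

    Assumes (illustration only) a pair HMM in which a gap is extended with
    probability `epsilon`, so P(len = k) = (1 - epsilon) * epsilon**(k - 1)
    for k >= 1. The tail sum of that geometric law is epsilon**(length - 1).
    """
    if length < 1:
        raise ValueError("gap length must be >= 1")
    if not 0.0 <= epsilon < 1.0:
        raise ValueError("epsilon must lie in [0, 1)")
    return epsilon ** (length - 1)


def long_gap_multiplicity_pvalue(n_gaps: int, m: int, length: int, epsilon: float) -> float:
    """Probability that at least `m` of `n_gaps` gaps have length >= `length`.

    Hypothetical simplification: the gaps are treated as independent draws,
    so the count of long gaps is Binomial(n_gaps, p) with p from the
    geometric tail above.
    """
    p = gap_length_pvalue(length, epsilon)
    return sum(
        comb(n_gaps, k) * p**k * (1 - p) ** (n_gaps - k)
        for k in range(m, n_gaps + 1)
    )
```

For example, with epsilon = 0.5 a gap of length 3 has tail probability 0.25. The geometric tail is what makes the affine/pair HMM correspondence convenient: the length statistic has a closed form. The paper's own length and multiplicity statistics are computed by the authors' algorithms and are not reproduced here.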

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Schönhuth, A., Salari, R., Sahinalp, S.C. (2010). Pair HMM Based Gap Statistics for Re-evaluation of Indels in Alignments with Affine Gap Penalties. In: Moulton, V., Singh, M. (eds) Algorithms in Bioinformatics. WABI 2010. Lecture Notes in Computer Science, vol 6293. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15294-8_29

  • DOI: https://doi.org/10.1007/978-3-642-15294-8_29

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-15293-1

  • Online ISBN: 978-3-642-15294-8

  • eBook Packages: Computer Science, Computer Science (R0)