Statistical Significance for NGS Reads Similarities

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 6620)

Abstract

In this work we present a significance curve to segregate random alignments from true matches in identity-based sequence comparison, especially suitable for sequencing data produced by NGS technologies. The experimental approach reproduces the distribution of random local ungapped similarities by score and length, from which it is possible to assess the statistical significance of any particular ungapped similarity. This work includes a study of how the distribution behaves as a function of the experimental technology used to produce the raw sequences, as well as of the scoring system used in the comparison. Our approach reproduces the expected behaviour and completes the proposal of Rost and Sander for homology-based sequence comparisons. The results can be exploited by computational applications to reduce computational cost and memory usage.
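
The idea of an empirical significance curve can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes uniformly random DNA fragments, compares them position by position at a fixed length (rather than searching for local similarities), and uses percent identity instead of a general scoring system. The function names, the 99% quantile and the number of trials are all hypothetical choices made for illustration.

```python
import random


def random_read(length, rng, alphabet="ACGT"):
    """Draw a uniformly random sequence (assumption: no base-composition bias)."""
    return "".join(rng.choice(alphabet) for _ in range(length))


def percent_identity(a, b):
    """Percent of matching positions in an ungapped, equal-length comparison."""
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def significance_curve(lengths, trials=2000, quantile=0.99, seed=0):
    """For each length, estimate the identity level that random pairs
    reach or exceed only in roughly (1 - quantile) of the trials."""
    rng = random.Random(seed)
    curve = {}
    for n in lengths:
        scores = sorted(
            percent_identity(random_read(n, rng), random_read(n, rng))
            for _ in range(trials)
        )
        idx = min(int(quantile * trials), trials - 1)
        curve[n] = scores[idx]
    return curve


def is_significant(length, identity, curve):
    """A similarity is called significant if it lies above the random curve."""
    return identity > curve[length]


curve = significance_curve([10, 20, 50, 100, 200])
for n in sorted(curve):
    print(n, curve[n])
```

Under these assumptions the threshold falls as the length grows: short fragments reach high identity by chance far more often than long ones, which is the qualitative shape of length-dependent identity curves.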

References

  1. Swindell, S.R., Plasterer, T.N.: SEQMAN. Contig assembly. Methods Mol. Biol. 70, 75–89 (1997)

  2. Miller, J.R., et al.: Aggressive assembly of pyrosequencing reads with mates. Bioinformatics 24, 2818–2824 (2008)

  3. Pevzner, P.A., Tang, H., Waterman, M.S.: An Eulerian path approach to DNA fragment assembly. Proc. Natl. Acad. Sci. USA 98, 9748–9753 (2001)

  4. Chevreux, B., Wetter, T., Suhai, S.: Genome sequence assembly using trace signals and additional sequence information. In: Comput. Sci. Biol.: Proc. German Conference on Bioinformatics GCB 1999, pp. 45–56 (1999)

  5. http://www.pacificbiosciences.com/

  6. http://www.nanowerk.com/news/newsid=17170.php

  7. http://www.technologyreview.com/biomedicine/23589/

  8. Rost, B.: Twilight zone of protein sequence alignments. Protein Eng. 12(2), 85–94 (1999)

  9. Karlin, S., Altschul, S.F.: Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes. Proc. Natl. Acad. Sci. USA 87, 2264–2268 (1990)

  10. Altschul, S.F., Gish, W.: Local alignment statistics. Methods Enzymol. 266, 460–480 (1996)

  11. Collins, J.F., Coulson, A.: Significance of protein sequence similarities. Methods Enzymol. 183, 474–487 (1990)

  12. Sander, C., Schneider, R.: Database of homology-derived protein structures and the structural meaning of sequence alignment. Proteins 9(1), 56–68 (1991)

  13. Altschul, S.F., Bundschuh, R., Olsen, R., Hwa, T.: The estimation of statistical parameters for local alignment score distributions. Nucleic Acids Res. 29(2), 351–361 (2001)

  14. Trelles, O., Andrade, M.A., Valencia, A., Zapata, E.L., Carazo, J.M.: Computational Space Reduction and Parallelization of a new Clustering Approach for Large Groups of Sequences. Bioinformatics 14(5), 439–451 (1998)

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Muñoz-Mérida, A., Ríos, J., Benzekri, H., Trelles, O. (2012). Statistical Significance for NGS Reads Similarities. In: Freitas, A.T., Navarro, A. (eds) Bioinformatics for Personalized Medicine. JBI 2010. Lecture Notes in Computer Science, vol 6620. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28062-7_1

  • DOI: https://doi.org/10.1007/978-3-642-28062-7_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-28061-0

  • Online ISBN: 978-3-642-28062-7

  • eBook Packages: Computer Science, Computer Science (R0)
