Abstract
In this work we present a significance curve to separate random alignments from true matches in identity-based sequence comparison, especially suitable for sequencing data produced by NGS technologies. The experimental approach reproduces the distribution of random local ungapped similarities by score and length, from which it is possible to assess the statistical significance of any particular ungapped similarity. This work includes a study of how the distribution behaves as a function of the experimental technology used to produce the raw sequences, as well as of the scoring system used in the comparison. Our approach reproduces the expected behaviour and completes the proposal of Rost and Sander for homology-based sequence comparisons. The results can be exploited by computational applications to reduce computational cost and memory usage.
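The general idea of estimating significance from an empirical distribution of random ungapped similarities can be illustrated with a minimal sketch. This is not the authors' implementation: the +1/-1 scoring, the uniformly random reads over ACGT, the add-one p-value correction, and all function names below are assumptions chosen only for illustration.

```python
import random


def best_ungapped_segment(a, b, match=1, mismatch=-1):
    """Return (score, length) of the best-scoring ungapped local segment.

    Every diagonal of the comparison matrix is scanned with a running
    sum that is reset whenever it drops to zero or below.
    The scoring values are illustrative assumptions.
    """
    best_score, best_len = 0, 0
    for d in range(-(len(a) - 1), len(b)):  # diagonal offset d = j - i
        i, j = max(0, -d), max(0, d)
        run_score, run_len = 0, 0
        while i < len(a) and j < len(b):
            run_score += match if a[i] == b[j] else mismatch
            run_len += 1
            if run_score <= 0:
                run_score, run_len = 0, 0  # restart the segment
            elif run_score > best_score:
                best_score, best_len = run_score, run_len
            i += 1
            j += 1
    return best_score, best_len


def random_read(length, rng, alphabet="ACGT"):
    """Hypothetical null model: independent, uniformly distributed bases."""
    return "".join(rng.choice(alphabet) for _ in range(length))


def null_distribution(n_pairs, read_len, rng):
    """Collect (score, length) of the best segment for random read pairs."""
    return [
        best_ungapped_segment(random_read(read_len, rng),
                              random_read(read_len, rng))
        for _ in range(n_pairs)
    ]


def empirical_p_value(score, null_scores):
    """Fraction of random scores at least as high, with add-one correction."""
    hits = sum(1 for s in null_scores if s >= score)
    return (1 + hits) / (1 + len(null_scores))


if __name__ == "__main__":
    rng = random.Random(0)
    null = null_distribution(n_pairs=200, read_len=50, rng=rng)
    scores = [s for s, _ in null]
    observed, length = best_ungapped_segment("TTACGTACGTAA", "GGACGTACGTCC")
    print(observed, length, empirical_p_value(observed, scores))
```

For simplicity the p-value here uses the score alone. A significance curve by score and length, as described in the abstract, would compare an observed similarity only against random similarities of comparable length, and a realistic null model would reflect the error profile of the sequencing technology.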
References
Swindell, S.R., Plasterer, T.N.: SEQMAN. Contig assembly. Methods Mol. Biol. 70, 75–89 (1997)
Miller, J.R., et al.: Aggressive assembly of pyrosequencing reads with mates. Bioinformatics 24, 2818–2824 (2008)
Pevzner, P.A., Tang, H., Waterman, M.S.: An Eulerian path approach to DNA fragment assembly. Proc. Natl. Acad. Sci. USA 98, 9748–9753 (2001)
Chevreux, B., Wetter, T., Suhai, S.: Genome sequence assembly using trace signals and additional sequence information. In: Comput. Sci. Biol.: Proc. German Conference on Bioinformatics GCB 1999, pp. 45–56 (1999)
Rost, B.: Twilight zone of protein sequence alignments. Protein Eng. 12(2), 85–94 (1999)
Karlin, S., Altschul, S.F.: Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes. Proc. Natl. Acad. Sci. USA 87, 2264–2268 (1990)
Altschul, S.F., Gish, W.: Local alignment statistics. Methods Enzymol. 266, 460–480 (1996)
Collins, J.F., Coulson, A.: Significance of protein sequence similarities. Methods Enzymol. 183, 474–487 (1990)
Sander, C., Schneider, R.: Database of homology-derived protein structures and the structural meaning of sequence alignment. Proteins 9(1), 56–68 (1991)
Altschul, S.F., Bundschuh, R., Olsen, R., Hwa, T.: The estimation of statistical parameters for local alignment score distributions. Nucleic Acids Res. 29(2), 351–361 (2001)
Trelles, O., Andrade, M.A., Valencia, A., Zapata, E.L., Carazo, J.M.: Computational space reduction and parallelization of a new clustering approach for large groups of sequences. Bioinformatics 14(5), 439–451 (1998)
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Muñoz-Mérida, A., Ríos, J., Benzekri, H., Trelles, O. (2012). Statistical Significance for NGS Reads Similarities. In: Freitas, A.T., Navarro, A. (eds) Bioinformatics for Personalized Medicine. JBI 2010. Lecture Notes in Computer Science, vol 6620. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28062-7_1
DOI: https://doi.org/10.1007/978-3-642-28062-7_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28061-0
Online ISBN: 978-3-642-28062-7
eBook Packages: Computer Science, Computer Science (R0)