Abstract
High-throughput sequencing makes it possible to process samples containing multiple genomic sequences and then estimate their frequencies or even assemble them. Maximum likelihood estimation of sequence frequencies from observed reads can be performed efficiently with the expectation-maximization (EM) method, assuming that the sequences present in the sample are known. Frequently, such knowledge is incomplete: in RNA-Seq, not all isoforms are known, and when sequencing viral quasispecies, their sequences are unknown. We propose to enhance EM with a virtual string and incorporate it into frequency estimation tools for RNA-Seq and quasispecies sequencing. Our simulations show that EM enhanced with the virtual string estimates string frequencies more accurately than the original methods and that it can find the reads from missing quasispecies, thus enabling their reconstruction.
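The EM estimation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the virtual string is weighted, so the function names (`em_frequencies`, `add_virtual_string`), the read-to-sequence weight matrix, and the constant `virtual_weight` are all assumptions made for illustration. The sketch treats the virtual string as one extra candidate sequence that every read matches with a small fixed weight, so reads that fit no known sequence have somewhere to go.

```python
def em_frequencies(weights, n_iter=200, tol=1e-9):
    """Estimate sequence frequencies by EM.

    weights[r][s] is an assumed compatibility (e.g. probability of
    observing read r given sequence s); 0 means the read does not map.
    """
    n_seqs = len(weights[0])
    freqs = [1.0 / n_seqs] * n_seqs  # start from a uniform guess
    for _ in range(n_iter):
        # E-step: split each read fractionally among compatible sequences.
        expected = [0.0] * n_seqs
        for row in weights:
            total = sum(f * w for f, w in zip(freqs, row))
            if total == 0.0:
                continue  # read is compatible with nothing; it is ignored
            for s in range(n_seqs):
                expected[s] += freqs[s] * row[s] / total
        assigned = sum(expected)
        if assigned == 0.0:
            break  # no read could be assigned at all
        # M-step: new frequencies are the normalized expected read counts.
        new_freqs = [e / assigned for e in expected]
        delta = max(abs(a - b) for a, b in zip(new_freqs, freqs))
        freqs = new_freqs
        if delta < tol:
            break
    return freqs


def add_virtual_string(weights, virtual_weight):
    """Append a hypothetical 'virtual string' column (illustrative only).

    Every read is given the same small weight against the virtual string,
    which is an assumption of this sketch, not a detail from the paper.
    """
    return [list(row) + [virtual_weight] for row in weights]


if __name__ == "__main__":
    # Two known sequences; the third read maps to neither of them.
    reads = [[1.0, 0.0], [1.0, 0.0], [0.0, 0.0]]
    print(em_frequencies(reads))
    print(em_frequencies(add_virtual_string(reads, 0.01)))
```

Without the extra column, the unmapped read is silently dropped and all frequency mass goes to the first known sequence. With it, roughly a third of the mass is assigned to the virtual string, and the reads responsible for that mass are the natural candidates for reconstructing a missing sequence.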
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Mangul, S., Astrovskaya, I., Nicolae, M., Tork, B., Mandoiu, I., Zelikovsky, A. (2011). Maximum Likelihood Estimation of Incomplete Genomic Spectrum from HTS Data. In: Przytycka, T.M., Sagot, MF. (eds) Algorithms in Bioinformatics. WABI 2011. Lecture Notes in Computer Science, vol 6833. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23038-7_19
DOI: https://doi.org/10.1007/978-3-642-23038-7_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23037-0
Online ISBN: 978-3-642-23038-7
eBook Packages: Computer Science, Computer Science (R0)