Accurate Estimation of Expression Levels of Homologous Genes in RNA-seq Experiments

Paşaniuc, Bogdan; Zaitlen, Noah; Halperin, Eran

doi:10.1007/978-3-642-12683-3_26


Conference paper


Part of the book series: Lecture Notes in Computer Science (LNBI, volume 6044)

Abstract

Next-generation high-throughput sequencing (NGS) is poised to replace array-based technologies as the experiment of choice for measuring RNA expression levels. Several groups have demonstrated the power of this new approach (RNA-seq), making significant and novel contributions and simultaneously proposing methodologies for the analysis of RNA-seq data. In a typical experiment, millions of short sequences (reads) are sampled from RNA extracts and mapped back to a reference genome. The number of reads mapping to each gene is used as a proxy for its corresponding RNA concentration. A significant challenge in analyzing RNA expression of homologous genes is the large fraction of the reads that map to multiple locations in the reference genome. Currently, these reads are either dropped from the analysis, or a naïve algorithm is used to estimate their underlying distribution. In this work, we present a rigorous alternative for handling the reads generated in an RNA-seq experiment within a probabilistic model for RNA-seq data; we develop maximum-likelihood-based methods for estimating the model parameters. In contrast to previous methods, our model takes into account the fact that the DNA of the sequenced individual is not a perfect copy of the reference sequence. We show with both simulated and real RNA-seq data that our new method improves the accuracy and power of RNA-seq experiments.
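To make the idea of maximum-likelihood handling of multi-mapping reads concrete, the following is a minimal sketch of a generic expectation-maximization (EM) scheme. It is not the authors' model, which is not specified in this abstract. The function name `estimate_expression`, its parameters, and the input layout (one dictionary per read, giving an assumed probability of observing the read from each candidate gene) are all hypothetical choices made for illustration. In a fuller model those per-read probabilities could reflect sequencing errors and differences between the sequenced individual and the reference; here they are simply taken as given.

```python
def estimate_expression(read_alignments, gene_lengths, n_iter=200, tol=1e-9):
    """Generic EM sketch (illustrative only, not the paper's method).

    read_alignments: list of dicts, one per read, mapping gene -> assumed
                     probability of observing the read if it came from that gene.
    gene_lengths:    dict mapping gene -> length (assumed known and positive).
    Returns a dict mapping gene -> estimated fraction of reads from that gene.
    """
    genes = list(gene_lengths)
    # Start from a uniform guess for the read fraction of each gene.
    theta = {g: 1.0 / len(genes) for g in genes}

    for _ in range(n_iter):
        expected = dict.fromkeys(genes, 0.0)

        # E-step: split each read fractionally among its candidate genes,
        # in proportion to theta[g] * P(read | g) / length[g].
        for aln in read_alignments:
            weights = {g: theta[g] * p / gene_lengths[g] for g, p in aln.items()}
            total = sum(weights.values())
            if total == 0.0:
                continue  # read is unexplained under the current estimate
            for g, w in weights.items():
                expected[g] += w / total

        assigned = sum(expected.values())
        if assigned == 0.0:
            return theta  # no usable reads; keep the current estimate

        # M-step: the new read fractions are the normalized expected counts.
        new_theta = {g: expected[g] / assigned for g in genes}
        delta = max(abs(new_theta[g] - theta[g]) for g in genes)
        theta = new_theta
        if delta < tol:
            break

    return theta


# Toy data: 60 reads unique to gene A, 20 unique to gene B,
# and 20 reads that match both genes equally well.
toy_reads = [{"A": 1.0}] * 60 + [{"B": 1.0}] * 20 + [{"A": 1.0, "B": 1.0}] * 20
toy_lengths = {"A": 1000, "B": 1000}
toy_estimate = estimate_expression(toy_reads, toy_lengths)
```

On this toy input the ambiguous reads are shared in proportion to the current estimates, so the iteration settles where the fraction for A satisfies x = 0.6 + 0.2x, that is, at 0.75 for A and 0.25 for B.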



Author information

Authors and Affiliations

International Computer Science Institute, Berkeley, CA
Bogdan Paşaniuc & Eran Halperin
Molecular Microbiology and Biotechnology Department, Tel-Aviv University
Noah Zaitlen & Eran Halperin
The Blavatnik School of Computer Science, Tel-Aviv University
Noah Zaitlen & Eran Halperin


Editor information

Editors and Affiliations

Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, 77 Massachusetts Avenue, 02139, Cambridge, MA, USA
Bonnie Berger


About this paper

Cite this paper

Paşaniuc, B., Zaitlen, N., Halperin, E. (2010). Accurate Estimation of Expression Levels of Homologous Genes in RNA-seq Experiments. In: Berger, B. (eds) Research in Computational Molecular Biology. RECOMB 2010. Lecture Notes in Computer Science (LNBI), vol 6044. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12683-3_26


DOI: https://doi.org/10.1007/978-3-642-12683-3_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12682-6
Online ISBN: 978-3-642-12683-3
eBook Packages: Computer Science, Computer Science (R0)
