Accurate Estimation of Expression Levels of Homologous Genes in RNA-seq Experiments

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 6044)

Abstract

Next-generation high-throughput sequencing (NGS) is poised to replace array-based technologies as the experiment of choice for measuring RNA expression levels. Several groups have demonstrated the power of this new approach (RNA-seq), making significant and novel contributions and simultaneously proposing methodologies for the analysis of RNA-seq data. In a typical experiment, millions of short sequences (reads) are sampled from RNA extracts and mapped back to a reference genome. The number of reads mapping to each gene is used as a proxy for its corresponding RNA concentration. A significant challenge in analyzing RNA expression of homologous genes is the large fraction of reads that map to multiple locations in the reference genome. Currently, these reads are either dropped from the analysis, or a naïve algorithm is used to estimate their underlying distribution. In this work, we present a rigorous alternative for handling the reads generated in an RNA-seq experiment within a probabilistic model for RNA-seq data; we develop maximum-likelihood-based methods for estimating the model parameters. In contrast to previous methods, our model takes into account the fact that the DNA of the sequenced individual is not a perfect copy of the reference sequence. We show with both simulated and real RNA-seq data that our new method improves the accuracy and power of RNA-seq experiments.
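To make the multi-mapping problem concrete, the following is a minimal sketch of a generic expectation-maximization (EM) estimate of relative expression, set beside the "drop multi-mapped reads" baseline mentioned above. It is not the model of the paper: it assumes equal gene lengths and equally good alignments, and it ignores sequencing errors and differences between the sequenced individual and the reference. The function names, the input layout (one tuple of candidate genes per read), and the toy data are illustrative assumptions.

```python
from collections import Counter


def unique_only_counts(read_hits):
    """Baseline: count reads per gene, dropping reads that map to several genes."""
    counts = Counter()
    for hits in read_hits:
        if len(hits) == 1:
            counts[hits[0]] += 1
    return dict(counts)


def em_expression(read_hits, genes, n_iter=200, tol=1e-10):
    """Estimate relative abundances theta[g] by EM over ambiguous reads.

    read_hits: list of non-empty tuples of distinct genes, one tuple per read.
    Assumption (illustrative): all genes have the same length and every
    candidate alignment of a read is equally good.
    """
    theta = {g: 1.0 / len(genes) for g in genes}
    n_reads = len(read_hits)
    for _ in range(n_iter):
        expected = {g: 0.0 for g in genes}
        # E-step: split each read among its candidate genes in proportion to theta.
        for hits in read_hits:
            total = sum(theta[g] for g in hits)
            for g in hits:
                expected[g] += theta[g] / total
        # M-step: new abundances are the normalized expected read counts.
        new_theta = {g: expected[g] / n_reads for g in genes}
        delta = max(abs(new_theta[g] - theta[g]) for g in genes)
        theta = new_theta
        if delta < tol:
            break
    return theta


if __name__ == "__main__":
    # Toy data: two homologous genes; 4 of the 12 reads map to both.
    reads = [("A",)] * 6 + [("B",)] * 2 + [("A", "B")] * 4
    print(unique_only_counts(reads))
    print(em_expression(reads, ["A", "B"]))
```

On this toy input the baseline keeps only 8 of the 12 reads, while the EM estimate uses all of them and converges to roughly 0.75 for A and 0.25 for B. A fuller model along the lines the abstract describes would replace the uniform per-alignment weight with a likelihood that accounts for mismatches between each read and the reference.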

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Paşaniuc, B., Zaitlen, N., Halperin, E. (2010). Accurate Estimation of Expression Levels of Homologous Genes in RNA-seq Experiments. In: Berger, B. (ed.) Research in Computational Molecular Biology. RECOMB 2010. Lecture Notes in Computer Science, vol 6044. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12683-3_26

  • DOI: https://doi.org/10.1007/978-3-642-12683-3_26

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-12682-6

  • Online ISBN: 978-3-642-12683-3

  • eBook Packages: Computer Science, Computer Science (R0)
