DOI:10.1145/2382936.2382983
short-paper

An integer programming approach to novel transcript reconstruction from paired-end RNA-Seq reads

Published: 07 October 2012

Abstract

Massively parallel whole transcriptome sequencing, commonly referred to as RNA-Seq, has become the technology of choice for performing gene expression profiling. However, reconstruction of full-length novel transcripts from RNA-Seq data remains challenging due to the short read length delivered by most existing sequencing technologies. We propose a novel statistical genome-guided method called "Transcriptome Reconstruction using Integer Programming" (TRIP) that incorporates fragment length distribution into novel transcript reconstruction from paired-end RNA-Seq reads. TRIP creates a splice graph based on aligned RNA-Seq reads and enumerates all maximal paths corresponding to putative transcripts. The problem of selecting true transcripts is formulated as an integer program (IP) which minimizes the number of selected transcripts while yielding a good statistical fit between the fragment length distribution (empirically determined during library preparation) and fragment lengths implied by mapped read pairs. Experimental results on both real and synthetic datasets show that TRIP is more accurate than methods ignoring fragment length distribution information. The software is available at: http://www.cs.gsu.edu/serghei/?q=trip
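
The pipeline described in the abstract (build a splice graph, enumerate maximal paths, then pick a small set of paths whose implied fragment lengths fit the expected distribution) can be illustrated with a toy sketch. This is not the TRIP implementation: the abstract does not give the integer program, so the sketch assumes a simplified criterion (a read pair is "explained" by a transcript if its implied fragment length lies within `k` standard deviations of the mean) and replaces the IP solver with brute-force search over subsets. All function names, the exon coordinates, and the parameters `mean`, `sd`, and `k` are hypothetical.

```python
from itertools import combinations


def maximal_paths(edges):
    """Enumerate all source-to-sink paths of a DAG given as {node: [successors]}."""
    has_pred = {v for succ in edges.values() for v in succ}
    nodes = set(edges) | has_pred
    sources = sorted(nodes - has_pred)
    paths = []

    def walk(path):
        successors = edges.get(path[-1], [])
        if not successors:              # reached a sink, so the path is maximal
            paths.append(tuple(path))
        for nxt in successors:
            walk(path + [nxt])

    for s in sources:
        walk([s])
    return paths


def implied_length(path, exons, frag):
    """Fragment length on a transcript, or None if a fragment end lies outside it."""
    start, end = frag                   # half-open genomic interval [start, end)

    def inside(pos):
        return any(exons[e][0] <= pos < exons[e][1] for e in path)

    if not (inside(start) and inside(end - 1)):
        return None
    # Count only the exonic bases of this transcript between the two ends.
    return sum(max(0, min(end, exons[e][1]) - max(start, exons[e][0])) for e in path)


def select_transcripts(paths, exons, frags, mean, sd, k=2.0):
    """Smallest subset of paths explaining every explainable read pair (brute force)."""

    def explains(path, frag):
        length = implied_length(path, exons, frag)
        return length is not None and abs(length - mean) <= k * sd

    # Assumption: read pairs that no candidate explains are ignored.
    explainable = [f for f in frags if any(explains(p, f) for p in paths)]
    for size in range(len(paths) + 1):
        for subset in combinations(paths, size):
            if all(any(explains(p, f) for p in subset) for f in explainable):
                return list(subset)
    return list(paths)


# Toy gene with three exons; exon B may be skipped.
exons = {"A": (0, 100), "B": (200, 300), "C": (400, 500)}
edges = {"A": ["B", "C"], "B": ["C"]}
candidates = maximal_paths(edges)
frags = [(50, 450), (20, 420), (210, 490)]
chosen = select_transcripts(candidates, exons, frags, mean=200, sd=20)
```

In this toy input every read pair fits the transcript that includes exon B, so a single transcript is selected. Adding a pair such as `(0, 500)`, whose implied length matches the mean only when exon B is skipped, forces the second candidate into the solution. Brute force is exponential in the number of candidates, which is why a real tool would hand the selection to an IP solver.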

Information

Published In

BCB '12: Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine
October 2012
725 pages
ISBN:9781450316705
DOI:10.1145/2382936

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

BCB '12

Acceptance Rates

BCB '12 Paper Acceptance Rate 33 of 159 submissions, 21%;
Overall Acceptance Rate 254 of 885 submissions, 29%
