ABSTRACT
Understanding the performance of transcriptome assemblies is crucial to improving current practices. Investigating the factors that affect a transcriptome assembly is the primary goal of our project. To that end, we designed a multi-step pipeline consisting of a variety of pre-processing and quality control steps. XSEDE allocations enabled us to meet the computational demands of the project. The high-memory Blacklight and Greenfield systems at the Pittsburgh Supercomputing Center were essential for completing multiple steps of this project. This paper presents the computational aspects of our comprehensive transcriptome assembly and validation study.