ABSTRACT
RNA sequencing (RNA-Seq) is a technique for studying RNA expression in biological material, and it is quickly gaining popularity in the field of transcriptomics. Trinity is a software tool developed for efficient de novo reconstruction of transcriptomes from RNA-Seq data. In this paper we first conduct a performance study of Trinity and compare it to previously published data from 2011. The 2011 version is much slower than many other de novo assemblers, so biologists have been forced to choose between quality and speed. We examine the runtime behavior of Trinity as a whole as well as that of its individual components, and then optimize the most performance-critical parts. We find that standard best practices for HPC applications can also be applied to Trinity, especially on systems with large amounts of memory. Combining these best practices with our specific performance optimizations decreases the runtime of Trinity by a factor of 3.9. This brings the runtime of Trinity in line with that of other de novo assemblers while maintaining superior quality. The purpose of this paper is to describe a series of improvements to Trinity, quantify the execution improvements achieved, and document the new version of the software.
Index Terms
Trinity RNA-Seq assembler performance optimization