ABSTRACT
Wheat, corn, and rice provide 60 percent of the world's food intake every day, and just 15 plant species make up 90 percent of the world's food intake. As such, there is tremendous agricultural and scientific interest in sequencing and studying plant genomes, especially to develop reference sequences that can direct plant breeding or identify functional elements. DNA sequencing technologies can now generate sequence data for large genomes at low cost; however, it remains a substantial computational challenge to assemble the short sequencing reads into complete genome sequences. Even one of the simpler ancestral species of wheat, Aegilops tauschii, has a genome size of 4.36 gigabasepairs (Gbp), nearly fifty percent larger than the human genome. Assembling a genome of this size requires computational resources, especially RAM to store the large assembly graph, that are out of reach for most institutions. In this paper, we describe a collaborative effort between Cold Spring Harbor Laboratory and the Pittsburgh Supercomputing Center to assemble large, complex cereal genomes, starting with Ae. tauschii, using the XSEDE shared memory supercomputer Blacklight. We expect these experiences using Blacklight to provide a case study and computational protocol that other genomics communities can use to leverage this or similar resources for the assembly of other significant genomes of interest.
Index Terms
- Large-scale Sequencing and Assembly of Cereal Genomes Using Blacklight