DOI: 10.1145/2616498.2616502
research-article

Large-scale Sequencing and Assembly of Cereal Genomes Using Blacklight

Published: 13 July 2014

Abstract

Wheat, corn, and rice provide 60 percent of the world's food intake every day, and just 15 plant species make up 90 percent of the world's food intake. As such, there is tremendous agricultural and scientific interest in sequencing and studying plant genomes, especially in developing reference sequences to guide plant breeding and to identify functional elements. DNA sequencing technologies can now generate sequence data for large genomes at low cost; however, assembling the short sequencing reads into complete genome sequences remains a substantial computational challenge. Even Aegilops tauschii, one of the simpler ancestral species of wheat, has a genome size of 4.36 gigabase pairs (Gbp), nearly fifty percent larger than the human genome. Assembling a genome of this size requires computational resources, especially the RAM needed to store the large assembly graph, that are out of reach for most institutions. In this paper, we describe a collaborative effort between Cold Spring Harbor Laboratory and the Pittsburgh Supercomputing Center to assemble large, complex cereal genomes, starting with Ae. tauschii, using the XSEDE shared-memory supercomputer Blacklight. We expect our experience with Blacklight to provide a case study and computational protocol that other genomics communities can use to leverage this or similar resources for assembling other genomes of significant interest.
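To make the memory argument concrete, here is a minimal Python sketch, assuming a de Bruijn graph, a common assembly-graph representation for short reads (the abstract does not say which graph type or assembler was used). `build_de_bruijn_graph` turns each k-mer of each read into an edge between its (k-1)-mer prefix and suffix, so the number of stored edges grows roughly with genome size. `estimate_kmer_table_gib` is a hypothetical back-of-the-envelope helper; its bytes-per-k-mer and error-k-mer factor are illustrative assumptions, not values from the paper.

```python
from collections import defaultdict


def build_de_bruijn_graph(reads, k):
    """Build a toy de Bruijn graph from short reads.

    Each k-mer in a read becomes an edge from its (k-1)-mer prefix to its
    (k-1)-mer suffix. Edge multiplicities (how often each k-mer was seen)
    are kept so that low-count, likely erroneous k-mers could be filtered.
    """
    edges = defaultdict(int)  # (prefix, suffix) -> multiplicity
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            edges[(kmer[:-1], kmer[1:])] += 1
    return dict(edges)


def estimate_kmer_table_gib(genome_size_bp, bytes_per_kmer=16, error_kmer_factor=1.0):
    """Hypothetical, very rough lower bound on k-mer table RAM, in GiB.

    Assumes about one distinct k-mer per genome position, a fixed number of
    bytes per stored k-mer (packed key plus count/edge bits), and an extra
    fraction of spurious k-mers created by sequencing errors. All three are
    illustrative assumptions, not figures from the paper.
    """
    distinct_kmers = genome_size_bp * (1.0 + error_kmer_factor)
    return distinct_kmers * bytes_per_kmer / 2**30


if __name__ == "__main__":
    reads = ["ACGTAC", "GTACGT"]
    graph = build_de_bruijn_graph(reads, k=4)
    for (src, dst), count in sorted(graph.items()):
        print(f"{src} -> {dst} (x{count})")
    # Ae. tauschii genome size from the abstract: 4.36 Gbp.
    print(f"~{estimate_kmer_table_gib(4.36e9):.0f} GiB")
```

Production assemblers also merge reverse-complement k-mers, filter low-count k-mers, and use compact encodings, so real memory use depends heavily on the tool. The point of the sketch is only that the graph must hold on the order of one entry per genome position, which for a 4.36 Gbp genome means billions of entries held in RAM at once.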

    Published In

    ACM Other conferences
    XSEDE '14: Proceedings of the 2014 Annual Conference on Extreme Science and Engineering Discovery Environment
    July 2014
    445 pages
    ISBN:9781450328937
    DOI:10.1145/2616498
    • General Chair: Scott Lathrop
    • Program Chair: Jay Alameda

    In-Cooperation

    • NSF: National Science Foundation
    • Drexel University
    • Indiana University

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. DNA Sequencing
    2. Genome Assembly
    3. NGS
    4. Plant Genomics
    5. data-intensive computing
    6. high-performance computing
    7. shared memory

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    XSEDE '14

    Acceptance Rates

    XSEDE '14 Paper Acceptance Rate 80 of 120 submissions, 67%;
    Overall Acceptance Rate 129 of 190 submissions, 68%
