Opera: Reconstructing Optimal Genomic Scaffolds with High-Throughput Paired-End Sequences

  • Conference paper
Research in Computational Molecular Biology (RECOMB 2011)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 6577)

Abstract

Scaffolding, the problem of ordering and orienting contigs, typically using paired-end reads, is a crucial step in the assembly of high-quality draft genomes. Even as sequencing technologies and mate-pair protocols have improved significantly, scaffolding programs still rely on heuristics, with no guarantees on the quality of the solution. In this work we explore the feasibility of an exact solution for scaffolding and present a first fixed-parameter tractable solution for assembly (Opera). We also describe a graph contraction procedure that allows the solution to scale to large scaffolding problems and demonstrate this by scaffolding several large real and synthetic datasets. In comparisons with existing scaffolders, Opera simultaneously produced longer and more accurate scaffolds, demonstrating the utility of an exact approach. Opera also incorporates an exact quadratic programming formulation to precisely compute gap sizes.
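
To make the "ordering and orienting" problem concrete, here is a minimal sketch of how a candidate scaffold could be scored against paired-end links. It is not Opera's algorithm: the `Link` record, the `count_discordant` function, the zero-length-gap assumption, and the simplified test (relative strand plus a distance bound) are all illustrative assumptions that do not come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """Hypothetical summary of the read pairs joining two contigs."""
    a: str              # name of one contig
    b: str              # name of the other contig
    same_strand: bool   # True if the pairs imply both contigs share an orientation
    max_dist: int       # assumed upper bound on the distance between the contigs


def count_discordant(order, orient, lengths, links):
    """Count links that a candidate scaffold fails to satisfy.

    order   -- contig names from left to right
    orient  -- dict mapping contig name to '+' or '-'
    lengths -- dict mapping contig name to its length in bases
    Gaps between adjacent contigs are assumed to be zero (a simplification).
    """
    # Start coordinate of every contig under the zero-gap assumption.
    start = {}
    pos = 0
    for name in order:
        start[name] = pos
        pos += lengths[name]

    discordant = 0
    for link in links:
        left, right = sorted((link.a, link.b), key=start.__getitem__)
        same = orient[link.a] == orient[link.b]
        # Distance from the end of the left contig to the start of the right one.
        dist = start[right] - (start[left] + lengths[left])
        if same != link.same_strand or dist > link.max_dist:
            discordant += 1
    return discordant


lengths = {"A": 100, "B": 50, "C": 200}
links = [
    Link("A", "B", True, 10),
    Link("B", "C", True, 10),
    Link("A", "C", True, 100),
]
forward = {"A": "+", "B": "+", "C": "+"}
good = count_discordant(["A", "B", "C"], forward, lengths, links)
swapped = count_discordant(["A", "C", "B"], forward, lengths, links)
```

Under these assumptions, a heuristic scaffolder would search for a low-scoring order and orientation without any guarantee, whereas an exact method would aim to find a scaffold that provably minimizes such a score.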

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gao, S., Nagarajan, N., Sung, W.-K. (2011). Opera: Reconstructing Optimal Genomic Scaffolds with High-Throughput Paired-End Sequences. In: Bafna, V., Sahinalp, S.C. (eds) Research in Computational Molecular Biology. RECOMB 2011. Lecture Notes in Computer Science, vol 6577. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20036-6_40

  • DOI: https://doi.org/10.1007/978-3-642-20036-6_40

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-20035-9

  • Online ISBN: 978-3-642-20036-6

  • eBook Packages: Computer Science, Computer Science (R0)
