DOI: 10.1145/2147805.2147810
research-article

Using inversion signatures to generate draft genome sequence scaffolds

Published: 01 August 2011

Abstract

We present a linear-time algorithm that can generate a contig scaffold for a draft genome sequence represented in contigs given a reference genome. The algorithm is aimed at prokaryotic genomes and relies on the presence of matching sequence patterns between the query and reference genomes that can be interpreted as the result of large-scale inversions; we call these patterns inversion signatures. Large-scale inversions are common rearrangement events in prokaryotic genomes. Even in draft genomes it is possible to detect the presence of inversions given sufficient sequencing coverage and a sufficiently close reference genome. Our algorithm is capable of correctly generating a scaffold if at least one member of every inversion signature pair is present in contigs and no inversion signatures have been overwritten in evolution. The algorithm is also capable of generating scaffolds in the presence of any kind of inversion, although in this general case there is no guarantee that the scaffold will be completely correct. We compare the performance of SIS, the program that implements the algorithm, to five other scaffold-generating programs. The results from two batches of tests using real genomes and artificial contig boundaries show that SIS has significantly better performance.
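
To make the notion of a large-scale inversion concrete, the following is a minimal sketch on a toy signed marker order. It is not the SIS algorithm, whose details the abstract does not give. The function names `invert` and `breakpoints`, the integer markers, and the choice of the identity order as the reference are all assumptions made for illustration. The sketch shows only that a single inversion reverses a segment, flips its strand, and leaves two disrupted adjacencies relative to the reference.

```python
from typing import List


def invert(order: List[int], start: int, end: int) -> List[int]:
    """Apply an inversion to the half-open slice order[start:end].

    The segment is reversed and every marker changes sign, because the
    inverted DNA is read from the opposite strand.
    """
    segment = [-marker for marker in reversed(order[start:end])]
    return order[:start] + segment + order[end:]


def breakpoints(order: List[int]) -> List[int]:
    """Return indices i where order[i] and order[i + 1] are not adjacent
    in the reference, assumed here to be the identity 1, 2, ..., n."""
    return [i for i in range(len(order) - 1) if order[i + 1] - order[i] != 1]


# Hypothetical toy data: six markers in reference order.
reference = [1, 2, 3, 4, 5, 6]
query = invert(reference, 1, 4)
print(query)               # [1, -4, -3, -2, 5, 6]
print(breakpoints(query))  # [0, 3]
```

Applying the same inversion a second time restores the reference order, which is why a pair of matching boundaries of this kind can in principle be read as evidence of one inversion event.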

Cited By

  • (2012) SIS: a program to generate draft genome sequence scaffolds for prokaryotes. BMC Bioinformatics, 13:1. DOI: 10.1186/1471-2105-13-96. Online publication date: 14-May-2012.

        Published In

        BCB '11: Proceedings of the 2nd ACM Conference on Bioinformatics, Computational Biology and Biomedicine
        August 2011
        688 pages
        ISBN:9781450307963
        DOI:10.1145/2147805
        General Chairs: Robert Grossman, Andrey Rzhetsky
        Program Chairs: Sun Kim, Wei Wang

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Author Tags

        1. DNA assembly
        2. genome inversions
        3. scaffold reconstruction

        Conference

        BCB '11

        Acceptance Rates

        Overall Acceptance Rate 254 of 885 submissions, 29%
