Multi-genome Scaffold Co-assembly Based on the Analysis of Gene Orders and Genomic Repeats

  • Conference paper
Bioinformatics Research and Applications (ISBRA 2016)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 9683)

Abstract

Advances in DNA sequencing technology over the past decades have increased the volume of raw sequenced genomic data available for further assembly and analysis. While many software tools exist for assembling sequenced genomic material, they often have difficulty reconstructing complete chromosomes. Major obstacles include uneven read coverage and long similar subsequences (repeats) in genomes. As a result, assemblers are often able to reliably reconstruct only long subsequences, called scaffolds.

We present a method for simultaneous co-assembly of all fragmented genomes (represented as collections of scaffolds rather than chromosomes) in a given set of annotated genomes. The method is based on the analysis of gene orders and relies on an evolutionary model that includes genome rearrangements as well as gene insertions and deletions. It can also use information about genomic repeats and the phylogenetic tree of the given genomes to further improve assembly quality.

The work is supported by the National Science Foundation under Grant No. IIS-1462107.


Notes

  1. Since each singleton multicolor mc (i.e., \(|mc|=1\)) is T-consistent, our new method includes independent assembly of single genomes as a particular case.

  2. The value of \(\varDelta =1\) corresponds to a typical fusion. We consider a potential assembly only if it could achieve a greater gain in the evolutionary score than a fusion.

  3. Genome Fc is omitted to simulate the case when no closely related reference genome is available.
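
The first note can be illustrated with a minimal sketch. This excerpt does not define T-consistency, so the code assumes one common reading: a multicolor (a set of genomes) is T-consistent when it equals the leaf set of some subtree of the phylogenetic tree T. The tree encoding, the function names, and the genome names A to E are all hypothetical and are not taken from the paper.

```python
from typing import FrozenSet, Set, Tuple, Union

# Hypothetical encoding: a rooted tree is a nested tuple whose leaves are genome names.
Tree = Union[str, Tuple["Tree", ...]]


def subtree_leaf_sets(tree: Tree, out: Set[FrozenSet[str]]) -> FrozenSet[str]:
    """Add the leaf set of every subtree of `tree` to `out`; return the leaf set of `tree`."""
    if isinstance(tree, str):
        leaves = frozenset([tree])
    else:
        leaves = frozenset().union(*(subtree_leaf_sets(child, out) for child in tree))
    out.add(leaves)
    return leaves


def is_t_consistent(multicolor: Set[str], tree: Tree) -> bool:
    """ASSUMED definition: the multicolor equals the leaf set of some subtree of the tree."""
    consistent: Set[FrozenSet[str]] = set()
    subtree_leaf_sets(tree, consistent)
    return frozenset(multicolor) in consistent


# Hypothetical five-genome tree: ((A, B), (C, (D, E)))
example_tree: Tree = (("A", "B"), ("C", ("D", "E")))
singletons_ok = all(is_t_consistent({g}, example_tree) for g in "ABCDE")
```

Under this assumed definition every leaf is itself a subtree, so every singleton multicolor passes the check, which matches the note's observation that single-genome assembly is a particular case. A set such as {A, C} fails because no subtree of the example tree has exactly those leaves.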


Author information

Corresponding author

Correspondence to Max A. Alekseyev.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Aganezov, S., Alekseyev, M.A. (2016). Multi-genome Scaffold Co-assembly Based on the Analysis of Gene Orders and Genomic Repeats. In: Bourgeois, A., Skums, P., Wan, X., Zelikovsky, A. (eds) Bioinformatics Research and Applications. ISBRA 2016. Lecture Notes in Computer Science, vol 9683. Springer, Cham. https://doi.org/10.1007/978-3-319-38782-6_20

  • DOI: https://doi.org/10.1007/978-3-319-38782-6_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-38781-9

  • Online ISBN: 978-3-319-38782-6

  • eBook Packages: Computer Science, Computer Science (R0)
