Abstract
Advances in DNA sequencing technology over the past decades have increased the volume of raw sequenced genomic data available for further assembly and analysis. While many software tools exist for assembling sequenced genomic material, they often have difficulty reconstructing complete chromosomes. Major obstacles include uneven read coverage and long similar subsequences (repeats) in genomes. As a result, assemblers can often reliably reconstruct only long subsequences, called scaffolds.
We present a method for simultaneous co-assembly of all fragmented genomes (represented as collections of scaffolds rather than chromosomes) in a given set of annotated genomes. The method is based on the analysis of gene orders and relies on an evolutionary model that includes genome rearrangements as well as gene insertions and deletions. It can also use information about genomic repeats and the phylogenetic tree of the given genomes to further improve assembly quality.
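To make the input of such a gene-order-based approach concrete, the following minimal sketch represents each genome as a collection of scaffolds, each scaffold being a list of signed genes, and counts how often a join between two scaffold ends of a fragmented genome appears as a gene adjacency in the other genomes. This is only an illustration of the conserved-adjacency idea and not the method presented in the chapter, which relies on an evolutionary model with rearrangements, insertions, and deletions. The function names, the `+a`/`-a` gene encoding, the toy genomes, and the assumption that every gene occurs exactly once per genome are all hypothetical.

```python
from collections import Counter

def extremities(gene):
    """Return the two extremities of a signed gene, in reading order.

    '+a' is read tail-to-head ('at', 'ah'); '-a' is read head-to-tail.
    """
    name = gene[1:]
    if gene[0] == "+":
        return (name + "t", name + "h")
    return (name + "h", name + "t")

def adjacencies(scaffold):
    """Unordered pairs of extremities joining consecutive genes in a scaffold."""
    pairs = set()
    for left, right in zip(scaffold, scaffold[1:]):
        pairs.add(frozenset((extremities(left)[1], extremities(right)[0])))
    return pairs

def scaffold_ends(scaffolds):
    """Extremities located at the two ends of every scaffold."""
    ends = set()
    for scaffold in scaffolds:
        ends.add(extremities(scaffold[0])[0])
        ends.add(extremities(scaffold[-1])[1])
    return ends

def candidate_joins(genomes, target):
    """Count, for each pair of scaffold ends in `target`, how many other
    genomes contain that pair as an adjacency (illustrative support only)."""
    ends = scaffold_ends(genomes[target])
    support = Counter()
    for name, scaffolds in genomes.items():
        if name == target:
            continue
        for scaffold in scaffolds:
            for adjacency in adjacencies(scaffold):
                if adjacency <= ends:  # both extremities are scaffold ends
                    support[adjacency] += 1
    return support

# Toy data (hypothetical): genome A is fragmented into two scaffolds,
# genomes B and C each contain the same gene order in a single piece.
genomes = {
    "A": [["+a", "+b"], ["+c", "+d"]],
    "B": [["+a", "+b", "+c", "+d"]],
    "C": [["-d", "-c", "-b", "-a"]],
}
joins = candidate_joins(genomes, "A")
```

In this toy input, the head of gene `b` and the tail of gene `c` are adjacent in both B and C (C is simply the reverse complement of B), so that pair is the only supported join for the scaffolds of A. A real method would also need to handle conflicting joins, duplicated genes, and evolutionary distance between genomes, none of which this sketch attempts.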
The work is supported by the National Science Foundation under Grant No. IIS-1462107.
Notes
1. Since each singleton multicolor mc (i.e., \(|mc|=1\)) is T-consistent, our new method includes independent assembly of single genomes as a special case.
2. The value of \(\varDelta =1\) corresponds to a typical fusion. We consider a potential assembly only if it could achieve a better gain in the evolutionary score than a fusion.
3. Genome Fc is omitted to simulate the case when no closely related reference genome is available.
Copyright information
© 2016 Springer International Publishing Switzerland
Cite this paper
Aganezov, S., Alekseyev, M.A. (2016). Multi-genome Scaffold Co-assembly Based on the Analysis of Gene Orders and Genomic Repeats. In: Bourgeois, A., Skums, P., Wan, X., Zelikovsky, A. (eds) Bioinformatics Research and Applications. ISBRA 2016. Lecture Notes in Computer Science, vol 9683. Springer, Cham. https://doi.org/10.1007/978-3-319-38782-6_20
DOI: https://doi.org/10.1007/978-3-319-38782-6_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-38781-9
Online ISBN: 978-3-319-38782-6
eBook Packages: Computer Science, Computer Science (R0)