Abstract
Scaffolding, which determines the orientation and order of fragmented contigs, is an essential step in assembly pipelines and can make assembly results more complete. Most existing scaffolding tools adopt the scaffold graph approach. However, constructing an accurate scaffold graph remains a challenging task. Removing potential false relationships is key to achieving better scaffolding performance, yet most scaffolding approaches neglect the impact of uneven sequencing depth, which may cause more sequencing errors and ultimately result in many false relationships. In this paper, we present a new scaffolding method, LSLS (Loose-Strict-Loose Scaffolding), which is based on path extension. LSLS uses different strategies to extend paths, which makes it more adaptive to different sequencing depths. To handle the problem of multiple candidate paths, we design a score function, based on the distribution of read pairs, that evaluates the reliability of path candidates, and we extend each path with the candidate that has the highest score. In addition, LSLS includes a new gap estimation method that estimates gap sizes more precisely. Experimental results on two standard datasets show that LSLS achieves better performance.
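The abstract does not give the LSLS score function or gap estimator, so the following is only a minimal sketch of the general idea of scoring path candidates by how well their read pairs fit the library's insert-size distribution. Everything in it is an assumption made for illustration: the Gaussian weighting, the function names (`pair_support`, `score_candidate`, `best_extension`, `naive_gap`), and the input layout are hypothetical and are not taken from the paper.

```python
import math
from statistics import median


def pair_support(implied_insert, mean, std):
    """Unnormalised Gaussian weight for one read pair (assumed model, not LSLS's)."""
    z = (implied_insert - mean) / std
    return math.exp(-0.5 * z * z)


def score_candidate(implied_inserts, mean, std):
    """Sum the weights of all read pairs that link the path to one candidate."""
    return sum(pair_support(x, mean, std) for x in implied_inserts)


def best_extension(candidates, mean, std):
    """Pick the candidate with the highest score.

    candidates: dict mapping a candidate name to the insert sizes implied by
    its linking read pairs if the candidate were placed next in the path.
    Returns (name, score), or None when there is no candidate.
    """
    if not candidates:
        return None
    scored = {name: score_candidate(xs, mean, std) for name, xs in candidates.items()}
    name = max(scored, key=scored.get)
    return name, scored[name]


def naive_gap(covered_lengths, mean):
    """Naive gap estimate: median of (mean insert size - bases covered on both contigs)."""
    return median(mean - c for c in covered_lengths)


# Hypothetical library: mean insert size 500, standard deviation 50.
candidates = {"c1": [500, 510, 490], "c2": [500, 900]}
choice = best_extension(candidates, mean=500, std=50)
print(choice[0])
print(naive_gap([300, 320, 310], mean=500))
```

Here `c1` is preferred because all three of its linking pairs imply insert sizes close to the library mean, whereas one of the pairs supporting `c2` is far outside the expected range and contributes almost nothing to its score.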
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Li, M. et al. (2017). LSLS: A Novel Scaffolding Method Based on Path Extension. In: Huang, DS., Jo, KH., Figueroa-García, J. (eds) Intelligent Computing Theories and Application. ICIC 2017. Lecture Notes in Computer Science, vol 10362. Springer, Cham. https://doi.org/10.1007/978-3-319-63312-1_38
DOI: https://doi.org/10.1007/978-3-319-63312-1_38
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-63311-4
Online ISBN: 978-3-319-63312-1
eBook Packages: Computer Science, Computer Science (R0)