Abstract
The Scaffolding problem in bioinformatics aims to complete the contig assembly process by determining the relative positions and orientations of the contigs. Modeled as a combinatorial optimization problem on a graph called the scaffold graph, this problem is \(\mathcal {NP}\)-hard, and solving it exactly is generally infeasible on large instances. Hence, heuristics such as polynomial-time approximation algorithms remain the only practical way to obtain a solution. In general, even when a constant approximation ratio is guaranteed, there is no way to know whether the solution returned by the algorithm is close to the optimum or close to the bound defined by this ratio. In this paper, we present a measure, associated with a greedy algorithm, that determines an upper bound on the score of an optimal solution. This measure depends on the instance and guarantees a (non-constant) ratio for the greedy algorithm on that instance. We prove that this measure is a close upper bound on the optimal score, we perform experiments on real instances, and we show that the greedy algorithm yields near-optimal solutions.
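To make the idea of an instance-guaranteed ratio concrete, here is a minimal sketch under simplifying assumptions; it is not the authors' greedy algorithm or their measure. It assumes a simplified model in which a solution selects link edges between contig ends so that each end is used at most once (the path and cycle constraints of the actual scaffolding problem are ignored), and that link weights are non-negative. The greedy step keeps the heaviest edges first; as a stand-in for the paper's measure, it uses a simple vertex-based upper bound (half the sum, over contig ends, of the heaviest incident link weight). The edge tuple format and the names `greedy_links`, `vertex_upper_bound`, and `instance_ratio` are hypothetical. Dividing the greedy score by the bound gives a ratio that holds for that one instance only.

```python
from typing import Dict, List, Set, Tuple

# A link edge between two contig ends: (end_u, end_v, weight).
# Hypothetical format; weights are assumed non-negative (e.g. read-pair support).
Edge = Tuple[str, str, float]


def greedy_links(edges: List[Edge]) -> Tuple[List[Edge], float]:
    """Keep the heaviest link edges first, using each contig end at most once.

    Simplified model (assumption): the path/cycle constraints of real
    scaffolding are ignored, so this is a greedy matching on link edges.
    """
    used: Set[str] = set()
    chosen: List[Edge] = []
    for u, v, w in sorted(edges, key=lambda e: e[2], reverse=True):
        if u != v and u not in used and v not in used:
            chosen.append((u, v, w))
            used.update((u, v))
    return chosen, sum(w for _, _, w in chosen)


def vertex_upper_bound(edges: List[Edge]) -> float:
    """Half the sum, over contig ends, of the heaviest incident link weight.

    Any solution using each end at most once has w(uv) <= (best[u] + best[v]) / 2
    for every chosen edge, so its total weight cannot exceed this value.
    This is a stand-in bound, not the measure defined in the paper.
    """
    best: Dict[str, float] = {}
    for u, v, w in edges:
        for end in (u, v):
            best[end] = max(best.get(end, 0.0), w)
    return sum(best.values()) / 2


def instance_ratio(edges: List[Edge]) -> float:
    """Ratio guaranteed on this instance: greedy score / upper bound (1.0 if no weight)."""
    _, score = greedy_links(edges)
    bound = vertex_upper_bound(edges)
    return score / bound if bound > 0 else 1.0


if __name__ == "__main__":
    toy = [("a1", "b0", 5.0), ("a1", "c0", 4.0), ("b1", "c0", 3.0)]
    print(greedy_links(toy)[1], vertex_upper_bound(toy), round(instance_ratio(toy), 3))
```

On this toy instance the greedy step keeps the edges of weight 5 and 3 (score 8.0) against a bound of 8.5, which certifies that, in the simplified model, this particular solution is within about 94% of the optimum, even though the worst-case constant ratio of a greedy algorithm may be much lower.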
Notes
- 1.
This means that there exist instances on which the computed solution has only one third of the optimal weight. It does not rule out better approximation algorithms.
- 2.
- 3.
Acknowledgments
This work was partially funded by the “Projet Investissement d’Avenir” Institut de Biologie Computationnelle. We would also like to thank Anne Dievart and Julien Frouin from CIRAD for their interest in our work and for the Azucena rice Illumina read library.
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Dallard, C., Weller, M., Chateau, A., Giroudeau, R. (2016). Instance Guaranteed Ratio on Greedy Heuristic for Genome Scaffolding. In: Chan, TH., Li, M., Wang, L. (eds) Combinatorial Optimization and Applications. COCOA 2016. Lecture Notes in Computer Science, vol 10043. Springer, Cham. https://doi.org/10.1007/978-3-319-48749-6_22
DOI: https://doi.org/10.1007/978-3-319-48749-6_22
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-48748-9
Online ISBN: 978-3-319-48749-6
eBook Packages: Computer Science, Computer Science (R0)