Instance Guaranteed Ratio on Greedy Heuristic for Genome Scaffolding

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10043)

Abstract

The scaffolding problem in bioinformatics aims to complete the contig assembly process by determining the relative position and orientation of the contigs. Modeled as a combinatorial optimization problem on a graph called the scaffold graph, this problem is \(\mathcal{NP}\)-hard, and solving it exactly is generally infeasible on large instances. Hence, heuristics such as polynomial-time approximation algorithms remain the only practical way to obtain a solution. In general, even when a constant guaranteed approximation ratio is known, it is impossible to tell whether the solution returned by the algorithm is close to the optimum or close to the bound defined by this ratio. In this paper we present a measure, associated with a greedy algorithm, that determines an upper bound on the score of the optimal solution. This instance-dependent measure guarantees a non-constant ratio for the greedy algorithm on that instance. We prove that this measure is a fine upper bound on the optimal score, and we perform experiments on real instances showing that the greedy algorithm yields near-optimal solutions.
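To illustrate the idea of an instance-dependent guarantee, here is a minimal sketch on a simplified variant of the problem. It is not the measure or the exact algorithm of the paper, neither of which is specified in this abstract. The assumptions are: each contig has two extremities, weighted links join extremities of different contigs, and a solution is a set of links that uses each extremity at most once and forms paths only (no cycles). The function names `greedy_scaffold` and `upper_bound`, the bound itself (half the sum of each extremity's heaviest incident link), and the toy data are all hypothetical.

```python
from typing import Dict, List, Tuple

Link = Tuple[str, str, float]  # (extremity u, extremity v, weight)


def greedy_scaffold(contigs: List[Tuple[str, str]], links: List[Link]) -> List[Link]:
    """Pick links by decreasing weight, keeping paths only (simplified variant)."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # The two extremities of a contig start in the same component.
    for a, b in contigs:
        parent[a] = a
        parent[b] = a

    used = set()
    chosen: List[Link] = []
    for u, v, w in sorted(links, key=lambda e: e[2], reverse=True):
        if u in used or v in used:
            continue  # an extremity can carry at most one link
        ru, rv = find(u), find(v)
        if ru == rv:
            continue  # this link would close a cycle
        parent[ru] = rv
        used.update((u, v))
        chosen.append((u, v, w))
    return chosen


def upper_bound(links: List[Link]) -> float:
    """Stand-in bound: every solution is a matching on the links, so its
    weight is at most half the sum of each extremity's heaviest link."""
    best: Dict[str, float] = {}
    for u, v, w in links:
        best[u] = max(best.get(u, 0.0), w)
        best[v] = max(best.get(v, 0.0), w)
    return sum(best.values()) / 2


# Toy instance (made up for illustration): three contigs A, B, C.
contigs = [("A1", "A2"), ("B1", "B2"), ("C1", "C2")]
links = [("A2", "B1", 5.0), ("B2", "C1", 4.0), ("A2", "C1", 6.0), ("A1", "C2", 1.0)]

chosen = greedy_scaffold(contigs, links)
score = sum(w for _, _, w in chosen)
bound = upper_bound(links)
print(f"greedy score = {score}, upper bound = {bound}, guaranteed ratio >= {score / bound:.3f}")
```

Because the optimal score cannot exceed `bound`, the quotient `score / bound` is a lower bound on how close the greedy solution is to the optimum on this particular instance, without having to compute the optimum itself.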

Notes

  1. This means that there exist instances on which the computed solution has only a third of the optimal weight. It does not exclude better approximation algorithms.

  2. http://www.ncbi.nlm.nih.gov/.

  3. http://www.lirmm.fr/~ferdjoukh/english/research.html.


Acknowledgments

This work was partially funded by the “Projet Investissement d’Avenir” Institut de Biologie Computationnelle. We would also like to thank Anne Dievart and Julien Frouin from CIRAD for their interest in our work and for the Azucena rice Illumina reads library.

Author information

Corresponding authors

Correspondence to Mathias Weller or Annie Chateau.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Dallard, C., Weller, M., Chateau, A., Giroudeau, R. (2016). Instance Guaranteed Ratio on Greedy Heuristic for Genome Scaffolding. In: Chan, TH., Li, M., Wang, L. (eds) Combinatorial Optimization and Applications. COCOA 2016. Lecture Notes in Computer Science, vol 10043. Springer, Cham. https://doi.org/10.1007/978-3-319-48749-6_22

  • DOI: https://doi.org/10.1007/978-3-319-48749-6_22

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-48748-9

  • Online ISBN: 978-3-319-48749-6

  • eBook Packages: Computer Science (R0)
