On the Shortest Common Superstring of NGS Reads

  • Conference paper
Theory and Applications of Models of Computation (TAMC 2017)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10185)

Abstract

The Shortest Superstring Problem (SSP) asks, given a set of strings \(S = \{s_1,\cdots ,s_n\}\) (with no \(s_i\) a substring of \(s_j\) for \(i \ne j\)), for a minimum-length string that contains every \(s_i, 1\le i \le n\), as a substring.
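
To make the definition concrete, here is a minimal sketch of an exhaustive solver for very small inputs. It is not the approximation algorithm of Blum et al. [2]; it simply tries every ordering of the strings and merges consecutive strings on their longest overlap, which is enough to find an optimum when no input string is a substring of another. The function names `overlap` and `shortest_superstring` are illustrative and do not come from the paper.

```python
from itertools import permutations


def overlap(a: str, b: str) -> int:
    """Length of the longest proper suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)) - 1, 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def shortest_superstring(strings: list[str]) -> str:
    """Exhaustive search over orderings; assumes a substring-free input set.

    Runs in O(n!) orderings, so it is only meant for toy examples.
    """
    if not strings:
        return ""
    best = None
    for order in permutations(strings):
        merged = order[0]
        # Append each string after removing its overlap with its predecessor.
        for prev, cur in zip(order, order[1:]):
            merged += cur[overlap(prev, cur):]
        if best is None or len(merged) < len(best):
            best = merged
    return best


print(shortest_superstring(["abc", "bcd", "cde"]))  # abcde
```

The factorial running time is the point of the illustration: exact search is impractical beyond a handful of strings, which is why approximation guarantees matter for this problem.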

This problem is known to be NP-complete and APX-hard. Approximation algorithms with guaranteed ratios have been proposed; the current best ratio, \(2\frac{11}{30}\), was reached through a long and difficult series of improvements. SSP is widely used in practice on Next Generation Sequencing (NGS) data, which plays an increasingly important role in modern biological and medical research. In this note, we show that on NGS data the SSP approximation ratio reached by the classical algorithm of Blum et al. [2] is usually below \(2\frac{11}{30}\), assuming specific characteristics of the data that are experimentally verified on a large sample set. Moreover, we present an efficient linear-time test, applicable to any input of equal-length strings, that computes the approximation ratio the classical algorithm in [2] can reach on that input.
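
As a reading aid, the guarantee mentioned here can be written out explicitly. The symbols \(A\), \(\mathrm{OPT}\) and \(\rho\) are notation assumed for this illustration and do not appear in the abstract. An algorithm \(A\) achieves approximation ratio \(\rho\) if, for every admissible input \(S\),

\[
|A(S)| \;\le\; \rho \cdot |\mathrm{OPT}(S)|, \qquad \text{where the best known guarantee is } \rho = 2\tfrac{11}{30} = \tfrac{71}{30} \approx 2.367 .
\]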

This work was supported by the PEPS INS2I-CNRS project CompX and by a Genotype to Phenotype project of the Life Sciences Department of University of Bordeaux.

References

  1. Armen, C., Stein, C.: A \({2}{\frac{2}{3}}\) superstring approximation algorithm. Discrete Appl. Math. 88(1–3), 29–57 (1998). http://dx.doi.org/10.1016/S0166-218X(98)00065-1, http://www.sciencedirect.com/science/article/pii/S0166218X98000651, Computational Molecular Biology DAM - CMB Series

  2. Blum, A., Jiang, T., Li, M., Tromp, J., Yannakakis, M.: Linear approximation of shortest superstrings. J. ACM 41(4), 630–647 (1994). doi:10.1145/179812.179818, http://doi.acm.org/10.1145/179812.179818

  3. Breslauer, D., Jiang, T., Jiang, Z.: Rotations of periodic strings and short superstrings. J. Algorithms 24(2), 340–353 (1997). http://dx.doi.org/10.1006/jagm.1997.0861, http://www.sciencedirect.com/science/article/pii/S0196677497908610

  4. Armen, C., Stein, C.: Improved length bounds for the shortest superstring problem. In: Akl, S.G., Dehne, F., Sack, J.-R., Santoro, N. (eds.) WADS 1995. LNCS, vol. 955, pp. 494–505. Springer, Heidelberg (1995). doi:10.1007/3-540-60220-8_88

  5. Crochemore, M., Cygan, M., Iliopoulos, C., Kubica, M., Radoszewski, J., Rytter, W., Waleń, T.: Algorithms for three versions of the shortest common superstring problem. In: Amir, A., Parida, L. (eds.) CPM 2010. LNCS, vol. 6129, pp. 299–309. Springer, Heidelberg (2010). doi:10.1007/978-3-642-13509-5_27

  6. Czumaj, A., Gąsieniec, L.: On the complexity of determining the period of a string. In: Giancarlo, R., Sankoff, D. (eds.) CPM 2000. LNCS, vol. 1848, pp. 412–422. Springer, Heidelberg (2000). doi:10.1007/3-540-45123-4_34

  7. Czumaj, A., Gąsieniec, L., Piotrów, M., Rytter, W.: Sequential and parallel approximation of shortest superstrings. J. Algorithms 23(1), 74–100 (1997). http://dx.doi.org/10.1006/jagm.1996.0823, http://www.sciencedirect.com/science/article/pii/S0196677496908238

  8. Ferragina, P., Landau, G., Ma, B.: Combinatorial pattern matching why greed works for shortest common superstring problem. Theor. Comput. Sci. 410(51), 5374–5381 (2009). http://dx.doi.org/10.1016/j.tcs.2009.09.014, http://www.sciencedirect.com/science/article/pii/S0304397509006410

  9. Fici, G., Kociumaka, T., Radoszewski, J., Rytter, W., Waleń, T.: On the greedy algorithm for the shortest common superstring problem with reversals. Inf. Process. Lett. 116(3), 245–251 (2016). doi:10.1016/j.ipl.2015.11.015

  10. Gallant, J., Maier, D., Storer, J.A.: On finding minimal length superstrings. J. Comput. Syst. Sci. 20(1), 50–58 (1980). http://dx.doi.org/10.1016/0022-0000(80)90004-5, http://www.sciencedirect.com/science/article/pii/0022000080900045

  11. Garey, M.R., Johnson, D.S.: Computers and Intractability: A Guide to the Theory of NP-Completeness. W.H. Freeman & Co., New York (1990)

  12. Golovnev, A., Kulikov, A.S., Mihajlin, I.: Approximating shortest superstring problem using de Bruijn graphs. In: Fischer, J., Sanders, P. (eds.) CPM 2013. LNCS, vol. 7922, pp. 120–129. Springer, Heidelberg (2013). doi:10.1007/978-3-642-38905-4_13

  13. Guibas, L.J., Odlyzko, A.M.: Periods in strings. J. Comb. Theor. Ser. A 30(1), 19–42 (1981). http://dx.doi.org/10.1016/0097-3165(81)90038-8, http://www.sciencedirect.com/science/article/pii/0097316581900388

  14. Holub, S., Shallit, J.: Periods and borders of random words. In: STACS, Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, LIPIcs, vol. 47, pp. 44:1–44:10 (2016)

  15. Kaplan, H., Shafrir, N.: The greedy algorithm for shortest superstrings. Inf. Process. Lett. 93(1), 13–17 (2005). doi:10.1016/j.ipl.2004.09.012

  16. Karpinski, M., Schmied, R.: Improved inapproximability results for the shortest superstring and related problems. In: CATS, Australian Computer Society, CRPIT 2013, vol. 141, pp. 27–36 (2013)

  17. Kosaraju, S.R., Park, J.K., Stein, C.: Long tours and short superstrings. In: 1994 Proceedings of 35th Annual Symposium on Foundations of Computer Science, pp. 166–177 (1994). doi:10.1109/SFCS.1994.365696

  18. Li, M.: Towards a DNA sequencing theory (learning a string). In: Proceedings of the 31st Symposium on the Foundations of Computer Science, pp. 125–134. IEEE Computer Society Press, Los Alamitos (1990)

  19. Mucha, M.: Lyndon words and short superstrings. In: SODA, pp. 958–972. SIAM (2013)

  20. Ott, S.: Lower bounds for approximating shortest superstrings over an alphabet of size 2. In: Widmayer, P., Neyer, G., Eidenbenz, S. (eds.) WG 1999. LNCS, vol. 1665, pp. 55–64. Springer, Heidelberg (1999). doi:10.1007/3-540-46784-X_7

  21. Paluch, K.E.: Better approximation algorithms for maximum asymmetric traveling salesman and shortest superstring. CoRR abs/1401.3670 (2014). http://arxiv.org/abs/1401.3670

  22. Paluch, K.E., Elbassioni, K.M., van Zuylen, A.: Simpler approximation of the maximum asymmetric traveling salesman problem. In: STACS, Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, LIPIcs, vol. 14, pp. 501–506 (2012)

  23. Sweedyk, Z.: A \(\mathbf{{2}{\frac{1}{2}}}\)-approximation algorithm for shortest superstring. SIAM J. Comput. 29(3), 954–986 (1999). doi:10.1137/S0097539796324661

  24. Tarhio, J., Ukkonen, E.: A greedy approximation algorithm for constructing shortest common superstrings. Theor. Comput. Sci. 57(1), 131–145 (1988). doi:10.1016/0304-3975(88)90167-3, http://www.sciencedirect.com/science/article/pii/0304397588901673

  25. Teng, S.H., Yao, F.F.: Approximating shortest superstrings. SIAM J. Comput. 26(2), 410–417 (1997). doi:10.1137/S0097539794286125

  26. Vazirani, V.V.: Approximation Algorithms. Springer-Verlag New York Inc., New York (2001)

  27. Yu, Y.W.: Approximation hardness of shortest common superstring variants. CoRR abs/1602.08648 (2016). http://arxiv.org/abs/1602.08648

Author information

Corresponding author

Correspondence to Tristan Braquelaire.

Appendix

Table 5. Results on 100 sets of reads

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Braquelaire, T., Gasparoux, M., Raffinot, M., Uricaru, R. (2017). On the Shortest Common Superstring of NGS Reads. In: Gopal, T., Jäger, G., Steila, S. (eds) Theory and Applications of Models of Computation. TAMC 2017. Lecture Notes in Computer Science, vol 10185. Springer, Cham. https://doi.org/10.1007/978-3-319-55911-7_8

  • DOI: https://doi.org/10.1007/978-3-319-55911-7_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-55910-0

  • Online ISBN: 978-3-319-55911-7

  • eBook Packages: Computer Science, Computer Science (R0)
