Abstract
The Shortest Superstring Problem (SSP) asks, given a set of strings \(S = \{s_1,\cdots ,s_n\}\) (with no \(s_i\) a substring of \(s_j\) for \(i \ne j\)), for a minimum-length string that contains every \(s_i\), \(1\le i \le n\), as a substring.
This problem is known to be NP-complete and APX-hard. Approximation algorithms with guaranteed ratios have been proposed; the current best ratio, \(2\frac{11}{30}\), was reached through a long and difficult series of improvements. SSP is widely used in practice on Next Generation Sequencing (NGS) data, which play an increasingly important role in modern biological and medical research. In this note, we show that on NGS data the approximation ratio reached by the classical algorithm of Blum et al. [2] is usually below \(2\frac{11}{30}\), under specific assumptions on the data that we verify experimentally on a large sample set. Moreover, we present an efficient linear-time test that, for any input consisting of strings of equal length, computes the approximation ratio that the classical algorithm in [2] can reach.
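To make the problem statement concrete, here is a minimal Python sketch that solves a tiny SSP instance by exhaustive search. It is neither the algorithm of Blum et al. [2] nor the linear-time test described above. The function names (`overlap`, `merge_in_order`, `shortest_superstring_bruteforce`) and the three example reads are hypothetical, chosen only for illustration. The sketch assumes a non-empty, substring-free input, in which case some ordering of the strings merged with maximal pairwise overlaps yields an optimal superstring.

```python
from itertools import permutations


def overlap(a: str, b: str) -> int:
    """Length of the longest proper suffix of a that is also a proper prefix of b."""
    for k in range(min(len(a), len(b)) - 1, 0, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def merge_in_order(order) -> str:
    """Concatenate the strings in the given order, collapsing each maximal pairwise overlap."""
    merged = order[0]
    for prev, cur in zip(order, order[1:]):
        merged += cur[overlap(prev, cur):]
    return merged


def shortest_superstring_bruteforce(strings) -> str:
    """Try every ordering of the input; exponential time, so only for tiny inputs."""
    candidates = (merge_in_order(p) for p in permutations(strings))
    return min(candidates, key=len)


# Hypothetical toy "reads" of equal length (assumption: no string is a substring of another).
reads = ["CATG", "ATGC", "TGCA"]
best = shortest_superstring_bruteforce(reads)
assert all(r in best for r in reads)  # every read occurs as a substring
print(best, len(best))  # prints: CATGCA 6
```

The factorial number of orderings makes this approach unusable beyond a handful of strings, which is why approximation algorithms with a guaranteed ratio matter in practice.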
This work was supported by the PEPS INS2I-CNRS project CompX and by a Genotype to Phenotype project of the Life Sciences Department of University of Bordeaux.
References
Armen, C., Stein, C.: A \(2\frac{2}{3}\) superstring approximation algorithm. Discrete Appl. Math. 88(1–3), 29–57 (1998). http://dx.doi.org/10.1016/S0166-218X(98)00065-1, http://www.sciencedirect.com/science/article/pii/S0166218X98000651, Computational Molecular Biology DAM - CMB Series
Blum, A., Jiang, T., Li, M., Tromp, J., Yannakakis, M.: Linear approximation of shortest superstrings. J. ACM 41(4), 630–647 (1994). doi:10.1145/179812.179818, http://doi.acm.org/10.1145/179812.179818
Breslauer, D., Jiang, T., Jiang, Z.: Rotations of periodic strings and short superstrings. J. Algorithms 24(2), 340–353 (1997). http://dx.doi.org/10.1006/jagm.1997.0861, http://www.sciencedirect.com/science/article/pii/S0196677497908610
Armen, C., Stein, C.: Improved length bounds for the shortest superstring problem. In: Akl, S.G., Dehne, F., Sack, J.-R., Santoro, N. (eds.) WADS 1995. LNCS, vol. 955, pp. 494–505. Springer, Heidelberg (1995). doi:10.1007/3-540-60220-8_88
Crochemore, M., Cygan, M., Iliopoulos, C., Kubica, M., Radoszewski, J., Rytter, W., Waleń, T.: Algorithms for three versions of the shortest common superstring problem. In: Amir, A., Parida, L. (eds.) CPM 2010. LNCS, vol. 6129, pp. 299–309. Springer, Heidelberg (2010). doi:10.1007/978-3-642-13509-5_27
Czumaj, A., Gąsieniec, L.: On the complexity of determining the period of a string. In: Giancarlo, R., Sankoff, D. (eds.) CPM 2000. LNCS, vol. 1848, pp. 412–422. Springer, Heidelberg (2000). doi:10.1007/3-540-45123-4_34
Czumaj, A., Gąsieniec, L., Piotrów, M., Rytter, W.: Sequential and parallel approximation of shortest superstrings. J. Algorithms 23(1), 74–100 (1997). http://dx.doi.org/10.1006/jagm.1996.0823, http://www.sciencedirect.com/science/article/pii/S0196677496908238
Ma, B.: Why greed works for shortest common superstring problem. Theor. Comput. Sci. 410(51), 5374–5381 (2009). http://dx.doi.org/10.1016/j.tcs.2009.09.014, http://www.sciencedirect.com/science/article/pii/S0304397509006410
Fici, G., Kociumaka, T., Radoszewski, J., Rytter, W., Waleń, T.: On the greedy algorithm for the shortest common superstring problem with reversals. Inf. Process. Lett. 116(3), 245–251 (2016). doi:10.1016/j.ipl.2015.11.015
Gallant, J., Maier, D., Storer, J.A.: On finding minimal length superstrings. J. Comput. Syst. Sci. 20(1), 50–58 (1980). http://dx.doi.org/10.1016/0022-0000(80)90004-5, http://www.sciencedirect.com/science/article/pii/0022000080900045
Garey, M.R., Johnson, D.S.: Computers and Intractability: A Guide to the Theory of NP-Completeness. W.H. Freeman & Co., New York (1990)
Golovnev, A., Kulikov, A.S., Mihajlin, I.: Approximating shortest superstring problem using de Bruijn graphs. In: Fischer, J., Sanders, P. (eds.) CPM 2013. LNCS, vol. 7922, pp. 120–129. Springer, Heidelberg (2013). doi:10.1007/978-3-642-38905-4_13
Guibas, L.J., Odlyzko, A.M.: Periods in strings. J. Comb. Theor. Ser. A 30(1), 19–42 (1981). http://dx.doi.org/10.1016/0097-3165(81)90038-8, http://www.sciencedirect.com/science/article/pii/0097316581900388
Holub, S., Shallit, J.: Periods and borders of random words. In: STACS, Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, LIPIcs, vol. 47, pp. 44:1–44:10 (2016)
Kaplan, H., Shafrir, N.: The greedy algorithm for shortest superstrings. Inf. Process. Lett. 93(1), 13–17 (2005). doi:10.1016/j.ipl.2004.09.012
Karpinski, M., Schmied, R.: Improved inapproximability results for the shortest superstring and related problems. In: CATS, Australian Computer Society, CRPIT 2013, vol. 141, pp. 27–36 (2013)
Kosaraju, S.R., Park, J.K., Stein, C.: Long tours and short superstrings. In: 1994 Proceedings of 35th Annual Symposium on Foundations of Computer Science, pp 166–177 (1994). doi:10.1109/SFCS.1994.365696
Li, M.: Towards a DNA sequencing theory (learning a string). In: Proceedings of the 31st Symposium on the Foundations of Computer Science, pp. 125–134. IEEE Computer Society Press, Los Alamitos (1990)
Mucha, M.: Lyndon words and short superstrings. In: SODA, pp. 958–972. SIAM (2013)
Ott, S.: Lower bounds for approximating shortest superstrings over an alphabet of size 2. In: Widmayer, P., Neyer, G., Eidenbenz, S. (eds.) WG 1999. LNCS, vol. 1665, pp. 55–64. Springer, Heidelberg (1999). doi:10.1007/3-540-46784-X_7
Paluch, K.E.: Better approximation algorithms for maximum asymmetric traveling salesman and shortest superstring. CoRR abs/1401.3670 (2014). http://arxiv.org/abs/1401.3670
Paluch, K.E., Elbassioni, K.M., van Zuylen, A.: Simpler approximation of the maximum asymmetric traveling salesman problem. In: STACS, Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, LIPIcs, vol. 14, pp. 501–506 (2012)
Sweedyk, Z.: A \(2\frac{1}{2}\)-approximation algorithm for shortest superstring. SIAM J. Comput. 29(3), 954–986 (1999). doi:10.1137/S0097539796324661
Tarhio, J., Ukkonen, E.: A greedy approximation algorithm for constructing shortest common superstrings. Theor. Comput. Sci. 57(1), 131–145 (1988). doi:10.1016/0304-3975(88)90167-3, http://www.sciencedirect.com/science/article/pii/0304397588901673
Teng, S.H., Yao, F.F.: Approximating shortest superstrings. SIAM J. Comput. 26(2), 410–417 (1997). doi:10.1137/S0097539794286125
Vazirani, V.V.: Approximation Algorithms. Springer-Verlag New York Inc., New York (2001)
Yu, Y.W.: Approximation hardness of shortest common superstring variants. CoRR abs/1602.08648 (2016). http://arxiv.org/abs/1602.08648
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Braquelaire, T., Gasparoux, M., Raffinot, M., Uricaru, R. (2017). On the Shortest Common Superstring of NGS Reads. In: Gopal, T., Jäger, G., Steila, S. (eds) Theory and Applications of Models of Computation. TAMC 2017. Lecture Notes in Computer Science, vol 10185. Springer, Cham. https://doi.org/10.1007/978-3-319-55911-7_8
DOI: https://doi.org/10.1007/978-3-319-55911-7_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-55910-0
Online ISBN: 978-3-319-55911-7
eBook Packages: Computer Science (R0)