Abstract
We initiate the study of the smoothed complexity of the Closest String problem by proposing a semi-random model of Hamming distance. We restrict our attention to the optimization version of the Closest String problem and give a randomized algorithm, which we refer to as CSP-Greedy, that computes the closest string on smoothed instances up to a constant-factor approximation in time \(O(\ell^3)\), where \(\ell\) is the string length. Using smoothed analysis, we prove that CSP-Greedy achieves a \(\left(1 + \frac{\epsilon e}{2^n}\right)^{\ell}\)-approximation guarantee, where \(\epsilon > 0\) is any small value and \(n\) is the number of strings. These approximation and runtime guarantees demonstrate that Closest String instances with a relatively large number of input strings are efficiently solved in practice. We also give experimental results demonstrating that CSP-Greedy runs extremely efficiently on instances with a large number of strings. The counter-intuitive fact that “large” Closest String instances are easier and more efficient to solve gives new insight into this well-investigated problem.
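To make the stated guarantee concrete, the following minimal sketch evaluates the approximation factor \(\left(1 + \frac{\epsilon e}{2^n}\right)^{\ell}\) for a few values of \(n\), and shows the objective that the optimization version of Closest String minimizes (the maximum Hamming distance from a candidate to the input strings). It is not the CSP-Greedy algorithm, which the abstract does not specify; the function names `hamming`, `radius`, and `approximation_factor` are hypothetical and chosen only for illustration.

```python
import math


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def radius(candidate: str, strings: list[str]) -> int:
    """Closest String objective: largest Hamming distance from the
    candidate to any input string (smaller is better)."""
    return max(hamming(candidate, s) for s in strings)


def approximation_factor(n: int, length: int, eps: float) -> float:
    """Evaluate (1 + eps * e / 2^n) ** length from the abstract.

    ldexp computes eps * e * 2**(-n) without building a huge integer,
    and log1p keeps precision when that term is tiny.
    """
    term = math.ldexp(eps * math.e, -n)
    return math.exp(length * math.log1p(term))


if __name__ == "__main__":
    # Illustrative parameters only (assumed, not taken from the paper).
    length, eps = 1000, 0.1
    for n in (5, 10, 20, 40):
        print(n, approximation_factor(n, length, eps))
    print(radius("AAAA", ["AAAT", "AATT", "AAAA"]))
```

For a fixed string length, the factor decreases toward 1 as \(n\) grows, because the term \(\epsilon e / 2^n\) shrinks exponentially in the number of strings. This is the sense in which instances with many input strings carry the stronger guarantee.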
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Boucher, C., Wilkie, K. (2010). Why Large Closest String Instances Are Easy to Solve in Practice. In: Chavez, E., Lonardi, S. (eds) String Processing and Information Retrieval. SPIRE 2010. Lecture Notes in Computer Science, vol 6393. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16321-0_10
DOI: https://doi.org/10.1007/978-3-642-16321-0_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-16320-3
Online ISBN: 978-3-642-16321-0
eBook Packages: Computer Science, Computer Science (R0)