Why Large Closest String Instances Are Easy to Solve in Practice

Boucher, Christina; Wilkie, Kathleen

doi:10.1007/978-3-642-16321-0_10

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6393)

Included in the following conference series:

International Symposium on String Processing and Information Retrieval

Abstract

We initiate the study of the smoothed complexity of the Closest String problem by proposing a semi-random model of Hamming distance. We restrict our attention to the optimization version of the Closest String problem and give a randomized algorithm, which we refer to as CSP-Greedy, that computes the closest string on smoothed instances up to a constant-factor approximation in time O(ℓ³), where ℓ is the string length. Using smoothed analysis, we prove that CSP-Greedy achieves a $\left(1 + \frac{\epsilon e}{2^n}\right)^{\ell}$-approximation guarantee, where ε > 0 is any small value and n is the number of strings. These approximation and runtime guarantees demonstrate that Closest String instances with a relatively large number of input strings are efficiently solved in practice. We also give experimental results demonstrating that CSP-Greedy runs extremely efficiently on instances with a large number of strings. The counter-intuitive fact that “large” Closest String instances are easier and more efficient to solve gives new insight into this well-investigated problem.
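
To make the quantities in the abstract concrete, the following is a minimal Python sketch, not the authors' CSP-Greedy algorithm. It shows the Closest String objective (the largest Hamming distance from a candidate to any input string), an exhaustive search that is only feasible for tiny instances, and a direct evaluation of the stated approximation factor. The function names are hypothetical, and the sketch assumes that e in the bound denotes Euler's number.

```python
import math
from itertools import product


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def radius(candidate: str, strings: list[str]) -> int:
    """Closest String objective: largest Hamming distance to any input string."""
    return max(hamming(candidate, s) for s in strings)


def brute_force_closest_string(strings: list[str], alphabet: str) -> tuple[str, int]:
    """Exhaustive baseline over all |alphabet|^length candidates (tiny inputs only).

    This is an illustration of the problem, not the CSP-Greedy algorithm.
    """
    length = len(strings[0])
    candidates = ("".join(chars) for chars in product(alphabet, repeat=length))
    best = min(candidates, key=lambda c: radius(c, strings))
    return best, radius(best, strings)


def approximation_factor(epsilon: float, n: int, length: int) -> float:
    """Evaluate (1 + epsilon * e / 2^n)^length, assuming e is Euler's number."""
    return (1.0 + epsilon * math.e / 2**n) ** length
```

For example, printing `approximation_factor(0.1, n, 100)` for `n` in `(2, 10, 20)` shows the factor shrinking toward 1 as the number of strings grows, which is the sense in which the guarantee favors "large" instances.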

Author information

Authors and Affiliations

David R. Cheriton School of Computer Science, University of Waterloo, Canada
Christina Boucher
Department of Applied Mathematics, University of Waterloo, Canada
Kathleen Wilkie

Editor information

Editors and Affiliations

School of Physics and Mathematics, Edificio "B", Universidad Michoacana, Ciudad Universitaria, 5800, Morelia, Mich., Mexico
Edgar Chavez
Dept. of Computer Science and Engineering, University of California, 92521, Riverside, CA, USA
Stefano Lonardi

About this paper

Cite this paper

Boucher, C., Wilkie, K. (2010). Why Large Closest String Instances Are Easy to Solve in Practice. In: Chavez, E., Lonardi, S. (eds) String Processing and Information Retrieval. SPIRE 2010. Lecture Notes in Computer Science, vol 6393. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16321-0_10

DOI: https://doi.org/10.1007/978-3-642-16321-0_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-16320-3
Online ISBN: 978-3-642-16321-0
eBook Packages: Computer Science, Computer Science (R0)
