
Why Large Closest String Instances Are Easy to Solve in Practice

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6393)

Abstract

We initiate the study of the smoothed complexity of the Closest String problem by proposing a semi-random model of Hamming distance. We restrict our attention to the optimization version of the Closest String problem and give a randomized algorithm, which we refer to as CSP-Greedy, that computes the closest string on smoothed instances up to a constant-factor approximation in time O(ℓ³), where ℓ is the string length. Using smoothed analysis, we prove that CSP-Greedy achieves a \(\left(1 + \frac{\epsilon e}{2^n}\right)^{\ell}\)-approximation guarantee, where ε > 0 is any small value and n is the number of strings. These approximation and runtime guarantees demonstrate that Closest String instances with a relatively large number of input strings can be solved efficiently in practice. We also give experimental results demonstrating that CSP-Greedy runs extremely efficiently on instances with a large number of strings. The counterintuitive fact that "large" Closest String instances are easier and more efficient to solve gives new insight into this well-investigated problem.
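The abstract states the objective and the guarantee but not the inner workings of CSP-Greedy, so the following Python sketch does not reproduce the authors' algorithm. It is a minimal illustration under stated assumptions. `radius` is the Closest String objective (the maximum Hamming distance from a candidate center to any input string). `column_majority` is a hypothetical baseline heuristic used only for comparison. `exact_closest_string` is a brute-force solver that is feasible only for tiny instances. `smoothed_bound` evaluates the factor \(\left(1 + \frac{\epsilon e}{2^n}\right)^{\ell}\), assuming that e denotes Euler's number. Because \(1 + x \le e^{x}\), this factor is at most \(\exp(\epsilon e \ell / 2^n)\). That quantity is close to 1 once \(2^n\) is much larger than \(\epsilon e \ell\), which is one way to read the claim that many input strings make instances easier.

```python
import itertools
import math
from collections import Counter
from typing import List


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def radius(center: str, strings: List[str]) -> int:
    """Closest String objective: largest Hamming distance from center to any input."""
    return max(hamming(center, s) for s in strings)


def exact_closest_string(strings: List[str], alphabet: str) -> str:
    """Exhaustive search over all len(alphabet)**ell candidates (tiny instances only)."""
    ell = len(strings[0])
    best = alphabet[0] * ell
    best_r = radius(best, strings)
    for chars in itertools.product(alphabet, repeat=ell):
        cand = "".join(chars)
        r = radius(cand, strings)
        if r < best_r:  # keep the first candidate achieving the smallest radius
            best, best_r = cand, r
    return best


def column_majority(strings: List[str]) -> str:
    """Hypothetical baseline (NOT the paper's CSP-Greedy): in each column pick the
    most frequent character, breaking ties by the smallest character."""
    center = []
    for j in range(len(strings[0])):
        counts = Counter(s[j] for s in strings)
        top = max(counts.values())
        center.append(min(ch for ch, c in counts.items() if c == top))
    return "".join(center)


def smoothed_bound(eps: float, n: int, ell: int) -> float:
    """The factor (1 + eps*e / 2**n)**ell, assuming e is Euler's number."""
    return (1 + eps * math.e / 2 ** n) ** ell


if __name__ == "__main__":
    strings = ["0011", "0101", "1001"]
    heuristic = column_majority(strings)
    optimum = exact_closest_string(strings, "01")
    print(heuristic, radius(heuristic, strings))  # prints "0001 1"
    print(optimum, radius(optimum, strings))      # prints "0001 1"
    # For fixed ell, the bound shrinks toward 1 as the number of strings n grows.
    for n in (5, 10, 20):
        print(n, smoothed_bound(0.1, n, 1000))
```

On this three-string toy instance the heuristic happens to match the exhaustive optimum. The printed bound values drop from very large at n = 5 to barely above 1 at n = 20 for ℓ = 1000 and ε = 0.1.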

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Boucher, C., Wilkie, K. (2010). Why Large Closest String Instances Are Easy to Solve in Practice. In: Chavez, E., Lonardi, S. (eds) String Processing and Information Retrieval. SPIRE 2010. Lecture Notes in Computer Science, vol 6393. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16321-0_10

  • DOI: https://doi.org/10.1007/978-3-642-16321-0_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-16320-3

  • Online ISBN: 978-3-642-16321-0

  • eBook Packages: Computer Science, Computer Science (R0)
