Closest String and Substring Problems

  • Reference work entry
Encyclopedia of Algorithms

Years and Authors of Summarized Original Work

  • 2000; Li, Ma, Wang

  • 2003; Deng, et al.

  • 2008; Marx

  • 2009; Ma, Sun

  • 2011; Chen, Wang

  • 2012; Chen, Ma, Wang

Problem Definition

The problem of finding a center string that is “close” to every given string has applications in computational molecular biology [4, 5, 9–11, 18, 19] and coding theory [1, 6, 7].

This problem has two versions. The first arises in coding theory, where we look for a code that is not too far from any code in a given set.

Problem 1 (The closest string problem)

Input: a set of strings \(\mathcal{S} =\{ s_{1},s_{2},\ldots ,s_{n}\}\), each of length m.

Output: the smallest d and a string s of length m that is within Hamming distance d of each \(s_{i} \in \mathcal{S}\).
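
As an illustration of Problem 1, the sketch below computes Hamming distances and finds an optimal center by exhaustive search. It is not one of the algorithms summarized in this entry. The function names `hamming` and `closest_string_bruteforce` are hypothetical, and the search is exponential in m, so it is only suitable for tiny instances. It assumes that restricting candidates to letters occurring in the input loses no optimal solution, which holds because swapping an unseen letter for any letter seen in that column never increases a distance.

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def closest_string_bruteforce(strings):
    """Return (d, s) where s minimizes d = max_i hamming(s, s_i).

    Exhaustive search over all length-m strings built from the letters
    that occur in the input (an illustrative assumption, not the
    method of any cited paper).
    """
    if not strings:
        raise ValueError("need at least one input string")
    m = len(strings[0])
    if any(len(t) != m for t in strings):
        raise ValueError("all input strings must have length m")

    alphabet = sorted(set("".join(strings)))
    best_d, best_s = m + 1, None
    for candidate in product(alphabet, repeat=m):
        s = "".join(candidate)
        # Radius of this candidate: its largest distance to any input.
        d = max(hamming(s, t) for t in strings)
        if d < best_d:
            best_d, best_s = d, s
    return best_d, best_s
```

For example, calling `closest_string_bruteforce(["AAC", "ATC", "TTC"])` returns a radius of 1, since `ATC` is within Hamming distance 1 of every input and no single string equals all three.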

The second problem is much more elusive than the closest string problem. It arises from applications in molecular biology such as finding conserved regions, identifying genetic drug targets, and designing genetic probes.

Problem 2 (The...

Recommended Reading

  1. Ben-Dor A, Lancia G, Perone J, Ravi R (1997) Banishing bias from consensus sequences. In: Proceedings of the 8th annual symposium on combinatorial pattern matching, Aarhus, pp 247–261

  2. Chen Z, Wang L (2011) Fast exact algorithms for the closest string and substring problems with application to the planted (L, d)-motif model. IEEE/ACM Trans Comput Biol Bioinform 8(5):1400–1410

  3. Chen Z-Z, Ma B, Wang L (2012) A three-string approach to the closest string problem. J Comput Syst Sci 78(1):164–178

  4. Deng X, Li G, Li Z, Ma B, Wang L (2003) Genetic design of drugs without side-effects. SIAM J Comput 32(4):1073–1090

  5. Dopazo J, Rodríguez A, Sáiz JC, Sobrino F (1993) Design of primers for PCR amplification of highly variable genomes. CABIOS 9:123–125

  6. Frances M, Litman A (1997) On covering problems of codes. Theor Comput Syst 30:113–119

  7. Gasieniec L, Jansson J, Lingas A (1999) Efficient approximation algorithms for the Hamming center problem. In: Proceedings of the 10th ACM-SIAM symposium on discrete algorithms, Baltimore, pp S905–S906

  8. Gramm J, Niedermeier R, Rossmanith P (2003) Fixed-parameter algorithms for closest string and related problems. Algorithmica 37(1):25–42

  9. Hertz G, Stormo G (1995) Identification of consensus patterns in unaligned DNA and protein sequences: a large-deviation statistical basis for penalizing gaps. In: Proceedings of the 3rd international conference on bioinformatics and genome research, Tallahassee, pp 201–216

  10. Lanctot K, Li M, Ma B, Wang S, Zhang L (1999) Distinguishing string selection problems. In: Proceedings of the 10th ACM-SIAM symposium on discrete algorithms, Baltimore, pp 633–642

  11. Lawrence C, Reilly A (1990) An expectation maximization (EM) algorithm for the identification and characterization of common sites in unaligned biopolymer sequences. Proteins 7:41–51

  12. Li M, Ma B, Wang L (2002) Finding similar regions in many sequences. J Comput Syst Sci 65(1):73–96

  13. Li M, Ma B, Wang L (1999) Finding similar regions in many strings. In: Proceedings of the thirty-first annual ACM symposium on theory of computing, Atlanta, pp 473–482

  14. Li M, Ma B, Wang L (2002) On the closest string and substring problems. J ACM 49(2):157–171

  15. Ma B (2000) A polynomial time approximation scheme for the closest substring problem. In: Proceedings of the 11th annual symposium on combinatorial pattern matching, Montreal, pp 99–107

  16. Ma B, Sun X (2009) More efficient algorithms for closest string and substring problems. SIAM J Comput 39(4):1432–1443

  17. Marx D (2008) Closest substring problems with small distances. SIAM J Comput 38(4):1382–1410

  18. Stormo G (1990) Consensus patterns in DNA. In: Doolittle RF (ed) Molecular evolution: computer analysis of protein and nucleic acid sequences. Methods Enzymol 183:211–221

  19. Stormo G, Hartzell GW III (1991) Identifying protein-binding sites from unaligned DNA fragments. Proc Natl Acad Sci USA 88:5699–5703

  20. Wang L, Zhu B (2009) Efficient algorithms for the closest string and distinguishing string selection problems. In: Proceedings of 3rd international workshop on frontiers in algorithms, Hefei. Lecture notes in computer science, vol 5598, pp 261–270

Copyright information

© 2016 Springer Science+Business Media New York

About this entry

Cite this entry

Wang, L., Li, M., Ma, B. (2016). Closest String and Substring Problems. In: Kao, MY. (eds) Encyclopedia of Algorithms. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-2864-4_73
