Configurations and Minority in the String Consensus Problem

Algorithmica

Abstract

The Closest String Problem is defined as follows. Let \(S\) be a set of \(k\) strings \(\{s_1,\ldots ,s_k\}\), each of length \(\ell \). Find a string \(s^*\) that minimizes the maximum Hamming distance from \(s^*\) to the strings of \(S\). We denote this minimum distance by \(d\). The string \(s^*\) is called a consensus string. In this paper we present two main algorithms for this problem: the Configuration algorithm, which runs in \(O(k^2 \ell ^ k)\) time, and the Minority algorithm. The problem was introduced by Lanctot et al. [SODA’99 and (Inf Comput 185(1):41–55, 2003)], who showed that it is \(\mathcal {NP}\)-hard and provided an approximation algorithm based on Integer Programming. Since then the closest string problem has been studied extensively in both computational biology and theoretical computer science. This research can be roughly divided into three categories: approximate, exact, and practical solutions. This paper falls under the exact solutions category. Despite the great effort to obtain efficient algorithms for this problem, an algorithm with the natural running time of \(O(\ell ^ k)\) was not known. In this paper we close this gap. Our result means that algorithms solving the closest string problem in times \(O(\ell ^2), O(\ell ^3), O(\ell ^4)\) and \(O(\ell ^5)\) exist for the cases of \(k=2,3,4\) and \(5\), respectively. It is known that, in fact, the cases of \(k=2,3,\) and \(4\) can be solved in linear time. No efficient algorithm is currently known for the case of \(k=5\). We prove two lemmas, the unit square lemma and the minority lemma, that exploit surprising properties of the closest string problem and enable constructing the closest string in a sequential fashion. These lemmas, together with some additional ideas, give an \(O(\ell ^2)\) algorithm for computing a closest string of \(5\) binary strings. Algorithm Minority is based on these lemmas.
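To make the problem definition concrete, the following is a minimal brute-force sketch in Python. It is not the Configuration or Minority algorithm of the paper; it only illustrates the objective (minimizing the maximum Hamming distance) and runs in time exponential in \(\ell\). The function names `hamming`, `radius`, and `closest_string_bruteforce` are hypothetical names chosen for this illustration. The sketch relies on the standard observation that some optimal consensus string uses, at every position, a character that occurs at that position in at least one input string.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which the equal-length strings a and b differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def radius(candidate, strings):
    """Maximum Hamming distance from candidate to any string in strings."""
    return max(hamming(candidate, s) for s in strings)


def closest_string_bruteforce(strings):
    """Illustrative exhaustive search (assumption: tiny inputs only).

    Returns (s_star, d), where s_star minimizes the maximum Hamming
    distance to the input strings and d is that distance.
    """
    if not strings:
        raise ValueError("need at least one string")
    length = len(strings[0])
    if any(len(s) != length for s in strings):
        raise ValueError("all strings must have the same length")

    # For each position, only characters that occur there need to be tried.
    columns = [sorted(set(col)) for col in zip(*strings)]

    best, best_d = None, length + 1
    for chars in product(*columns):
        candidate = "".join(chars)
        d = radius(candidate, strings)
        if d < best_d:  # strict '<' keeps the first optimum found
            best, best_d = candidate, d
    return best, best_d
```

For example, for the three binary strings `"000"`, `"011"`, and `"101"`, the string `"001"` is at Hamming distance 1 from each input, so the sketch returns `("001", 1)`.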

References

  1. Amir, A., Landau, G.M., Na, J.C., Park, H., Park, K., Sim, J.S.: Consensus optimizing both distance sum and radius. In: Karlgren, J., Tarhio, J., Hyyrö, H. (eds.) Proceedings of 16th Symposium on String Processing and Information Retrieval (SPIRE), LNCS, vol. 5721. Springer, pp. 234–242 (2009)

  2. Amir, A., Paryenty, H., Roditty, L.: Approximations and partial solutions for the consensus sequence problem. In: Proceedings of 18th Symposium on String Processing and Information Retrieval (SPIRE) (2011, to appear)

  3. Andoni, A., Indyk, P., Patrascu, M.: On the optimality of the dimensionality reduction method. In: Proceedings of 47th IEEE Symposium on the Foundation of Computer Science (FOCS), pp. 449–458 (2006)

  4. Ben-Dor, A., Lancia, G., Perone, J., Ravi, R.: Banishing bias from consensus sequences. In: Proceedings of 8th Annual Symposium on Combinatorial Pattern Matching (CPM), pp. 247–261 (1997)

  5. Boucher, C., Brown, D., Durocher, S.: On the structure of small motif recognition instances. In: Proceedings of 15th Symposium on String Processing and Information Retrieval (SPIRE), pp. 269–281 (2008)

  6. Boucher, C., Wilkie, K.: Why large closest string instances are easy to solve in practice. In: Proceedings of 17th Symposium on String Processing and Information Retrieval (SPIRE), pp. 106–117 (2010)

  7. Chimani, M., Woste, M., Bocker, S.: A closer look at the closest string and closest substring problem. In: Proceedings of 13th Workshop on Algorithm Engineering and Experiments (ALENEX), pp. 13–24 (2011)

  8. Evans, P.A., Smith, A., Wareham, H.T.: The Parameterized Complexity of p-Center Approximate Substring Problems. Technical Report TR01-149, Faculty of Computer Science, University of New Brunswick, Canada (2001)

  9. Frances, M., Litman, A.: On covering problems of codes. Theory Comput. Syst. 30(2), 113–119 (1997)

  10. Gramm, J., Niedermeier, R., Rossmanith, P.: Exact solutions for closest string and related problems. In: Eades, P., Takaoka, T. (eds.) Proceedings of 12th Annual Symposium on Algorithms and Computation (ISAAC), LNCS, vol. 2223. Springer, pp. 441–453 (2001)

  11. Gramm, J., Niedermeier, R., Rossmanith, P.: Fixed-parameter algorithms for closest string and related problems. Algorithmica 37(1), 25–42 (2003)

  12. Hufsky, F., Kuchenbecker, L., Jahn, K., Stoye, J., Bocker, S.: Swiftly computing center strings. In: Proceedings of 10th Workshop on Algorithms in Bioinformatics (WABI), pp. 325–336 (2010)

  13. Lanctot, K., Li, M., Ma, B., Wang, S., Zhang, L.: Distinguishing string selection problems. Inf. Comput. 185(1), 41–55 (2003)

  14. Lenstra, H.W.: Integer programming with a fixed number of variables. Math. Oper. Res. 8, 538–548 (1983)

  15. Li, M., Ma, B., Wang, L.: On the closest string and substring problems. J. ACM 49(2), 157–171 (2002)

  16. Ma, B., Sun, X.: More efficient algorithms for closest string and substring problems. SIAM J. Comput. 39(4), 1432–1443 (2009)

  17. Meneses, C.N., Lu, Z., Oliveira, C.A.S., Pardalos, P.M.: Optimal solutions for the closest-string problem via integer programming. INFORMS J. Comput. 16(4), 419–429 (2004)

  18. Stojanovic, N., Berman, P., Gumucio, D., Hardison, R., Miller, W.: A linear-time algorithm for the 1-mismatch problem. In: Proceedings of 5th International Workshop on Algorithms and Data Structures (WADS), pp. 126–135 (1997)

  19. Sze, S., Lu, S., Chen, J.: Integrating sample-driven and pattern-driven approaches in motif finding. In: Proceedings of 4th Workshop on Algorithms in Bioinformatics (WABI), pp. 438–449 (2004)

Acknowledgments

We would like to thank the anonymous reviewers for their helpful remarks.

Author information

Corresponding author

Correspondence to Liam Roditty.

Additional information

Amihood Amir: Partly supported by NSF Grant CCR-09-04581 and ISF Grant 571/14.

Haim Paryenty: Partly supported by a Bar-Ilan University President’s Fellowship.

About this article

Cite this article

Amir, A., Paryenty, H. & Roditty, L. Configurations and Minority in the String Consensus Problem. Algorithmica 74, 1267–1292 (2016). https://doi.org/10.1007/s00453-015-9996-7

  • DOI: https://doi.org/10.1007/s00453-015-9996-7
