Algorithms for Closest and Farthest String Problems via Rank Distance

  • Conference paper
Theory and Applications of Models of Computation (TAMC 2019)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11436)

Abstract

A new distance between strings, termed rank distance, was introduced by Dinu (Fundamenta Informaticae, 2003). Since then, the properties of rank distance have been studied in several papers. In this article, we continue the study of rank distance. More precisely, we tackle three problems that concern the distance between strings.

  1. The first problem that we study is String with Fixed Rank Distance (SFRD): given a set of strings S and an integer d, decide whether there exists a string that is at distance d from every string in S. For this problem we provide a polynomial-time exact algorithm.

  2. The second problem that we study is the Closest String Problem under Rank Distance (CSRD): given a set of strings S, find the minimum integer d and a string that is at distance at most d from all strings in S. Since this problem is NP-hard (Dinu and Popa, CPM 2012), it is likely that no polynomial-time algorithm exists. Thus, we propose three different approaches: a heuristic approach and two integer linear programming formulations, one of which uses a geometric interpretation of the problem.

  3. Finally, we approach the Farthest String Problem via Rank Distance (FSRD), which asks to find two strings with the same frequency of characters (i.e. the same Parikh vector) that have the largest possible rank distance. We provide a polynomial-time exact algorithm for this problem.
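The abstract does not restate the definition of rank distance, so the following is only a minimal Python sketch under an assumed definition, the one commonly given in the cited literature for strings with the same Parikh vector: index each character by its occurrence number, then sum the absolute differences between the positions of matching indexed characters. The function names `indexed_positions`, `rank_distance`, and `farthest_pair_bruteforce` are hypothetical, and the exhaustive search merely illustrates the FSRD problem statement on tiny inputs; it is not the polynomial-time algorithm announced above.

```python
from collections import Counter, defaultdict
from itertools import combinations, permutations


def indexed_positions(s):
    """Map each (character, occurrence number) pair to its 1-based position in s."""
    seen = defaultdict(int)
    positions = {}
    for i, ch in enumerate(s, start=1):
        seen[ch] += 1
        positions[(ch, seen[ch])] = i
    return positions


def rank_distance(u, v):
    """Assumed definition: sum of position differences of matching indexed characters.

    Only strings with the same Parikh vector (same character counts) are handled.
    """
    if Counter(u) != Counter(v):
        raise ValueError("strings must have the same Parikh vector")
    pu, pv = indexed_positions(u), indexed_positions(v)
    return sum(abs(pu[key] - pv[key]) for key in pu)


def farthest_pair_bruteforce(chars):
    """Try every pair of distinct rearrangements of chars (exponential time).

    Returns (distance, u, v); illustration only, suitable for very short inputs.
    """
    candidates = sorted(set("".join(p) for p in permutations(chars)))
    pairs = ((rank_distance(u, v), u, v) for u, v in combinations(candidates, 2))
    # If only one rearrangement exists (e.g. "aaa"), fall back to distance 0.
    return max(pairs, default=(0, candidates[0], candidates[0]))
```

Under this assumed definition, `rank_distance("abc", "cba")` evaluates to 4 (the characters a and c each move by two positions, b stays in place), and `farthest_pair_bruteforce("aab")` returns the pair `"aab"`, `"baa"` at distance 4.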

Notes

  1. Rank distance can be defined for strings that do not necessarily have the same Parikh vector (see, e.g., [12]). However, these strings can be transformed into strings with the same Parikh vector without affecting the rank distance. Thus, for the sake of simplicity, we do not consider such strings in our paper.

References

  1. Arbib, C., Felici, G., Servilio, M., Ventura, P.: Optimum solution of the closest string problem via rank distance. In: Cerulli, R., Fujishige, S., Mahjoub, A.R. (eds.) ISCO 2016. LNCS, vol. 9849, pp. 297–307. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-45587-7_26

  2. Babaie, M., Mousavi, S.R.: A memetic algorithm for closest string problem and farthest string problem. In: 2010 18th Iranian Conference on Electrical Engineering. IEEE, May 2010

  3. Bādoiu, M., Har-Peled, S., Indyk, P.: Approximate clustering via core-sets. In: Proceedings of the Thirty-Fourth Annual ACM Symposium on Theory of Computing, STOC 2002, pp. 250–257. ACM, New York (2002)

  4. Ben-Dor, A., Lancia, G., Ravi, R., Perone, J.: Banishing bias from consensus sequences. In: Apostolico, A., Hein, J. (eds.) CPM 1997. LNCS, vol. 1264, pp. 247–261. Springer, Heidelberg (1997). https://doi.org/10.1007/3-540-63220-4_63

  5. de la Higuera, C., Casacuberta, F.: Topology of strings: median string is NP-complete. Theor. Comput. Sci. 230(1–2), 39–48 (2000)

  6. Deng, X., Li, G., Li, Z., Ma, B., Wang, L.: Genetic design of drugs without side-effects. SIAM J. Comput. 32(4), 1073–1090 (2003)

  7. Deza, E., Deza, M.: Dictionary of Distances. North-Holland, Amsterdam (2006)

  8. Dinu, A., Dinu, L.P.: On the syllabic similarities of romance languages. In: Gelbukh, A. (ed.) CICLing 2005. LNCS, vol. 3406, pp. 785–788. Springer, Heidelberg (2005). https://doi.org/10.1007/978-3-540-30586-6_88

  9. Dinu, L.P.: On the classification and aggregation of hierarchies with different constitutive elements. Fundam. Inform. 55(1), 39–50 (2003)

  10. Dinu, L.P., Ionescu, R., Tomescu, A.: A rank-based sequence aligner with applications in phylogenetic analysis. PLoS ONE 9(8), e104006 (2014)

  11. Dinu, L.P., Manea, F.: An efficient approach for the rank aggregation problem. Theor. Comput. Sci. 359(1–3), 455–461 (2006)

  12. Dinu, L.P., Popa, A.: On the closest string via rank distance. In: Kärkkäinen, J., Stoye, J. (eds.) CPM 2012. LNCS, vol. 7354, pp. 413–426. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-31265-6_33

  13. Dinu, L.P., Sgarro, A.: A low-complexity distance for DNA strings. Fundam. Inform. 73(3), 361–372 (2006)

  14. Frances, M., Litman, A.: On covering problems of codes. Theory Comput. Syst. 30(2), 113–119 (1997)

  15. Gagolewski, M.: Data Fusion: Theory, Methods, and Applications. Institute of Computer Science, Polish Academy of Sciences, Warsaw, Poland (2015)

  16. Gramm, J., Huffner, F., Niedermeier, R.: Closest strings, primer design, and motif search. In: Currents in Computational Molecular Biology. RECOMB, pp. 74–75 (2002)

  17. Greenhill, S.J.: Levenshtein distances fail to identify language relationships accurately. Comput. Linguist. 37(4), 689–698 (2011)

  18. Ionescu, R.T., Popescu, M.: Knowledge Transfer between Computer Vision and Text Mining - Similarity-Based Learning Approaches. Advances in Computer Vision and Pattern Recognition. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-30367-3

  19. Ionescu, R.T., Popescu, M., Cahill, A.: String kernels for native language identification: insights from behind the curtains. Comput. Linguist. 42(3), 491–525 (2016)

  20. Kannan, R.: Minkowski’s convex body theorem and integer programming. Math. Oper. Res. 12(3), 415–440 (1987)

  21. Koonin, E.V.: The emerging paradigm and open problems in comparative genomics. Bioinformatics 15(4), 265–266 (1999)

  22. Lanctot, J.K., Li, M., Ma, B., Wang, S., Zhang, L.: Distinguishing string selection problems. Inf. Comput. 185(1), 41–55 (2003)

  23. Lenstra, H.W.: Integer programming with a fixed number of variables. Math. Oper. Res. 8(4), 538–548 (1983)

  24. Li, M., Ma, B., Wang, L.: Finding similar regions in many sequences. J. Comput. Syst. Sci. 65(1), 73–96 (2002)

  25. Liu, X., He, H., Sýkora, O.: Parallel genetic algorithm and parallel simulated annealing algorithm for the closest string problem. In: Li, X., Wang, S., Dong, Z.Y. (eds.) ADMA 2005. LNCS (LNAI), vol. 3584, pp. 591–597. Springer, Heidelberg (2005). https://doi.org/10.1007/11527503_70

  26. Meneses, C.N., Lu, Z., Oliveira, C.A.S., Pardalos, P.M.: Optimal solutions for the closest-string problem via integer programming. INFORMS J. Comput. 16(4), 419–429 (2004)

  27. Nerbonne, J., Hinrichs, E.W.: Linguistic distances. In: Proceedings of the Workshop on Linguistic Distances, Sydney, July 2006, pp. 1–6 (2006)

  28. Nicolas, F., Rivals, E.: Complexities of the centre and median string problems. In: Baeza-Yates, R., Chávez, E., Crochemore, M. (eds.) CPM 2003. LNCS, vol. 2676, pp. 315–327. Springer, Heidelberg (2003). https://doi.org/10.1007/3-540-44888-8_23

  29. Nicolas, F., Rivals, E.: Hardness results for the center and median string problems under the weighted and unweighted edit distances. J. Discrete Algorithms 3(2–4), 390–415 (2005)

  30. Popescu, M., Dinu, L.P.: Rank distance as a stylistic similarity. In: 22nd International Conference on Computational Linguistics, Posters Proceedings, COLING 2008, 18–22 August 2008, Manchester, UK, pp. 91–94 (2008)

  31. Popov, V.Y.: Multiple genome rearrangement by swaps and by element duplications. Theor. Comput. Sci. 385(1–3), 115–126 (2007)

  32. Ritter, J.: An efficient bounding sphere. In: Graphics Gems, pp. 301–303. Elsevier (1990)

  33. Sun, Y., et al.: Combining genomic and network characteristics for extended capability in predicting synergistic drugs for cancer. Nat. Commun. 6, 8481 (2015)

  34. Wang, L., Dong, L.: Randomized algorithms for motif detection. J. Bioinf. Comput. Biol. 3(5), 1039–1052 (2005)

  35. Wooley, J.C.: Trends in computational biology: a summary based on a RECOMB plenary lecture. J. Comput. Biol. 6(3/4), 459–474 (1999)


Author information

Corresponding author

Correspondence to Alexandru Popa.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Dinu, L.P., Dumitru, B.C., Popa, A. (2019). Algorithms for Closest and Farthest String Problems via Rank Distance. In: Gopal, T., Watada, J. (eds) Theory and Applications of Models of Computation. TAMC 2019. Lecture Notes in Computer Science, vol 11436. Springer, Cham. https://doi.org/10.1007/978-3-030-14812-6_10

  • DOI: https://doi.org/10.1007/978-3-030-14812-6_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-14811-9

  • Online ISBN: 978-3-030-14812-6

  • eBook Packages: Computer Science, Computer Science (R0)
