
A large neighborhood search heuristic for the longest common subsequence problem

Journal of Heuristics

Abstract

Given a set $S = \{S_1, \ldots, S_k\}$ of finite strings, the k-Longest Common Subsequence Problem (k-LCSP) seeks a string $L^*$ of maximum length such that $L^*$ is a subsequence of each $S_i$ for $i = 1, \ldots, k$. This paper presents a large neighborhood search technique that provides quality solutions to large k-LCSP instances. The heuristic's running time is linear in both the length of the sequences and the number of sequences. Some computational results are provided.
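To make the problem definition concrete, the following minimal Python sketch checks whether a candidate string is a common subsequence of every string in a set, which is the feasibility condition any k-LCSP solution must satisfy. It is not the paper's large neighborhood search heuristic; the function names `is_subsequence` and `is_common_subsequence` and the sample strings are assumptions introduced only for illustration.

```python
from typing import Iterable


def is_subsequence(candidate: str, s: str) -> bool:
    """Return True if `candidate` can be obtained from `s` by deleting characters."""
    it = iter(s)
    # `ch in it` consumes the iterator up to and including the first match,
    # so later characters of `candidate` must appear further right in `s`.
    return all(ch in it for ch in candidate)


def is_common_subsequence(candidate: str, strings: Iterable[str]) -> bool:
    """Return True if `candidate` is a subsequence of every string in `strings`."""
    return all(is_subsequence(candidate, s) for s in strings)


# Hypothetical sample instance with k = 3 strings (illustrative only).
strings = ["ACGTAC", "CGATC", "ACGC"]
print(is_common_subsequence("CGC", strings))  # True
print(is_common_subsequence("CGT", strings))  # False: "ACGC" contains no "T"
```

Each check scans every input string at most once, so verifying a candidate takes time proportional to the total length of the strings. Finding a candidate of maximum length is the hard part that the heuristic addresses.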



Author information

Correspondence to Todd Easton.


About this article

Cite this article

Easton, T., Singireddy, A. A large neighborhood search heuristic for the longest common subsequence problem. J Heuristics 14, 271–283 (2008). https://doi.org/10.1007/s10732-007-9038-y

  • DOI: https://doi.org/10.1007/s10732-007-9038-y
