Computing a longest common subsequence for a set of strings

  • Part I Computer Science
  • BIT Numerical Mathematics

Abstract

The well-known two-string LCS problem is generalized to finding a longest common subsequence (LCS) of a set of strings. A new, general approach that systematically enumerates common subsequences is proposed. Assuming a finite symbol set, it is shown that the presented scheme requires a preprocessing time that grows linearly with the total length of the input strings and a processing time that grows linearly with K, the number of strings, and with ∥ℙ∥, the number of matches among them. The only previous algorithm for the generalized LCS problem takes O(K·|S_1|·|S_2|·...·|S_K|) execution time, where |S_i| denotes the length of the string S_i. Since ∥ℙ∥ is typically a very small percentage of |S_1|·|S_2|·...·|S_K|, the proposed method may be considered much more efficient than the straightforward dynamic programming approach.
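
To make the two quantities in the abstract concrete, the following is a minimal Python sketch. It is not the authors' enumeration scheme, which the abstract does not spell out. It shows only the straightforward dynamic programming baseline over all K-tuples of prefix lengths, plus a count of ∥ℙ∥ under the assumption that a "match" is an index tuple (i_1, ..., i_K) at which all K strings carry the same symbol. The function names `count_matches` and `lcs_length_dp` are hypothetical and chosen for illustration.

```python
from collections import Counter
from functools import lru_cache
from math import prod


def count_matches(strings):
    """Count index tuples (i_1, ..., i_K) at which all strings hold the same symbol.

    Assumption: this is what the abstract calls the number of matches.
    """
    counts = [Counter(s) for s in strings]
    # Only symbols present in every string can contribute a match.
    common = set(strings[0]).intersection(*map(set, strings[1:]))
    # For each common symbol, every combination of its positions is one match.
    return sum(prod(c[ch] for c in counts) for ch in common)


def lcs_length_dp(strings):
    """Length of an LCS of K strings via the baseline DP.

    The table has one cell per tuple of prefix lengths, and each cell
    inspects K strings, which gives the O(K * |S_1| * ... * |S_K|) bound.
    """
    strings = tuple(strings)

    @lru_cache(maxsize=None)
    def best(ends):
        # An empty prefix admits only the empty common subsequence.
        if min(ends) == 0:
            return 0
        last_symbols = {s[e - 1] for s, e in zip(strings, ends)}
        if len(last_symbols) == 1:
            # All prefixes end in the same symbol: take it and shorten all.
            return 1 + best(tuple(e - 1 for e in ends))
        # Otherwise drop the last symbol of one string at a time.
        return max(
            best(ends[:k] + (ends[k] - 1,) + ends[k + 1:])
            for k in range(len(ends))
        )

    return best(tuple(len(s) for s in strings))
```

For the inputs `["aab", "ab", "ba"]`, `count_matches` returns 3, while the product of the lengths is 12. This gap between the number of matches and the size of the full table is what the abstract's efficiency claim relies on. The recursive form is meant for short strings only, since deep recursion would hit Python's recursion limit on long inputs.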

About this article

Cite this article

Hsu, W.J., Du, M.W. Computing a longest common subsequence for a set of strings. BIT 24, 45–59 (1984). https://doi.org/10.1007/BF01934514

  • DOI: https://doi.org/10.1007/BF01934514
