Abstract
The well-known two-string LCS problem is generalized to finding a Longest Common Subsequence (LCS) of a set of strings. A new, general approach that systematically enumerates common subsequences is proposed for its solution. Assuming a finite symbol set, it is shown that the presented scheme requires preprocessing time that grows linearly with the total length of the input strings, and processing time that grows linearly with $K$, the number of strings, and with $\|\mathbb{P}\|$, the number of matches among them. The only previous algorithm for the generalized LCS problem takes $O(K \cdot |S_1| \cdot |S_2| \cdots |S_K|)$ execution time, where $|S_i|$ denotes the length of the string $S_i$. Since $\|\mathbb{P}\|$ is typically a very small fraction of $|S_1| \cdot |S_2| \cdots |S_K|$, the proposed method can be expected to be much more efficient than the straightforward dynamic programming approach.
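To make the comparison above concrete, here is a small Python sketch. It is not the authors' algorithm. It assumes that a match is a $K$-tuple of positions $(i_1, \ldots, i_K)$ with $S_1[i_1] = \cdots = S_K[i_K]$, and all function names (`lcs_length_dp`, `matches`, `count_matches`, `lcs_length_by_matches`) are hypothetical. `lcs_length_dp` is the straightforward dynamic program, whose table has one cell per index tuple. `count_matches` computes $\|\mathbb{P}\|$ from per-string symbol counts, in time linear in the total input length for a fixed alphabet. `lcs_length_by_matches` finds the LCS length as the longest chain of matches that increases strictly in every string. That last function compares every pair of matches, so it is quadratic in $\|\mathbb{P}\|$. It only illustrates why the work can be driven by the number of matches rather than by the product of the string lengths, and it does not reproduce the linear processing bound claimed above.

```python
# Illustrative sketch only (not the method of Hsu and Du).
# Assumption: a "match" is a K-tuple of positions that all hold the same symbol.
from collections import Counter
from itertools import product
from math import prod


def lcs_length_dp(strings):
    """Straightforward K-dimensional DP: one table cell per index tuple."""
    dims = [len(s) + 1 for s in strings]
    table = {}
    # itertools.product yields tuples in lexicographic order, so every tuple
    # with some coordinates decreased is filled in before it is read.
    for idx in product(*(range(d) for d in dims)):
        if 0 in idx:
            table[idx] = 0
        elif len({s[i - 1] for s, i in zip(strings, idx)}) == 1:
            # All K current symbols agree: extend the LCS of the shorter prefixes.
            table[idx] = table[tuple(i - 1 for i in idx)] + 1
        else:
            # Otherwise drop the last symbol of one string, trying every string.
            table[idx] = max(
                table[idx[:k] + (idx[k] - 1,) + idx[k + 1:]]
                for k in range(len(idx))
            )
    return table[tuple(d - 1 for d in dims)]


def matches(strings):
    """Enumerate every match as a K-tuple of 0-based positions."""
    occ = []
    for s in strings:
        where = {}
        for pos, ch in enumerate(s):
            where.setdefault(ch, []).append(pos)
        occ.append(where)
    common = set(occ[0]).intersection(*occ[1:])
    for ch in sorted(common):
        yield from product(*(o[ch] for o in occ))


def count_matches(strings):
    """Count matches without enumerating them: sum over symbols of the product of counts."""
    counts = [Counter(s) for s in strings]
    common = set(counts[0]).intersection(*counts[1:])
    return sum(prod(c[ch] for c in counts) for ch in common)


def lcs_length_by_matches(strings):
    """LCS length as the longest chain of matches, strictly increasing in every string."""
    ms = sorted(matches(strings))  # lexicographic order puts every predecessor first
    best = []
    for j, m in enumerate(ms):
        prev = [best[i] for i in range(j) if all(a < b for a, b in zip(ms[i], m))]
        best.append(1 + max(prev, default=0))
    return max(best, default=0)


if __name__ == "__main__":
    S = ["ABCBDAB", "BDCABA", "BADACB"]
    print(prod(len(s) for s in S), count_matches(S))      # 252 22
    print(lcs_length_dp(S), lcs_length_by_matches(S))     # 4 4
```

On these three strings, 22 matches stand against 252 index tuples ($7 \cdot 6 \cdot 6$), and both methods report an LCS length of 4 (for example, `BDAB`).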
Cite this article
Hsu, W.J., Du, M.W. Computing a longest common subsequence for a set of strings. BIT 24, 45–59 (1984). https://doi.org/10.1007/BF01934514
DOI: https://doi.org/10.1007/BF01934514