Performance analysis of some simple heuristics for computing longest common subsequences

Algorithmica

Abstract

Although the Longest Common Subsequence (LCS) Problem has been studied by many researchers for years, heuristic methods have not been investigated before. In this paper we present a simple heuristic which is guaranteed to return a common subsequence of length at least 1/s that of the longest, where s is the number of different symbols in the input strings. Furthermore, we generalize the idea to several classes of heuristic algorithms. Surprisingly, we find that no other heuristic in these classes outperforms this simple algorithm. In other words, we show that any heuristic which uses only global information, such as the number of symbol occurrences, might return a common subsequence as short as 1/s of the length of the longest. Analysis of the average performance of the simple heuristic for s = 2 is also presented.
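The abstract does not spell out the heuristic itself, so the following is only a minimal sketch of one count-based heuristic that is consistent with the stated 1/s bound, not necessarily the authors' exact algorithm. It assumes the heuristic returns a run of a single repeated symbol, chosen to maximize the smaller of its two occurrence counts. The bound follows because a longest common subsequence of length L over s symbols must contain some symbol at least L/s times, and that symbol occurs at least that often in both inputs. The function names `single_symbol_heuristic` and `lcs_length` are illustrative, and the exact dynamic program is included only for comparison.

```python
from collections import Counter


def single_symbol_heuristic(a: str, b: str) -> str:
    """Return the longest common subsequence made of one repeated symbol.

    Assumed reading of the "simple heuristic": it uses only global
    information (symbol occurrence counts), never symbol positions.
    """
    counts_a = Counter(a)
    counts_b = Counter(b)
    best_symbol, best_len = "", 0
    # Iterate in sorted order so ties are broken deterministically.
    for symbol in sorted(counts_a):
        # A Counter returns 0 for symbols that do not occur in b.
        k = min(counts_a[symbol], counts_b[symbol])
        if k > best_len:
            best_symbol, best_len = symbol, k
    return best_symbol * best_len


def lcs_length(a: str, b: str) -> int:
    """Exact LCS length by the standard quadratic dynamic program."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            if x == y:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    a = b = "aabb"  # s = 2 distinct symbols
    h = single_symbol_heuristic(a, b)
    print(h, len(h), lcs_length(a, b))
```

With both inputs equal to `aabb`, the sketch returns a subsequence of length 2 while the true LCS has length 4, so the ratio of exactly 1/s is attained for s = 2.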

Additional information

Communicated by Alberto Apostolico.

This research was supported in part by ONR Grant N00014-87-K-0833.

About this article

Cite this article

Chin, F., Poon, C.K. Performance analysis of some simple heuristics for computing longest common subsequences. Algorithmica 12, 293–311 (1994). https://doi.org/10.1007/BF01185429

  • DOI: https://doi.org/10.1007/BF01185429
