Enumeration of Maximal Common Subsequences Between Two Strings

Algorithmica

Abstract

A maximal common subsequence (MCS) between two strings X and Y is an inclusion-maximal common subsequence of X and Y, that is, a common subsequence that is not a proper subsequence of any other common subsequence. MCSs are a natural generalization of the classical concept of longest common subsequence (LCS), which can be seen as a longest MCS. We study the problem of efficiently listing all the distinct MCSs between two strings. As discussed in the paper, this problem is algorithmically challenging because each distinct MCS must be listed only once: for example, dynamic programming [Fraser et al., CPM 1998] incurs an exponential waste of time, and a recent algorithm for finding an MCS [Sakai, CPM 2018] does not seem to extend immediately to listing. We follow an alternative and novel graph-based approach, proposing the first output-sensitive algorithm for this problem: it takes time polynomial in n per MCS found, where \(n = \max \{ |X|, |Y|\}\), with polynomial preprocessing time and space.
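To make the definition concrete, here is a brute-force sketch in Python that lists the distinct MCSs of two short strings. It is not the output-sensitive algorithm of the paper: the candidate enumeration is exponential in |X|, so it is only suitable as a checker on tiny inputs. It relies on the fact that a common subsequence W is maximal exactly when no single character can be inserted into W, at any position, to obtain a longer common subsequence. If W were a proper subsequence of a longer common subsequence Z, deleting characters of Z outside one embedding of W would eventually leave W plus exactly one character. All function names are hypothetical.

```python
from itertools import combinations


def is_subsequence(w: str, s: str) -> bool:
    """True if w can be obtained from s by deleting characters."""
    it = iter(s)
    # `ch in it` advances the iterator past the first match, so order is respected.
    return all(ch in it for ch in w)


def is_common(w: str, x: str, y: str) -> bool:
    return is_subsequence(w, x) and is_subsequence(w, y)


def is_maximal(w: str, x: str, y: str, alphabet: set) -> bool:
    """A common subsequence is an MCS iff no single-character insertion keeps it common."""
    if not is_common(w, x, y):
        return False
    for k in range(len(w) + 1):          # every insertion position, including both ends
        for c in alphabet:               # only characters occurring in both strings matter
            if is_common(w[:k] + c + w[k:], x, y):
                return False
    return True


def brute_force_mcs(x: str, y: str) -> list:
    """List every distinct MCS of x and y (exponential time; tiny inputs only)."""
    alphabet = set(x) & set(y)
    candidates = set()                   # a set removes duplicate subsequences of x
    for r in range(len(x) + 1):
        for idx in combinations(range(len(x)), r):
            w = "".join(x[i] for i in idx)
            if is_subsequence(w, y):
                candidates.add(w)
    return sorted(w for w in candidates if is_maximal(w, x, y, alphabet))
```

For example, `brute_force_mcs("abcd", "dabc")` returns `['abc', 'd']`: the LCS `abc` is an MCS, but so is `d`, which cannot be extended even though it is much shorter.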

Notes

  1. The Strong Exponential Time Hypothesis (SETH) [13] states that \(\lim _{k\rightarrow \infty } s_k = 1\), where \(s_k=\inf \{\delta \mid k\text {-SAT can be solved in } O(2^{\delta n}) \text { time}\}\) and n here denotes the number of variables of the k-SAT instance. It is widely believed to be true, and it has been used to prove conditional lower bounds for a variety of problems (see [2] for some examples).

  2. The author employs these edges to improve the Hunt–Szymanski algorithm [12], which extracts one LCS of two strings of length n in \(O((r + n) \log n)\) time, where r is the total number of ordered pairs of positions at which the two sequences match, that is, the number of edges in the string bipartite graph.

  3. To be precise, each recursive node has up to \(\sigma \) child nodes; the partition is nonetheless binary, because each recursive call corresponds to “using the edge \((l_c,m_c)\)” and the continuation after backtracking corresponds to “not using the edge \((l_c,m_c)\)”.
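The match pairs counted by r in note 2 can be illustrated with a minimal Python sketch of the threshold technique behind the Hunt–Szymanski algorithm [12]. It is a sketch under simplifying assumptions: it computes only the LCS length (not an LCS itself) and does not include the improvement mentioned in note 2. For each position of X, the matching positions of Y are scanned in decreasing order, and a binary search maintains, for each length k, the smallest position of Y at which a common subsequence of length k can end; each of the r match pairs costs one binary search, which gives the \(O((r + n) \log n)\) bound. The function names are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict


def count_match_pairs(x: str, y: str) -> int:
    """r = number of pairs (i, j) with x[i] == y[j], i.e. edges of the bipartite graph."""
    freq = defaultdict(int)
    for ch in y:
        freq[ch] += 1
    return sum(freq[ch] for ch in x)


def lcs_length_match_pairs(x: str, y: str) -> int:
    """LCS length via match pairs and binary search (Hunt-Szymanski-style thresholds)."""
    occ = defaultdict(list)              # character -> increasing list of its positions in y
    for j, ch in enumerate(y):
        occ[ch].append(j)
    # thresh[k]: smallest position in y where a common subsequence of length k + 1
    # (of the prefix of x scanned so far) can end; the list stays strictly increasing.
    thresh = []
    for ch in x:
        # Decreasing j ensures that x[i] extends at most one subsequence per step.
        for j in reversed(occ[ch]):
            k = bisect_left(thresh, j)
            if k == len(thresh):
                thresh.append(j)
            else:
                thresh[k] = j
    return len(thresh)
```

For instance, on X = `abcd` and Y = `dabc` there are r = 4 match pairs and the LCS length is 3.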

References

  1. Abboud, A., Backurs, A., Williams, V.V.: Tight hardness results for LCS and other sequence similarity measures. In: 2015 IEEE 56th Annual Symposium on Foundations of Computer Science, pp. 59–78 (2015)

  2. Abboud, A., Williams, V.V.: Popular conjectures imply strong lower bounds for dynamic problems. In: 2014 IEEE 55th Annual Symposium on Foundations of Computer Science, pp. 434–443. IEEE (2014)

  3. Apostolico, A.: Improving the worst-case performance of the Hunt-Szymanski strategy for the longest common subsequence of two strings. Inf. Process. Lett. 23(2), 63–69 (1986)

  4. Bergroth, L., Hakonen, H., Raita, T.: A survey of longest common subsequence algorithms. In: Proceedings 7th International Symposium on String Processing and Information Retrieval, pp. 39–48. SPIRE (2000)

  5. Chain, P., Kurtz, S., Ohlebusch, E., Slezak, T.: An applications-focused review of comparative genomics tools: capabilities, limitations and future challenges. Brief. Bioinf. 4(2), 105–123 (2003)

  6. Conte, A., Grossi, R., Punzi, G., Uno, T.: Polynomial-delay enumeration of maximal common subsequences. In: Brisaboa, N.R., Puglisi, S.J. (eds.) String Processing and Information Retrieval, pp. 189–202. Springer, Cham (2019)

  7. Crochemore, M., Melichar, B., Troníček, Z.: Directed acyclic subsequence graph-overview. J. Discrete Algorithms 1(3–4), 255–280 (2003)

  8. Delcher, A.L., Kasif, S., Fleischmann, R.D., Peterson, J., White, O., Salzberg, S.L.: Alignment of whole genomes. Nucl. Acids Res. 27(11), 2369–2376 (1999)

  9. Fraser, C.B., Irving, R.W., Middendorf, M.: Maximal common subsequences and minimal common supersequences. Inf. Comput. 124(2), 145–153 (1996)

  10. Hirschberg, D.S.: Algorithms for the longest common subsequence problem. J. ACM 24(4), 664–675 (1977)

  11. Hsu, W.J., Du, M.W.: Computing a longest common subsequence for a set of strings. BIT Numer. Math. 24(1), 45–59 (1984)

  12. Hunt, J.W., Szymanski, T.G.: A fast algorithm for computing longest common subsequences. Commun. ACM 20(5), 350–353 (1977)

  13. Impagliazzo, R., Paturi, R.: On the complexity of k-SAT. J. Comput. Syst. Sci. 62(2), 367–375 (2001)

  14. Kanté, M.M., Limouzy, V., Mary, A., Nourine, L.: On the enumeration of minimal dominating sets and related notions. SIAM J. Discrete Math. 28(4), 1916–1929 (2014)

  15. Knuth, D.E.: The Art of Computer Programming, Volume 3: Sorting and Searching. Addison-Wesley Series in Computer Science and Information Processing. Addison-Wesley (1997)

  16. Kurtz, S., Phillippy, A., Delcher, A.L., Smoot, M., Shumway, M., Antonescu, C., Salzberg, S.L.: Versatile and open software for comparing large genomes. Genome Biol. 5(2), 1–9 (2004)

  17. Lawler, E.L., Lenstra, J.K., Rinnooy Kan, A.H.G.: Generating all maximal independent sets: NP-hardness and polynomial-time algorithms. SIAM J. Comput. 9(3), 558–565 (1980)

  18. Masek, W.J., Paterson, M.S.: A faster algorithm computing string edit distances. J. Comput. Syst. Sci. 20(1), 18–31 (1980)

  19. Raman, R., Raman, V., Satti, S.R.: Succinct indexable dictionaries with applications to encoding \(k\)-ary trees, prefix sums and multisets. ACM Trans. Algorithms 3(4), 43-es (2007)

  20. Sakai, Y.: Maximal common subsequence algorithms. In: Navarro, G., Sankoff, D., Zhu, B. (eds.) Annual Symposium on Combinatorial Pattern Matching (CPM 2018), Volume 105 of Leibniz International Proceedings in Informatics (LIPIcs), Dagstuhl, Germany, pp. 1:1–1:10. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik (2018)

  21. Sakai, Y.: Maximal common subsequence algorithms. Theor. Comput. Sci. 793, 132–139 (2019)

  22. Wagner, R.A., Fischer, M.J.: The string-to-string correction problem. J. ACM 21(1), 168–173 (1974)

Author information

Correspondence to Giulia Punzi.

About this article

Cite this article

Conte, A., Grossi, R., Punzi, G. et al. Enumeration of Maximal Common Subsequences Between Two Strings. Algorithmica 84, 757–783 (2022). https://doi.org/10.1007/s00453-021-00898-5
