Abstract
A maximal common subsequence (MCS) between two strings X and Y is an inclusion-maximal subsequence of both X and Y. MCSs are a natural generalization of the classical concept of longest common subsequence (LCS), which can be seen as a longest MCS. We study the problem of efficiently listing all the distinct MCSs between two strings. As discussed in the paper, this problem is algorithmically challenging because the same MCS must not be listed multiple times: for example, dynamic programming [Fraser et al., CPM 1998] incurs an exponential waste of time, and a recent algorithm for finding an MCS [Sakai, CPM 2018] does not seem to extend immediately to listing. We follow an alternative and novel graph-based approach, proposing the first output-sensitive algorithm for this problem: it takes time polynomial in n per MCS found, where \(n = \max \{ |X|, |Y|\}\), with polynomial preprocessing time and space.
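To make the definition concrete, the following is a minimal brute-force sketch in Python, not the output-sensitive algorithm of the paper. It relies on the observation that a common subsequence W is inclusion-maximal exactly when no single character can be inserted into W while keeping it a common subsequence. The function names (`is_subsequence`, `is_maximal_common`, `brute_force_mcs`) are hypothetical, and the running time is exponential, so it is only meant for very short strings.

```python
from itertools import combinations


def is_subsequence(w, s):
    """Return True if w can be obtained from s by deleting characters."""
    it = iter(s)
    return all(ch in it for ch in w)


def is_common(w, x, y):
    """Return True if w is a subsequence of both x and y."""
    return is_subsequence(w, x) and is_subsequence(w, y)


def is_maximal_common(w, x, y):
    """Return True if w is an inclusion-maximal common subsequence of x and y.

    It suffices to try every single-character insertion: any longer common
    subsequence containing w can be trimmed down to w plus one extra character.
    """
    if not is_common(w, x, y):
        return False
    alphabet = set(x) & set(y)
    for i in range(len(w) + 1):
        for c in alphabet:
            if is_common(w[:i] + c + w[i:], x, y):
                return False
    return True


def brute_force_mcs(x, y):
    """List all distinct MCSs by filtering every subsequence of the shorter string."""
    shorter = x if len(x) <= len(y) else y
    candidates = set()  # a set removes duplicate subsequences
    for k in range(len(shorter) + 1):
        for idx in combinations(range(len(shorter)), k):
            candidates.add("".join(shorter[i] for i in idx))
    return sorted(w for w in candidates if is_maximal_common(w, x, y))


print(brute_force_mcs("abc", "bca"))  # ['a', 'bc']
```

For X = "abc" and Y = "bca", the MCSs are "a" and "bc": only "bc" is an LCS, yet "a" cannot be extended to any longer common subsequence, which illustrates why an MCS can be shorter than an LCS.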













Notes
The Strong Exponential Time Hypothesis (SETH) [13] states that \(\lim _{k\rightarrow \infty } s_k = 1\), where \(s_k=\inf \{\delta \mid k\text {-SAT can be solved in } O(2^{\delta n}) \text { time}\}\). It is widely believed to be true, and it has been used to prove conditional lower bounds for a variety of problems (see [2] for some examples).
The author employs these edges to improve the Hunt–Szymanski algorithm [12], which extracts one LCS of two strings of length n in \(O((r + n) \log n)\) time, where r is the total number of ordered pairs of positions at which the two sequences match, that is, the number of edges in the string bipartite graph.
To be precise, each recursive node has up to \(\sigma \) child nodes; the binary partition can be seen from the fact that each recursive call corresponds to “using the edge \((l_c,m_c)\)” and the continuation after backtracking corresponds to “not using the edge \((l_c,m_c)\)”.
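As an illustration of the quantity r and of the \(O((r + n) \log n)\) bound mentioned in the notes above, here is a minimal Python sketch in the style of the Hunt–Szymanski strategy. It is a simplified variant that returns only the length of an LCS, not the LCS itself, and the function names and the `thresholds` array are assumptions made for this example.

```python
from bisect import bisect_left
from collections import defaultdict


def matching_pairs(x, y):
    """Return all ordered pairs (i, j) with x[i] == y[j]; their count is r."""
    pos = defaultdict(list)
    for j, c in enumerate(y):
        pos[c].append(j)
    return [(i, j) for i, c in enumerate(x) for j in pos.get(c, [])]


def lcs_length_hunt_szymanski(x, y):
    """Compute the LCS length with one binary search per matching pair."""
    pos = defaultdict(list)
    for j, c in enumerate(y):
        pos[c].append(j)
    # thresholds[k] is the smallest position in y at which a common
    # subsequence of length k + 1 (using the prefix of x seen so far) can end.
    thresholds = []
    for c in x:
        # Visit matches right to left so that one character of x
        # cannot be used twice within the same pass.
        for j in reversed(pos.get(c, [])):
            k = bisect_left(thresholds, j)
            if k == len(thresholds):
                thresholds.append(j)
            else:
                thresholds[k] = j
    return len(thresholds)


print(len(matching_pairs("abc", "bca")))          # 3
print(lcs_length_hunt_szymanski("abc", "bca"))    # 2
```

Each of the r matching pairs triggers one binary search over an array of length at most n, which is where the \(r \log n\) term comes from; the approach is fast when r is small and degrades when the strings share many matching positions.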
References
Abboud, A., Backurs, A., Williams, V.V.: Tight hardness results for LCS and other sequence similarity measures. In: 2015 IEEE 56th Annual Symposium on Foundations of Computer Science, pp. 59–78 (2015)
Abboud, A., Williams, V.V.: Popular conjectures imply strong lower bounds for dynamic problems. In: 2014 IEEE 55th Annual Symposium on Foundations of Computer Science, pp. 434–443. IEEE (2014)
Apostolico, A.: Improving the worst-case performance of the Hunt-Szymanski strategy for the longest common subsequence of two strings. Inf. Process. Lett. 23(2), 63–69 (1986)
Bergroth, L., Hakonen, H., Raita, T.: A survey of longest common subsequence algorithms. In: Proceedings 7th International Symposium on String Processing and Information Retrieval, pp. 39–48. SPIRE (2000)
Chain, P., Kurtz, S., Ohlebusch, E., Slezak, T.: An applications-focused review of comparative genomics tools: capabilities, limitations and future challenges. Brief. Bioinf. 4(2), 105–123 (2003)
Conte, A., Grossi, R., Punzi, G., Uno, T.: Polynomial-delay enumeration of maximal common subsequences. In: Brisaboa, N.R., Puglisi, S.J. (eds.) String Processing and Information Retrieval, pp. 189–202. Springer, Cham (2019)
Crochemore, M., Melichar, B., Troníček, Z.: Directed acyclic subsequence graph-overview. J. Discrete Algorithms 1(3–4), 255–280 (2003)
Delcher, A.L., Kasif, S., Fleischmann, R.D., Peterson, J., White, O., Salzberg, S.L.: Alignment of whole genomes. Nucl. Acids Res. 27(11), 2369–2376 (1999)
Fraser, C.B., Irving, R.W., Middendorf, M.: Maximal common subsequences and minimal common supersequences. Inf. Comput. 124(2), 145–153 (1996)
Hirschberg, D.S.: Algorithms for the longest common subsequence problem. J. ACM 24(4), 664–675 (1977)
Hsu, W.J., Du, M.W.: Computing a longest common subsequence for a set of strings. BIT Numer. Math. 24(1), 45–59 (1984)
Hunt, J.W., Szymanski, T.G.: A fast algorithm for computing longest common subsequences. Commun. ACM 20(5), 350–353 (1977)
Impagliazzo, R., Paturi, R.: On the complexity of k-SAT. J. Comput. Syst. Sci. 62(2), 367–375 (2001)
Kanté, M.M., Limouzy, V., Mary, A., Nourine, L.: On the enumeration of minimal dominating sets and related notions. SIAM J. Discrete Math. 28(4), 1916–1929 (2014)
Knuth, D.E.: The Art of Computer Programming, Volume 3: Sorting and Searching. Addison-Wesley Series in Computer Science and Information Processing. Addison-Wesley (1997)
Kurtz, S., Phillippy, A., Delcher, A.L., Smoot, M., Shumway, M., Antonescu, C., Salzberg, S.L.: Versatile and open software for comparing large genomes. Genome Biol. 5(2), 1–9 (2004)
Lawler, E.L., Lenstra, J.K., Rinnooy Kan, A.H.G.: Generating all maximal independent sets: NP-hardness and polynomial-time algorithms. SIAM J. Comput. 9(3), 558–565 (1980)
Masek, W.J., Paterson, M.S.: A faster algorithm computing string edit distances. J. Comput. Syst. Sci. 20(1), 18–31 (1980)
Raman, R., Raman, V., Satti, S.R.: Succinct indexable dictionaries with applications to encoding \(k\)-ary trees, prefix sums and multisets. ACM Trans. Algorithms 3(4), 43-es (2007)
Sakai, Y.: Maximal common subsequence algorithms. In: Navarro, G., Sankoff, D., Zhu, B. (eds.) Annual Symposium on Combinatorial Pattern Matching (CPM 2018), Volume 105 of Leibniz International Proceedings in Informatics (LIPIcs), Dagstuhl, Germany, pp. 1:1–1:10. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik (2018)
Sakai, Y.: Maximal common subsequence algorithms. Theor. Comput. Sci. 793, 132–139 (2019)
Wagner, R.A., Fischer, M.J.: The string-to-string correction problem. J. ACM 21(1), 168–173 (1974)
Cite this article
Conte, A., Grossi, R., Punzi, G. et al. Enumeration of Maximal Common Subsequences Between Two Strings. Algorithmica 84, 757–783 (2022). https://doi.org/10.1007/s00453-021-00898-5
DOI: https://doi.org/10.1007/s00453-021-00898-5