Enumeration of Maximal Common Subsequences Between Two Strings

Conte, Alessio; Grossi, Roberto; Punzi, Giulia; Uno, Takeaki

doi:10.1007/s00453-021-00898-5

Published: 12 January 2022

Algorithmica, Volume 84, pages 757–783 (2022)

Alessio Conte¹, Roberto Grossi¹, Giulia Punzi¹ (ORCID: orcid.org/0000-0001-8738-1595) & Takeaki Uno²

Abstract

A maximal common subsequence (MCS) between two strings X and Y is an inclusion-maximal subsequence of both X and Y. MCSs are a natural generalization of the classical concept of longest common subsequence (LCS), which can be seen as a longest MCS. We study the problem of efficiently listing all the distinct MCSs between two strings. As discussed in the paper, this problem is algorithmically challenging because the same MCS must not be listed multiple times: for example, dynamic programming [Fraser et al., CPM 1998] incurs an exponential waste of time, and a recent algorithm for finding an MCS [Sakai, CPM 2018] does not seem to extend immediately to listing. We follow an alternative and novel graph-based approach, proposing the first output-sensitive algorithm for this problem: it takes polynomial time in n per MCS found, where \(n = \max \{ |X|, |Y|\}\), with polynomial preprocessing time and space.
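To make the definition concrete, the following brute-force sketch (ours, not the authors' algorithm) enumerates every distinct common subsequence of two short strings and keeps only the inclusion-maximal ones. It is exponential in |X| and intended only for checking tiny examples; the function names `is_subsequence` and `brute_force_mcs` are hypothetical.

```python
from itertools import combinations


def is_subsequence(s: str, t: str) -> bool:
    """True iff s can be obtained from t by deleting characters."""
    it = iter(t)
    # `c in it` advances the iterator past the first match of c
    return all(c in it for c in s)


def brute_force_mcs(x: str, y: str) -> set[str]:
    """All distinct maximal common subsequences of x and y (exponential; tiny inputs only)."""
    # Every distinct subsequence of x, obtained by choosing index subsets
    subsequences_of_x = {
        "".join(x[k] for k in idx)
        for r in range(len(x) + 1)
        for idx in combinations(range(len(x)), r)
    }
    # Keep those that are also subsequences of y
    common = {s for s in subsequences_of_x if is_subsequence(s, y)}
    # s is maximal iff no strictly longer common subsequence contains it
    return {
        s for s in common
        if not any(len(t) > len(s) and is_subsequence(s, t) for t in common)
    }
```

For X = abc and Y = cab the sketch returns {ab, c}: the LCS ab is a longest MCS, while c is also maximal because no common subsequence extends it, so an MCS can be strictly shorter than an LCS.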

Notes

The Strong Exponential Time Hypothesis (SETH) [13] states that \(\lim _{k\rightarrow \infty } s_k = 1\), where \(s_k=\inf \{\delta \mid k\text {-SAT can be solved in } O(2^{\delta n}) \text { time}\}\). It is widely believed to be true, and it has been used to prove conditional lower bounds for a variety of problems (see [2] for some examples).
The author employs these edges to improve the Hunt–Szymanski algorithm [12], which extracts one LCS of two strings of length n in \(O((r + n) \log n)\) time, where r is the total number of ordered pairs of positions at which the two sequences match, that is, the number of edges in the string bipartite graph.
To be precise, each recursive node has up to \(\sigma \) child nodes; the binary partition arises because each recursive call corresponds to “using the edge \((l_c,m_c)\)”, while the continuation after backtracking corresponds to “not using the edge \((l_c,m_c)\)”.
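The last two notes can be illustrated with a small sketch. `match_pairs` lists the edges of the string bipartite graph (the r matching position pairs), and `lcs_length_hs` computes the LCS length in the Hunt–Szymanski style, as a longest chain of edges strictly increasing in both coordinates, in \(O((r + n) \log n)\) time. `distinct_common_subsequences` shows the recursion shape of the third note under our own assumption (the body text is not included here) that \((l_c,m_c)\) is the leftmost pair of positions matching character c after the current positions: each node then has up to \(\sigma \) children, and looping over them realizes the "use / do not use the edge" partition. It lists every distinct common subsequence once, not only the maximal ones, and is not the authors' MCS algorithm; all function names are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict


def _positions(y: str) -> dict[str, list[int]]:
    """Map each character to the increasing list of its positions in y."""
    pos: dict[str, list[int]] = defaultdict(list)
    for j, c in enumerate(y):
        pos[c].append(j)
    return pos


def match_pairs(x: str, y: str) -> list[tuple[int, int]]:
    """All ordered pairs (i, j) with x[i] == y[j]: the edges of the string bipartite graph."""
    pos = _positions(y)
    return [(i, j) for i, c in enumerate(x) for j in pos.get(c, [])]


def lcs_length_hs(x: str, y: str) -> int:
    """LCS length via a longest strictly increasing chain over the r match pairs."""
    pos = _positions(y)
    tails: list[int] = []  # tails[k] = smallest j ending a chain of k + 1 matches
    for c in x:
        # Decreasing j, so two matches in the same row of x never chain together
        for j in reversed(pos.get(c, [])):
            k = bisect_left(tails, j)
            if k == len(tails):
                tails.append(j)
            else:
                tails[k] = j
    return len(tails)


def distinct_common_subsequences(x: str, y: str) -> list[str]:
    """Each distinct common subsequence (including "") exactly once, via leftmost edges."""
    alphabet = sorted(set(x) & set(y))
    out: list[str] = []

    def recurse(i: int, j: int, prefix: str) -> None:
        out.append(prefix)
        for c in alphabet:  # up to sigma children per node
            l_c, m_c = x.find(c, i), y.find(c, j)  # leftmost edge for c, or -1
            if l_c == -1 or m_c == -1:
                continue
            recurse(l_c + 1, m_c + 1, prefix + c)  # "use the edge (l_c, m_c)"
            # moving on to the next c is "not using the edge (l_c, m_c)"

    recurse(0, 0, "")
    return out
```

Taking only leftmost edges is what prevents duplicates: every common subsequence has a unique leftmost embedding in both strings, so it is generated along exactly one root-to-node path.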

References

[1] Abboud, A., Backurs, A., Williams, V.V.: Tight hardness results for LCS and other sequence similarity measures. In: 2015 IEEE 56th Annual Symposium on Foundations of Computer Science, pp. 59–78 (2015)
[2] Abboud, A., Williams, V.V.: Popular conjectures imply strong lower bounds for dynamic problems. In: 2014 IEEE 55th Annual Symposium on Foundations of Computer Science, pp. 434–443. IEEE (2014)
[3] Apostolico, A.: Improving the worst-case performance of the Hunt-Szymanski strategy for the longest common subsequence of two strings. Inf. Process. Lett. 23(2), 63–69 (1986)
[4] Bergroth, L., Hakonen, H., Raita, T.: A survey of longest common subsequence algorithms. In: Proceedings 7th International Symposium on String Processing and Information Retrieval, pp. 39–48. SPIRE (2000)
[5] Chain, P., Kurtz, S., Ohlebusch, E., Slezak, T.: An applications-focused review of comparative genomics tools: capabilities, limitations and future challenges. Brief. Bioinf. 4(2), 105–123 (2003)
[6] Conte, A., Grossi, R., Punzi, G., Uno, T.: Polynomial-delay enumeration of maximal common subsequences. In: Brisaboa, N.R., Puglisi, S.J. (eds.) String Processing and Information Retrieval, pp. 189–202. Springer, Cham (2019)
[7] Crochemore, M., Melichar, B., Troníček, Z.: Directed acyclic subsequence graph-overview. J. Discrete Algorithms 1(3–4), 255–280 (2003)
[8] Delcher, A.L., Kasif, S., Fleischmann, R.D., Peterson, J., White, O., Salzberg, S.L.: Alignment of whole genomes. Nucl. Acids Res. 27(11), 2369–2376 (1999)
[9] Fraser, C.B., Irving, R.W., Middendorf, M.: Maximal common subsequences and minimal common supersequences. Inf. Comput. 124(2), 145–153 (1996)
[10] Hirschberg, D.S.: Algorithms for the longest common subsequence problem. J. ACM 24(4), 664–675 (1977)
[11] Hsu, W.J., Du, M.W.: Computing a longest common subsequence for a set of strings. BIT Numer. Math. 24(1), 45–59 (1984)
[12] Hunt, J.W., Szymanski, T.G.: A fast algorithm for computing longest common subsequences. Commun. ACM 20(5), 350–353 (1977)
[13] Impagliazzo, R., Paturi, R.: On the complexity of k-SAT. J. Comput. Syst. Sci. 62(2), 367–375 (2001)
[14] Kanté, M.M., Limouzy, V., Mary, A., Nourine, L.: On the enumeration of minimal dominating sets and related notions. SIAM J. Discrete Math. 28(4), 1916–1929 (2014)
[15] Knuth, D.E.: The Art of Computer Programming, Volume 3: Sorting and Searching. Addison-Wesley Series in Computer Science and Information Processing. Addison-Wesley (1997)
[16] Kurtz, S., Phillippy, A., Delcher, A.L., Smoot, M., Shumway, M., Antonescu, C., Salzberg, S.L.: Versatile and open software for comparing large genomes. Genome Biol. 5(2), 1–9 (2004)
[17] Lawler, E.L., Lenstra, J.K., Rinnooy Kan, A.H.G.: Generating all maximal independent sets: NP-hardness and polynomial-time algorithms. SIAM J. Comput. 9(3), 558–565 (1980)
[18] Masek, W.J., Paterson, M.S.: A faster algorithm computing string edit distances. J. Comput. Syst. Sci. 20(1), 18–31 (1980)
[19] Raman, R., Raman, V., Satti, S.R.: Succinct indexable dictionaries with applications to encoding \(k\)-ary trees, prefix sums and multisets. ACM Trans. Algorithms 3(4), 43-es (2007)
[20] Sakai, Y.: Maximal common subsequence algorithms. In: Navarro, G., Sankoff, D., Zhu, B. (eds.) Annual Symposium on Combinatorial Pattern Matching (CPM 2018), Volume 105 of Leibniz International Proceedings in Informatics (LIPIcs), Dagstuhl, Germany, pp. 1:1–1:10. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik (2018)
[21] Sakai, Y.: Maximal common subsequence algorithms. Theor. Comput. Sci. 793, 132–139 (2019)
[22] Wagner, R.A., Fischer, M.J.: The string-to-string correction problem. J. ACM 21(1), 168–173 (1974)

Author information

Authors and Affiliations

University of Pisa, Pisa, Italy
Alessio Conte, Roberto Grossi & Giulia Punzi
National Institute of Informatics, Tokyo, Japan
Takeaki Uno

Corresponding author

Correspondence to Giulia Punzi.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Conte, A., Grossi, R., Punzi, G. et al. Enumeration of Maximal Common Subsequences Between Two Strings. Algorithmica 84, 757–783 (2022). https://doi.org/10.1007/s00453-021-00898-5

Received: 31 August 2020
Accepted: 14 November 2021
Published: 12 January 2022
Issue Date: March 2022
DOI: https://doi.org/10.1007/s00453-021-00898-5
