Abstract
In its simplest form, the longest common substring problem is to find a longest substring common to two or multiple strings. Using (generalized) suffix trees, this problem can be solved in linear time and space. A first generalization is the k -common substring problem: Given m strings of total length n, for all k with 2≤k≤m simultaneously find a longest substring common to at least k of the strings. It is known that the k-common substring problem can also be solved in O(n) time (Hui in Proc. 3rd Annual Symposium on Combinatorial Pattern Matching, volume 644 of Lecture Notes in Computer Science, pp. 230–243, Springer, Berlin, 1992). A further generalization is the k -common repeated substring problem: Given m strings T (1),T (2),…,T (m) of total length n and m positive integers x 1,…,x m , for all k with 1≤k≤m simultaneously find a longest string ω for which there are at least k strings \(T^{(i_{1})},T^{(i_{2})},\ldots,T^{(i_{k})}\) (1≤i 1<i 2<⋅⋅⋅<i k ≤m) such that ω occurs at least \(x_{i_{j}}\) times in \(T^{(i_{j})}\) for each j with 1≤j≤k. (For x 1=⋅⋅⋅=x m =1, we have the k-common substring problem.) In this paper, we present the first O(n) time algorithm for the k-common repeated substring problem. Our solution is based on a new linear time algorithm for the k-common substring problem.
Similar content being viewed by others
References
Hui, L.C.K.: Color set size problem with applications to string matching. In: Proc. 3rd Annual Symposium on Combinatorial Pattern Matching. Lecture Notes in Computer Science, vol. 644, pp. 230–243. Springer, Berlin (1992)
Knuth, D.E., Morris, J.H., Pratt, V.R.: Fast pattern matching in strings. SIAM J. Comput. 6(2), 323–350 (1977)
Apostolico, A.: The myriad virtues of subword trees. In: Combinatorial Algorithms on Words, pp. 85–96. Springer, Berlin (1985)
Gusfield, D.: Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology. Cambridge University Press, Cambridge (1999)
Weiner, P.: Linear pattern matching algorithms. In: Proc. 14th IEEE Annual Symposium on Switching and Automata Theory, pp. 1–11. IEEE, New York (1973)
Lee, I., Iliopoulos, C.S., Park, K.: Linear time algorithm for the longest common repeat problem. J. Discrete Algorithms 5(2), 243–249 (2007)
Lee, I., Pinzon-Ardila, Y.J.: A simple algorithm for finding exact common repeats. IEICE Trans. 90D(12), 2096–2099 (2007)
Kärkkäinen, J., Sanders, P.: Simple linear work suffix array construction. In: Proc. 30th International Colloquium on Automata, Languages and Programming. Lecture Notes in Computer Science, vol. 2719, pp. 943–955. Springer, Berlin (2003)
Ko, P., Aluru, S.: Space efficient linear time construction of suffix arrays. In: Proc. 14th Annual Symposium on Combinatorial Pattern Matching. Lecture Notes in Computer Science, vol. 2676, pp. 200–210. Springer, Berlin (2003)
Kim, D.K., Sim, J.S., Park, H., Park, K.: Linear-time construction of suffix arrays. In: Proc. 14th Annual Symposium on Combinatorial Pattern Matching. Lecture Notes in Computer Science, vol. 2676, pp. 186–199. Springer, Berlin (2003)
Kasai, T., Lee, G., Arimura, H., Arikawa, S., Park, K.: Linear-time longest-common-prefix computation in suffix arrays and its applications. In: Proc. 12th Annual Symposium on Combinatorial Pattern Matching. Lecture Notes in Computer Science, vol. 2089, pp. 181–192. Springer, Berlin (2001)
Berkman, O., Vishkin, U.: Recursive star-tree parallel data structure. SIAM J. Comput. 22(2), 221–242 (1993)
Fischer, J., Heun, V.: A new succinct representation of RMQ-information and improvements in the enhanced suffix array. In: Proc. 1st International Symposium on Combinatorics, Algorithms, Probabilistic and Experimental Methodologies. Lecture Notes in Computer Science, vol. 4614, pp. 459–470. Springer, Berlin (2007)
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Arnold, M., Ohlebusch, E. Linear Time Algorithms for Generalizations of the Longest Common Substring Problem. Algorithmica 60, 806–818 (2011). https://doi.org/10.1007/s00453-009-9369-1
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00453-009-9369-1