Linear Time Algorithms for Generalizations of the Longest Common Substring Problem

Algorithmica

Abstract

In its simplest form, the longest common substring problem is to find a longest substring common to two or more strings. Using (generalized) suffix trees, this problem can be solved in linear time and space. A first generalization is the k-common substring problem: Given m strings of total length n, for all k with \(2 \le k \le m\) simultaneously find a longest substring common to at least k of the strings. It is known that the k-common substring problem can also be solved in O(n) time (Hui in Proc. 3rd Annual Symposium on Combinatorial Pattern Matching, volume 644 of Lecture Notes in Computer Science, pp. 230–243, Springer, Berlin, 1992). A further generalization is the k-common repeated substring problem: Given m strings \(T^{(1)}, T^{(2)}, \ldots, T^{(m)}\) of total length n and m positive integers \(x_1, \ldots, x_m\), for all k with \(1 \le k \le m\) simultaneously find a longest string ω for which there are at least k strings \(T^{(i_{1})},T^{(i_{2})},\ldots,T^{(i_{k})}\) (\(1 \le i_1 < i_2 < \cdots < i_k \le m\)) such that ω occurs at least \(x_{i_{j}}\) times in \(T^{(i_{j})}\) for each j with \(1 \le j \le k\). (For \(x_1 = \cdots = x_m = 1\), we have the k-common substring problem.) In this paper, we present the first O(n) time algorithm for the k-common repeated substring problem. Our solution is based on a new linear time algorithm for the k-common substring problem.
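To make the problem statement concrete, the following is a minimal brute-force sketch in Python. It is not the linear-time algorithm of the paper; it enumerates every substring and therefore takes far more than O(n) time, and it is meant only as a reference for what the output should be on small inputs. The function name `k_common_repeated_substrings` and its parameters `texts` and `min_occ` are hypothetical names chosen for this illustration. Occurrences are counted by start position, so overlapping occurrences count separately, which is an assumption about the intended definition.

```python
from collections import Counter


def k_common_repeated_substrings(texts, min_occ=None):
    """Naive reference solution (illustrative only, not linear time).

    texts   -- list of m strings T(1), ..., T(m)
    min_occ -- list of m positive integers x_1, ..., x_m (default: all 1,
               which gives the plain k-common substring problem)

    Returns a dict mapping each k in 1..m to one longest string w such that
    at least k of the strings T(i) contain w at least x_i times. The empty
    string is returned for a k that no non-empty string satisfies.
    """
    m = len(texts)
    if min_occ is None:
        min_occ = [1] * m

    # support[w] = number of strings T(i) in which w occurs at least x_i times
    support = Counter()
    for text, x in zip(texts, min_occ):
        # Count occurrences of every substring of this text by start position.
        occ = Counter(
            text[i:j]
            for i in range(len(text))
            for j in range(i + 1, len(text) + 1)
        )
        for w, count in occ.items():
            if count >= x:
                support[w] += 1

    # A string supported by s texts is a candidate for every k <= s.
    best = {k: "" for k in range(1, m + 1)}
    for w, s in support.items():
        for k in range(1, s + 1):
            if len(w) > len(best[k]):
                best[k] = w
    return best


if __name__ == "__main__":
    strings = ["abcab", "bcabd", "xbcy"]
    # k-common substring problem (all x_i = 1)
    print(k_common_repeated_substrings(strings))
    # k-common repeated substring problem with x_1 = 2, x_2 = x_3 = 1
    print(k_common_repeated_substrings(strings, [2, 1, 1]))
```

On the three example strings, the longest substring common to all of them is "bc", and the longest one common to at least two is "bcab". Requiring two occurrences in the first string shrinks the answer for k = 3 to "b", because "bc" occurs only once in "abcab".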

References

  1. Hui, L.C.K.: Color set size problem with applications to string matching. In: Proc. 3rd Annual Symposium on Combinatorial Pattern Matching. Lecture Notes in Computer Science, vol. 644, pp. 230–243. Springer, Berlin (1992)

  2. Knuth, D.E., Morris, J.H., Pratt, V.R.: Fast pattern matching in strings. SIAM J. Comput. 6(2), 323–350 (1977)

  3. Apostolico, A.: The myriad virtues of subword trees. In: Combinatorial Algorithms on Words, pp. 85–96. Springer, Berlin (1985)

  4. Gusfield, D.: Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology. Cambridge University Press, Cambridge (1999)

  5. Weiner, P.: Linear pattern matching algorithms. In: Proc. 14th IEEE Annual Symposium on Switching and Automata Theory, pp. 1–11. IEEE, New York (1973)

  6. Lee, I., Iliopoulos, C.S., Park, K.: Linear time algorithm for the longest common repeat problem. J. Discrete Algorithms 5(2), 243–249 (2007)

  7. Lee, I., Pinzon-Ardila, Y.J.: A simple algorithm for finding exact common repeats. IEICE Trans. 90D(12), 2096–2099 (2007)

  8. Kärkkäinen, J., Sanders, P.: Simple linear work suffix array construction. In: Proc. 30th International Colloquium on Automata, Languages and Programming. Lecture Notes in Computer Science, vol. 2719, pp. 943–955. Springer, Berlin (2003)

  9. Ko, P., Aluru, S.: Space efficient linear time construction of suffix arrays. In: Proc. 14th Annual Symposium on Combinatorial Pattern Matching. Lecture Notes in Computer Science, vol. 2676, pp. 200–210. Springer, Berlin (2003)

  10. Kim, D.K., Sim, J.S., Park, H., Park, K.: Linear-time construction of suffix arrays. In: Proc. 14th Annual Symposium on Combinatorial Pattern Matching. Lecture Notes in Computer Science, vol. 2676, pp. 186–199. Springer, Berlin (2003)

  11. Kasai, T., Lee, G., Arimura, H., Arikawa, S., Park, K.: Linear-time longest-common-prefix computation in suffix arrays and its applications. In: Proc. 12th Annual Symposium on Combinatorial Pattern Matching. Lecture Notes in Computer Science, vol. 2089, pp. 181–192. Springer, Berlin (2001)

  12. Berkman, O., Vishkin, U.: Recursive star-tree parallel data structure. SIAM J. Comput. 22(2), 221–242 (1993)

  13. Fischer, J., Heun, V.: A new succinct representation of RMQ-information and improvements in the enhanced suffix array. In: Proc. 1st International Symposium on Combinatorics, Algorithms, Probabilistic and Experimental Methodologies. Lecture Notes in Computer Science, vol. 4614, pp. 459–470. Springer, Berlin (2007)


Author information

Corresponding author

Correspondence to Enno Ohlebusch.

About this article

Cite this article

Arnold, M., Ohlebusch, E. Linear Time Algorithms for Generalizations of the Longest Common Substring Problem. Algorithmica 60, 806–818 (2011). https://doi.org/10.1007/s00453-009-9369-1

  • DOI: https://doi.org/10.1007/s00453-009-9369-1