Skip to main content

Fast and Practical Algorithms for Computing All the Runs in a String

  • Conference paper
Combinatorial Pattern Matching (CPM 2007)

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 4580))

Included in the following conference series:

Abstract

A repetition in a string x is a substring \({ \bf{w}} = {\it \bf{u}}^e\) of x, maximum e ≥ 2, where u is not itself a repetition in w. A run in x is a substring \({\it \bf{w}} = {\it \bf{u}}^e{\it \bf{u^{*}}}\) of “maximal periodicity”, where \({\it \bf{u}}^e\) is a repetition and u * a maximum-length possibly empty proper prefix of u. A run may encode as many as \(|{\it \bf{u}}|\) repetitions. The maximum number of repetitions in any string \({\it \bf{x}} = {\it \bf{x}}[1..n]\) is well known to be Θ(nlogn). In 2000 Kolpakov & Kucherov showed that the maximum number of runs in x is O(n); they also described a Θ(n)-time algorithm, based on Farach’s Θ(n)-time suffix tree construction algorithm (STCA), Θ(n)-time Lempel-Ziv factorization, and Main’s Θ(n)-time leftmost runs algorithm, to compute all the runs in x. Recently Abouelhoda et al. proposed a Θ(n)-time Lempel-Ziv factorization algorithm based on an “enhanced” suffix array — a suffix array together with other supporting data structures. In this paper we introduce a collection of fast space-efficient algorithms for computing all the runs in a string that appear in many circumstances to be superior to those previously proposed.

The work of the first and third authors was supported in part by grants from the Natural Sciences & Engineering Research Council of Canada.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Abouelhoda, M.I., Kurtz, S., Ohlebusch, E.: Replacing suffix trees with enhanced suffix arrays. J. Discrete Algs. 2, 53–86 (2004)

    Article  MATH  MathSciNet  Google Scholar 

  2. Apostolico, A., Preparata, F.P.: Optimal off-line detection of repetitions in a string. Theoret. comput. sci. 22, 297–315 (1983)

    Article  MATH  MathSciNet  Google Scholar 

  3. Crochemore, M.: An optimal algorithm for computing the repetitions in a word. Inform. process. lett. 12(5), 244–250 (1981)

    Article  MATH  MathSciNet  Google Scholar 

  4. Fan, K., Puglisi, S.J., Smyth, W.F., Turpin, A.: A new periodicity lemma. SIAM J. Discrete Math. 20(3), 656–668 (2006)

    Article  MATH  MathSciNet  Google Scholar 

  5. Farach, M.: Optimal suffix tree construction with large alphabets. In: Proc. 38th FOCS, pp. 137–143 (1997)

    Google Scholar 

  6. Franek, F., Holub, J., Smyth, W.F., Xiao, X.: Computing quasi suffix arrays. J. Automata, Languages & Combinatorics 8(4), 593–606 (2003)

    MATH  MathSciNet  Google Scholar 

  7. Franek, F., Simpson, R. J., Smyth, W. F.: The maximum number of runs in a string. In: Miller, M., Park, K.(eds.) Proc. 14th AWOCA, pp. 26–35 (2003)

    Google Scholar 

  8. Grossi, R., Vitter, J.S.: Compressed suffix arrays and suffix trees with applications to text indexing & string matching. SIAM J. Computing 35(2), 378–407 (2005)

    Article  MATH  MathSciNet  Google Scholar 

  9. Kärkkäinen, J., Sanders, P.: Simple linear work suffix array construction. In: Proc. 30th ICALP. pp. 943–955 (2003)

    Google Scholar 

  10. Karlin, S., Ghandour, G., Ost, F., Tavare, S., Korn, L.J.: New approaches for computer analysis of nucleic acid sequences. Proc. Natl. Acad. Sci. USA 80, 5660–5664 (1983)

    Article  MATH  Google Scholar 

  11. Kasai, T., Lee, G., Arimura, H., Arikawa, S., Park, K.: Linear-time longest-common-prefix computation in suffix arrays and its applications. In: Amir, A., Landau, G.M. (eds.) CPM 2001. LNCS, vol. 2089, Springer, Heidelberg (2001)

    Google Scholar 

  12. Ko, P., Aluru, S.: Space efficient linear time construction of suffix arrays. In: Baeza-Yates, R.A., Chávez, E., Crochemore, M. (eds.) CPM 2003. LNCS, vol. 2676, Springer, Heidelberg (2003)

    Chapter  Google Scholar 

  13. Kolpakov, R., Kucherov, G.: http://bioinfo.lifl.fr/mreps/

  14. Kolpakov, R., Kucherov, G.: On maximal repetitions in words. J. Discrete Algs. 1, 159–186 (2000)

    MathSciNet  Google Scholar 

  15. Kurtz, S.: Reducing the space requirement of suffix trees. Software Practice & Experience 29(13), 1149–1171 (1999)

    Article  Google Scholar 

  16. Lempel, A., Ziv, J.: On the complexity of finite sequences. IEEE Trans. Information Theory 22, 75–81 (1976)

    Article  MATH  MathSciNet  Google Scholar 

  17. Lentin, A., Schützenberger, M.P.: A combinatorial problem in the theory of free monoids, Combinatorial Mathematics & Its Applications. In: Bose, R.C., Dowling, T.A. (eds.) University of North Carolina Press, pp. 128–144 (1969)

    Google Scholar 

  18. Main, M.G.: Detecting leftmost maximal periodicities. Discrete Applied Maths 25, 145–153 (1989)

    Article  MATH  MathSciNet  Google Scholar 

  19. Main, M.G., Lorentz, R.J.: An O(n log n) Algorithm for Recognizing Repetition, Tech. Rep. CS-79–056, Computer Science Department, Washington State University (1979)

    Google Scholar 

  20. Main, M.G., Lorentz, R.J.: An O(nlog n) algorithm for finding all repetitions in a string. J. Algs. 5, 422–432 (1984)

    Article  MATH  MathSciNet  Google Scholar 

  21. Mäkinen, V., Navarro, G.: Compressed full-text indices. ACM Computing Surveys (to appear)

    Google Scholar 

  22. Maniscalco, M., Puglisi, S.J.: Faster lightweight suffix array construction. In: Ryan, J., Dafik (eds.) Proc. 17th AWOCA pp. 16–29 (2006)

    Google Scholar 

  23. Manzini, G.: Two space-saving tricks for linear time LCP computation. In: Hagerup, T., Katajainen, J. (eds.) SWAT 2004. LNCS, vol. 3111, Springer, Heidelberg (2004)

    Google Scholar 

  24. Manzini, G., Ferragina, P.: Engineering a lightweight suffix array construction algorithm. Algorithmica 40, 33–50 (2004)

    Article  MATH  MathSciNet  Google Scholar 

  25. McCreight, E.M.: A space-economical suffix tree construction algorithm. J. Assoc. Comput. Mach. 32(2), 262–272 (1976)

    MathSciNet  Google Scholar 

  26. Puglisi, S.J., Smyth, W.F., Turpin, A.: A taxonomy of suffix array construction algorithms. ACM Computing Surveys (to appear)

    Google Scholar 

  27. Rytter, W.: The number of runs in a string: improved analysis of the linear upper bound. In: Durand, B., Thomas, W. (eds.) Proc. 23rd STACS. LNCS, vol. 2884, pp. 184–195. Springer, Heidelberg (2006)

    Google Scholar 

  28. Sadakane, K.: Space-efficient data structures for flexible text retrieval systems. In: Bose, P., Morin, P. (eds.) ISAAC 2002. LNCS, vol. 2518, Springer, Heidelberg (2002)

    Google Scholar 

  29. Smyth, B.: Computing Patterns in Strings, Pearson Addison-Wesley, p. 423 (2003)

    Google Scholar 

  30. Thue, A.: Über unendliche zeichenreihen. Norske Vid. Selsk. Skr. I. Mat. Nat. Kl. Christiana 7, 1–22 (1906)

    Google Scholar 

  31. Ukkonen, E.: On-line construction of suffix trees. Algorithmica 14, 249–260 (1995)

    Article  MATH  MathSciNet  Google Scholar 

  32. Weiner, P.: Linear pattern matching algorithms. In: Proc. 14th Annual IEEE Symp. Switching & Automata Theory, pp. 1–11 (1973)

    Google Scholar 

  33. Ziv, J., Lempel, A.: A universal algorithm for sequential data compression. IEEE Trans. Information Theory 23, 337–343 (1977)

    Article  MATH  MathSciNet  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Bin Ma Kaizhong Zhang

Rights and permissions

Reprints and permissions

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chen, G., Puglisi, S.J., Smyth, W.F. (2007). Fast and Practical Algorithms for Computing All the Runs in a String. In: Ma, B., Zhang, K. (eds) Combinatorial Pattern Matching. CPM 2007. Lecture Notes in Computer Science, vol 4580. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73437-6_31

Download citation

  • DOI: https://doi.org/10.1007/978-3-540-73437-6_31

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-73436-9

  • Online ISBN: 978-3-540-73437-6

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics