Fast and Practical Algorithms for Computing All the Runs in a String

Chen, Gang; Puglisi, Simon J.; Smyth, W. F.

doi:10.1007/978-3-540-73437-6_31

Gang Chen¹,
Simon J. Puglisi² &
W. F. Smyth¹,²

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4580)

Included in the following conference series:

Annual Symposium on Combinatorial Pattern Matching

Abstract

A repetition in a string \(\mathbf{x}\) is a substring \(\mathbf{w} = \mathbf{u}^e\) of \(\mathbf{x}\), with maximum \(e \ge 2\), where \(\mathbf{u}\) is not itself a repetition in \(\mathbf{w}\). A run in \(\mathbf{x}\) is a substring \(\mathbf{w} = \mathbf{u}^e\mathbf{u}^*\) of “maximal periodicity”, where \(\mathbf{u}^e\) is a repetition and \(\mathbf{u}^*\) is a maximum-length, possibly empty, proper prefix of \(\mathbf{u}\). A run may encode as many as \(|\mathbf{u}|\) repetitions. The maximum number of repetitions in any string \(\mathbf{x} = \mathbf{x}[1..n]\) is well known to be Θ(n log n). In 2000 Kolpakov & Kucherov showed that the maximum number of runs in \(\mathbf{x}\) is O(n); they also described a Θ(n)-time algorithm, based on Farach’s Θ(n)-time suffix tree construction algorithm (STCA), Θ(n)-time Lempel-Ziv factorization, and Main’s Θ(n)-time leftmost runs algorithm, to compute all the runs in \(\mathbf{x}\). Recently Abouelhoda et al. proposed a Θ(n)-time Lempel-Ziv factorization algorithm based on an “enhanced” suffix array, that is, a suffix array together with other supporting data structures. In this paper we introduce a collection of fast space-efficient algorithms for computing all the runs in a string that appear in many circumstances to be superior to those previously proposed.
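To make the definition of a run concrete, here is a minimal brute-force sketch in Python. It is not one of the algorithms introduced in the paper, nor the Kolpakov & Kucherov method; it simply tests every candidate period and takes quadratic time in the worst case. The function name `naive_runs` and the 0-based `(start, end, period)` output format are assumptions made for this illustration (the abstract indexes strings from 1).

```python
def naive_runs(x: str) -> list[tuple[int, int, int]]:
    """Return all runs of x as (start, end, period), 0-based and inclusive.

    Brute force for illustration only: worst-case quadratic time.
    """
    n = len(x)
    seen = set()   # intervals already reported with a smaller period
    runs = []
    for p in range(1, n // 2 + 1):
        i = 0
        while i + p < n:
            if x[i] != x[i + p]:
                i += 1
                continue
            # Extend the stretch of positions k with x[k] == x[k + p].
            k = i
            while k + p < n and x[k] == x[k + p]:
                k += 1
            # x[start..end] has period p and cannot be extended
            # to the left or right while keeping that period.
            start, end = i, k + p - 1
            # Require at least two full copies of the period. Periods are
            # tried in increasing order, so an interval already seen was
            # reported with its minimum period and is skipped here.
            if k - i >= p and (start, end) not in seen:
                seen.add((start, end))
                runs.append((start, end, p))
            i = k + 1
    return sorted(runs)


print(naive_runs("mississippi"))
# [(1, 7, 3), (2, 3, 1), (5, 6, 1), (8, 9, 1)]
```

In this output, `(1, 7, 3)` is the run "ississi", that is, \(\mathbf{u}^e\mathbf{u}^*\) with \(\mathbf{u}\) = "iss", \(e = 2\) and \(\mathbf{u}^*\) = "i"; the other three runs are the squares "ss", "ss" and "pp".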

The work of the first and third authors was supported in part by grants from the Natural Sciences & Engineering Research Council of Canada.

References

Abouelhoda, M.I., Kurtz, S., Ohlebusch, E.: Replacing suffix trees with enhanced suffix arrays. J. Discrete Algs. 2, 53–86 (2004)
Apostolico, A., Preparata, F.P.: Optimal off-line detection of repetitions in a string. Theoret. Comput. Sci. 22, 297–315 (1983)
Crochemore, M.: An optimal algorithm for computing the repetitions in a word. Inform. Process. Lett. 12(5), 244–250 (1981)
Fan, K., Puglisi, S.J., Smyth, W.F., Turpin, A.: A new periodicity lemma. SIAM J. Discrete Math. 20(3), 656–668 (2006)
Farach, M.: Optimal suffix tree construction with large alphabets. In: Proc. 38th FOCS, pp. 137–143 (1997)
Franek, F., Holub, J., Smyth, W.F., Xiao, X.: Computing quasi suffix arrays. J. Automata, Languages & Combinatorics 8(4), 593–606 (2003)
Franek, F., Simpson, R.J., Smyth, W.F.: The maximum number of runs in a string. In: Miller, M., Park, K. (eds.) Proc. 14th AWOCA, pp. 26–35 (2003)
Grossi, R., Vitter, J.S.: Compressed suffix arrays and suffix trees with applications to text indexing & string matching. SIAM J. Computing 35(2), 378–407 (2005)
Kärkkäinen, J., Sanders, P.: Simple linear work suffix array construction. In: Proc. 30th ICALP, pp. 943–955 (2003)
Karlin, S., Ghandour, G., Ost, F., Tavare, S., Korn, L.J.: New approaches for computer analysis of nucleic acid sequences. Proc. Natl. Acad. Sci. USA 80, 5660–5664 (1983)
Kasai, T., Lee, G., Arimura, H., Arikawa, S., Park, K.: Linear-time longest-common-prefix computation in suffix arrays and its applications. In: Amir, A., Landau, G.M. (eds.) CPM 2001. LNCS, vol. 2089, Springer, Heidelberg (2001)
Ko, P., Aluru, S.: Space efficient linear time construction of suffix arrays. In: Baeza-Yates, R.A., Chávez, E., Crochemore, M. (eds.) CPM 2003. LNCS, vol. 2676, Springer, Heidelberg (2003)
Kolpakov, R., Kucherov, G.: http://bioinfo.lifl.fr/mreps/
Kolpakov, R., Kucherov, G.: On maximal repetitions in words. J. Discrete Algs. 1, 159–186 (2000)
Kurtz, S.: Reducing the space requirement of suffix trees. Software Practice & Experience 29(13), 1149–1171 (1999)
Lempel, A., Ziv, J.: On the complexity of finite sequences. IEEE Trans. Information Theory 22, 75–81 (1976)
Lentin, A., Schützenberger, M.P.: A combinatorial problem in the theory of free monoids, Combinatorial Mathematics & Its Applications. In: Bose, R.C., Dowling, T.A. (eds.) University of North Carolina Press, pp. 128–144 (1969)
Main, M.G.: Detecting leftmost maximal periodicities. Discrete Applied Maths 25, 145–153 (1989)
Main, M.G., Lorentz, R.J.: An O(n log n) Algorithm for Recognizing Repetition, Tech. Rep. CS-79–056, Computer Science Department, Washington State University (1979)
Main, M.G., Lorentz, R.J.: An O(n log n) algorithm for finding all repetitions in a string. J. Algs. 5, 422–432 (1984)
Mäkinen, V., Navarro, G.: Compressed full-text indices. ACM Computing Surveys (to appear)
Maniscalco, M., Puglisi, S.J.: Faster lightweight suffix array construction. In: Ryan, J., Dafik (eds.) Proc. 17th AWOCA, pp. 16–29 (2006)
Manzini, G.: Two space-saving tricks for linear time LCP computation. In: Hagerup, T., Katajainen, J. (eds.) SWAT 2004. LNCS, vol. 3111, Springer, Heidelberg (2004)
Manzini, G., Ferragina, P.: Engineering a lightweight suffix array construction algorithm. Algorithmica 40, 33–50 (2004)
McCreight, E.M.: A space-economical suffix tree construction algorithm. J. Assoc. Comput. Mach. 23(2), 262–272 (1976)
Puglisi, S.J., Smyth, W.F., Turpin, A.: A taxonomy of suffix array construction algorithms. ACM Computing Surveys (to appear)
Rytter, W.: The number of runs in a string: improved analysis of the linear upper bound. In: Durand, B., Thomas, W. (eds.) Proc. 23rd STACS. LNCS, vol. 3884, pp. 184–195. Springer, Heidelberg (2006)
Sadakane, K.: Space-efficient data structures for flexible text retrieval systems. In: Bose, P., Morin, P. (eds.) ISAAC 2002. LNCS, vol. 2518, Springer, Heidelberg (2002)
Smyth, B.: Computing Patterns in Strings, Pearson Addison-Wesley, p. 423 (2003)
Thue, A.: Über unendliche Zeichenreihen. Norske Vid. Selsk. Skr. I. Mat. Nat. Kl. Christiana 7, 1–22 (1906)
Ukkonen, E.: On-line construction of suffix trees. Algorithmica 14, 249–260 (1995)
Weiner, P.: Linear pattern matching algorithms. In: Proc. 14th Annual IEEE Symp. Switching & Automata Theory, pp. 1–11 (1973)
Ziv, J., Lempel, A.: A universal algorithm for sequential data compression. IEEE Trans. Information Theory 23, 337–343 (1977)

Author information

Authors and Affiliations

Algorithms Research Group, Department of Computing & Software, McMaster University, Hamilton, Ontario, L8S 4K1, Canada
Gang Chen & W. F. Smyth
Department of Computing, Curtin University, GPO Box U1987, Perth WA 6845, Australia
Simon J. Puglisi & W. F. Smyth

Editor information

Bin Ma, Kaizhong Zhang

Cite this paper

Chen, G., Puglisi, S.J., Smyth, W.F. (2007). Fast and Practical Algorithms for Computing All the Runs in a String. In: Ma, B., Zhang, K. (eds) Combinatorial Pattern Matching. CPM 2007. Lecture Notes in Computer Science, vol 4580. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73437-6_31

DOI: https://doi.org/10.1007/978-3-540-73437-6_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73436-9
Online ISBN: 978-3-540-73437-6
eBook Packages: Computer Science, Computer Science (R0)