Abstract.
For 30 years the Lempel–Ziv factorization LZ x of a string x = x[1..n] has been a fundamental data structure of string processing, especially valuable for string compression and for computing all the repetitions (runs) in x. Traditionally the standard method for computing LZ x was based on Θ(n)-time (or, depending on the measure used, O(n log n)-time) processing of the suffix tree ST x of x. Recently Abouelhoda et al. proposed an efficient Lempel–Ziv factorization algorithm based on an “enhanced” suffix array – that is, a suffix array SA x together with supporting data structures, principally an “interval tree”. In this paper we introduce a collection of fast space-efficient algorithms for LZ factorization, also based on suffix arrays, that in theory as well as in many practical circumstances are superior to those previously proposed; one family out of this collection achieves true Θ(n)-time alphabet-independent processing in the worst case by avoiding tree structures altogether.
Similar content being viewed by others
Author information
Authors and Affiliations
Corresponding author
Additional information
The work of the first and third authors was supported in part by grants from the Natural Sciences & Engineering Research Council of Canada.
Rights and permissions
About this article
Cite this article
Chen, G., Puglisi, S.J. & Smyth, W.F. Lempel–Ziv Factorization Using Less Time & Space. Math.comput.sci. 1, 605–623 (2008). https://doi.org/10.1007/s11786-007-0024-4
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11786-007-0024-4