Abstract
Wavelet Trees were introduced in [Grossi, Gupta and Vitter, SODA ’03] and were rapidly recognized as a very flexible tool for the design of compressed full-text indexes and data compressors. Although several papers have investigated the beauty and usefulness of this data structure in the full-text indexing scenario, its impact on data compression has not been fully explored. In this paper we provide a complete theoretical analysis of a wide class of compression algorithms based on Wavelet Trees. We also show how to improve their asymptotic performance by introducing a novel framework, called Generalized Wavelet Trees, that aims for the best combination of binary compressors (such as Run-Length encoders) versus non-binary compressors (such as Huffman and Arithmetic encoders) and Wavelet Trees of properly designed shapes. As a corollary, we prove high-order entropy bounds for the challenging combination of the Burrows-Wheeler Transform and Wavelet Trees.
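For readers unfamiliar with the data structure, the following is a minimal sketch of a plain balanced binary Wavelet Tree whose node bit vectors are run-length encoded. It is an illustration only and not the Generalized Wavelet Tree framework of the paper: the function names (`build_wavelet_tree`, `run_lengths`, `access`), the dictionary layout, and the choice of splitting the sorted alphabet in half are all assumptions made for this example.

```python
from itertools import groupby


def build_wavelet_tree(text, alphabet=None):
    """Build a balanced binary wavelet tree as nested dicts (illustrative layout)."""
    if alphabet is None:
        alphabet = sorted(set(text))
    if len(alphabet) <= 1:
        return None  # leaf: a single symbol needs no bit vector
    mid = len(alphabet) // 2
    left_syms = set(alphabet[:mid])
    # Bit 0 sends a symbol to the left half of the alphabet, bit 1 to the right.
    bits = [0 if c in left_syms else 1 for c in text]
    left_text = [c for c in text if c in left_syms]
    right_text = [c for c in text if c not in left_syms]
    return {
        "alphabet": alphabet,
        "bits": bits,
        "left": build_wavelet_tree(left_text, alphabet[:mid]),
        "right": build_wavelet_tree(right_text, alphabet[mid:]),
    }


def run_lengths(bits):
    """Run-length encode a bit vector as (first_bit, list_of_run_lengths)."""
    runs = [len(list(group)) for _, group in groupby(bits)]
    return (bits[0] if bits else 0, runs)


def access(node, i):
    """Recover the i-th symbol of the original text from the tree alone.

    Assumes the text has at least two distinct symbols, so the root is not None.
    """
    alphabet = node["alphabet"]
    while True:
        bit = node["bits"][i]
        # Rank: how many equal bits occur strictly before position i.
        i = node["bits"][:i].count(bit)
        mid = len(alphabet) // 2
        alphabet = alphabet[mid:] if bit else alphabet[:mid]
        child = node["right"] if bit else node["left"]
        if child is None:
            return alphabet[0]
        node = child


tree = build_wavelet_tree("abracadabra")
first_bit, runs = run_lengths(tree["bits"])
decoded = "".join(access(tree, i) for i in range(len("abracadabra")))
```

Each internal node stores one bit per symbol that reaches it, so a compressor for binary strings can be applied to every node independently. The rank computation here is a linear scan for clarity; practical implementations would use a succinct rank structure instead.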
References
Abel, J.: Improvements to the Burrows-Wheeler compression algorithm: After BWT stages, http://citeseer.ist.psu.edu/abel03improvements.html
Arnavut, Z., Magliveras, S.: Block sorting and compression. In: DCC: Data Compression Conference, pp. 181–190. IEEE Computer Society TCC, Los Alamitos (1997)
Burrows, M., Wheeler, D.: A block sorting lossless data compression algorithm. Technical Report 124, Digital Equipment Corporation (1994)
Ferragina, P., Giancarlo, R., Manzini, G., Sciortino, M.: Boosting textual compression in optimal linear time. Journal of the ACM 52, 688–713 (2005)
Ferragina, P., Manzini, G.: Indexing compressed text. Journal of the ACM 52(4), 552–581 (2005)
Foschini, L., Grossi, R., Gupta, A., Vitter, J.: Fast compression with a static model in high order entropy. In: DCC: Data Compression Conference, pp. 62–71. IEEE Computer Society TCC, Los Alamitos (2004)
Grossi, R., Gupta, A., Vitter, J.: High-order entropy-compressed text indexes. In: Proc. 14th Annual ACM-SIAM Symp. on Discrete Algorithms (SODA 2003), pp. 841–850 (2003)
Grossi, R., Gupta, A., Vitter, J.: When indexing equals compression: Experiments on compressing suffix arrays and applications. In: Proc. 15th Annual ACM-SIAM Symp. on Discrete Algorithms (SODA 2004), pp. 636–645 (2004)
Grossi, R., Vitter, J.: Compressed suffix arrays and suffix trees with applications to text indexing and string matching. SIAM Journal on Computing 35, 378–407 (2005)
Mäkinen, V., Navarro, G.: Succinct suffix arrays based on run-length encoding. Nordic Journal of Computing 12(1), 40–66 (2005)
Manzini, G.: An analysis of the Burrows-Wheeler transform. Journal of the ACM 48(3), 407–430 (2001)
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ferragina, P., Giancarlo, R., Manzini, G. (2006). The Myriad Virtues of Wavelet Trees. In: Bugliesi, M., Preneel, B., Sassone, V., Wegener, I. (eds) Automata, Languages and Programming. ICALP 2006. Lecture Notes in Computer Science, vol 4051. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11786986_49
DOI: https://doi.org/10.1007/11786986_49
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-35904-3
Online ISBN: 978-3-540-35905-0
eBook Packages: Computer Science, Computer Science (R0)