Abstract
The Burrows-Wheeler transform [1] is one of the mainstays of lossless data compression. In most cases, its output is fed to Move to Front or other variations of symbol ranking compression. One of the main open problems [2] is to establish whether Move to Front, or more generally symbol ranking compression, is an essential part of the compression process. We settle this question positively by providing a new class of Burrows-Wheeler algorithms that use optimal partitions of strings, rather than symbol ranking, for the additional step. Our technique is a quite surprising specialization to strings of partitioning techniques devised by Buchsbaum et al. [3] for two-dimensional table compression. Following Manzini [4], we analyze two algorithms in the new class in terms of the k-th order empirical entropy of a string. For both algorithms, we obtain better compression guarantees than the ones reported in [4] for Burrows-Wheeler algorithms that use Move to Front.
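For readers unfamiliar with the conventional pipeline mentioned above, the following is a minimal sketch of the Burrows-Wheeler transform followed by Move to Front. It is an illustration only, not the partition-based algorithms of this paper. The function names, the quadratic sorted-rotations construction, and the `$` end-of-string sentinel are assumptions chosen for brevity.

```python
def bwt(s, sentinel="$"):
    """Naive Burrows-Wheeler transform (illustrative, not efficient).

    Appends a sentinel (assumed absent from s and smaller than every
    other symbol), sorts all rotations, and returns the last column.
    """
    if sentinel in s:
        raise ValueError("sentinel must not occur in the input")
    t = s + sentinel
    rotations = sorted(t[i:] + t[:i] for i in range(len(t)))
    return "".join(r[-1] for r in rotations)


def move_to_front(s, alphabet):
    """Move to Front: replace each symbol by its current rank in a list,
    then move that symbol to the front of the list."""
    table = list(alphabet)
    ranks = []
    for ch in s:
        r = table.index(ch)
        ranks.append(r)
        table.insert(0, table.pop(r))
    return ranks


transformed = bwt("banana")
ranks = move_to_front(transformed, sorted(set(transformed)))
print(transformed, ranks)
```

The transform tends to group equal symbols together, so the rank sequence contains many small values that a subsequent statistical coder can exploit. This rank-producing step is what the class of algorithms described in the abstract replaces with a partition of the transformed string.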
Both authors are partially supported by Italian MURST Project of National Relevance “Bioinformatica e Ricerca Genomica”. Additional support is provided to the first author by FIRB Project “Bioinformatica per la Genomica e la Proteomica” and to the second author by Italian MURST Project of National Relevance “Linguaggi Formali ed Automi: Teoria e Applicazioni”.
References
Burrows, M., Wheeler, D.: A block sorting data compression algorithm. Technical report, DIGITAL System Research Center (1994)
Fenwick, P.: The Burrows-Wheeler transform for block sorting text compression. The Computer Journal 39 (1996) 731–740
Buchsbaum, A.L., Caldwell, D.F., Church, K.W., Fowler, G.S., Muthukrishnan, S.: Engineering the compression of massive tables: An experimental approach. In: Proc. 11th ACM-SIAM Symp. on Discrete Algorithms. (2000) 175–184
Manzini, G.: An analysis of the Burrows-Wheeler transform. Journal of the ACM 48 (2001) 407–430
Bentley, J., Sleator, D., Tarjan, R., Wei, V.: A locally adaptive data compression scheme. Comm. of ACM 29 (1986) 320–330
Cover, T.M., Thomas, J.A.: Elements of Information Theory. Wiley Interscience (1990)
Effros, M.: Universal lossless source coding with the Burrows-Wheeler transform. In: Proc. IEEE Data Compression Conference, IEEE Computer Society (1999) 178–187
Sadakane, K.: On optimality of variants of the block sorting compression. In: Proc. IEEE Data Compression Conference, IEEE Computer Society (1998) 570
Arnavut, Z., Magliveras, S.S.: Block sorting and compression. In: Proc. IEEE Data Compression Conference, IEEE Computer Society (1997) 181–190
Balkenhol, B., Kurtz, S.: Universal data compression based on the Burrows and Wheeler-transformation: Theory and practice. Technical Report 98-069, Sonderforschungsbereich: Diskrete Strukturen in der Mathematik, Universität Bielefeld, Germany (1998) Available from http://www.mathematik.uni-bielefeld.de/sfb343/preprints.
Wirth, A.I., Moffat, A.: Can we do without ranks in Burrows Wheeler transform compression? In: Proc. IEEE Data Compression Conference, IEEE Computer Society (2001) 419–428
Buchsbaum, A.L., Giancarlo, R., Fowler, G.S.: Improving table compression with combinatorial optimization. In: Proc. 13th ACM-SIAM Symp. on Discrete Algorithms. (2002) 213–222
Ziv, J., Lempel, A.: A universal algorithm for sequential data compression. IEEE Trans. on Information Theory IT-23 (1977) 337–343
Ziv, J., Lempel, A.: Compression of individual sequences via variable-rate coding. IEEE Trans. on Information Theory IT-24 (1978) 530–536
Moffat, A.: Implementing the PPM data compression scheme. IEEE Trans. on Communications COM-38 (1990) 1917–1921
Cormack, G., Horspool, R.: Data compression using dynamic Markov modelling. Computer J. 30 (1987) 541–550
Cleary, J., Teahan, W.: Unbounded length contexts for PPM. Computer J. 40 (1997) 67–75
Elias, P.: Universal codeword sets and representations of the integers. IEEE Transactions on Information Theory 21 (1975) 194–203
Levenshtein, V.: On the redundancy and delay of decodable coding of natural numbers. (Translation from) Problems in Cybernetics, Nauka, Moscow 20 (1968) 173–179
Capocelli, R.M., Giancarlo, R., Taneja, I.: Bounds on the redundancy of Huffman codes. IEEE Transactions on Information Theory 32 (1986) 854–857
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Giancarlo, R., Sciortino, M. (2003). Optimal Partitions of Strings: A New Class of Burrows-Wheeler Compression Algorithms. In: Baeza-Yates, R., Chávez, E., Crochemore, M. (eds) Combinatorial Pattern Matching. CPM 2003. Lecture Notes in Computer Science, vol 2676. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44888-8_10
DOI: https://doi.org/10.1007/3-540-44888-8_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40311-1
Online ISBN: 978-3-540-44888-4
eBook Packages: Springer Book Archive