Optimal Partitions of Strings: A New Class of Burrows-Wheeler Compression Algorithms

  • Conference paper
Combinatorial Pattern Matching (CPM 2003)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2676)

Abstract

The Burrows-Wheeler transform [1] is one of the mainstays of lossless data compression. In most cases, its output is fed to Move to Front or other variations of symbol ranking compression. One of the main open problems [2] is to establish whether Move to Front, or more generally symbol ranking compression, is an essential part of the compression process. We settle this question positively by providing a new class of Burrows-Wheeler algorithms that use optimal partitions of strings, rather than symbol ranking, for the additional step. Our technique is a rather surprising specialization, to strings, of the partitioning techniques devised by Buchsbaum et al. [3] for two-dimensional table compression. Following Manzini [4], we analyze two algorithms in the new class in terms of the k-th order empirical entropy of a string. For both algorithms, we obtain better compression guarantees than those reported in [4] for Burrows-Wheeler algorithms that use Move to Front.
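As background for the terms used in the abstract, the following is a minimal Python sketch of the classical pipeline that the paper takes as its starting point: a naive Burrows-Wheeler transform, Move to Front coding, and the k-th order empirical entropy used as the yardstick in Manzini-style analyses. It is not the partition-based algorithm of the paper, which the abstract does not specify. The function names, the `$` end-of-string sentinel, and the quadratic rotation-sorting approach are illustrative assumptions.

```python
from collections import Counter, defaultdict
from math import log2


def bwt(s, sentinel="$"):
    """Naive Burrows-Wheeler transform: sort all rotations of s + sentinel.

    The sentinel is an assumption of this sketch; it must not occur in s.
    """
    assert sentinel not in s
    t = s + sentinel
    rotations = sorted(t[i:] + t[:i] for i in range(len(t)))
    # The transform is the last column of the sorted rotation matrix.
    return "".join(r[-1] for r in rotations)


def move_to_front(s):
    """Move to Front coding: emit each symbol's current rank, then move it to rank 0."""
    table = sorted(set(s))  # initial ranking: symbols in sorted order
    ranks = []
    for ch in s:
        i = table.index(ch)
        ranks.append(i)
        table.insert(0, table.pop(i))
    return ranks


def entropy0(symbols):
    """Order-0 empirical entropy in bits per symbol."""
    n = len(symbols)
    if n == 0:
        return 0.0
    return -sum(c / n * log2(c / n) for c in Counter(symbols).values())


def entropy_k(s, k):
    """k-th order empirical entropy:
    H_k(s) = (1/|s|) * sum over length-k contexts w of |w_s| * H_0(w_s),
    where w_s collects the symbols that follow each occurrence of w in s.
    """
    if k == 0:
        return entropy0(s)
    followers = defaultdict(list)
    for i in range(len(s) - k):
        followers[s[i:i + k]].append(s[i + k])
    return sum(len(f) * entropy0(f) for f in followers.values()) / len(s)


if __name__ == "__main__":
    transformed = bwt("banana")
    print(transformed)                  # last column of the sorted rotations
    print(move_to_front(transformed))   # ranks produced by Move to Front
    print(entropy_k("abababab", 0), entropy_k("abababab", 1))
```

The transform tends to group symbols that share a context, which is why a ranking step such as Move to Front produces many small integers on its output. The paper's contribution, per the abstract, is to replace that ranking step with an optimal partition of the transformed string.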

Both authors are partially supported by Italian MURST Project of National Relevance “Bioinformatica e Ricerca Genomica”. Additional support is provided to the first author by FIRB Project “Bioinformatica per la Genomica e la Proteomica” and to the second author by Italian MURST Project of National Relevance “Linguaggi Formali ed Automi: Teoria e Applicazioni”.

References

  1. Burrows, M., Wheeler, D.: A block sorting data compression algorithm. Technical report, DIGITAL System Research Center (1994)

  2. Fenwick, P.: The Burrows-Wheeler transform for block sorting text compression. The Computer Journal 39 (1996) 731–740

  3. Buchsbaum, A.L., Caldwell, D.F., Church, K.W., Fowler, G.S., Muthukrishnan, S.: Engineering the compression of massive tables: An experimental approach. In: Proc. 11th ACM-SIAM Symp. on Discrete Algorithms. (2000) 175–184

  4. Manzini, G.: An analysis of the Burrows-Wheeler transform. Journal of the ACM 48 (2001) 407–430

  5. Bentley, J., Sleator, D., Tarjan, R., Wei, V.: A locally adaptive data compression scheme. Comm. of ACM 29 (1986) 320–330

  6. Cover, T.M., Thomas, J.A.: Elements of Information Theory. Wiley Interscience (1990)

  7. Effros, M.: Universal lossless source coding with the Burrows-Wheeler transform. In: Proc. IEEE Data Compression Conference, IEEE Computer Society (1999) 178–187

  8. Sadakane, K.: On optimality of variants of the block sorting compression. In: Proc. IEEE Data Compression Conference, IEEE Computer Society (1998) 570

  9. Arnavut, Z., Magliveras, S.S.: Block sorting and compression. In: Proc. IEEE Data Compression Conference, IEEE Computer Society (1997) 181–190

  10. Balkenhol, B., Kurtz, S.: Universal data compression based on the Burrows and Wheeler-transformation: Theory and practice. Technical Report 98-069, Sonderforschungsbereich: Diskrete Strukturen in der Mathematik, Universität Bielefeld, Germany (1998). Available from http://www.mathematik.uni-bielefeld.de/sfb343/preprints

  11. Wirth, A.I., Moffat, A.: Can we do without ranks in Burrows Wheeler transform compression? In: Proc. IEEE Data Compression Conference, IEEE Computer Society (2001) 419–428

  12. Buchsbaum, A.L., Giancarlo, R., Fowler, G.S.: Improving table compression with combinatorial optimization. In: Proc. 13th ACM-SIAM Symp. on Discrete Algorithms. (2002) 213–222

  13. Lempel, A., Ziv, J.: A universal algorithm for sequential data compression. IEEE Trans. on Information Theory IT-23 (1977) 337–343

  14. Ziv, J., Lempel, A.: Compression of individual sequences via variable-rate coding. IEEE Trans. on Information Theory IT-24 (1978) 530–536

  15. Moffat, A.: Implementing the PPM data compression scheme. IEEE Trans. on Communication COM-38 (1990) 1917–1921

  16. Cormack, G., Horspool, R.: Data compression using dynamic Markov modelling. Computer J. 30 (1987) 541–550

  17. Cleary, J., Teahan, W.: Unbounded length contexts for PPM. Computer J. 40 (1997) 67–75

  18. Elias, P.: Universal codeword sets and representations of the integers. IEEE Transactions on Information Theory 21 (1975) 194–203

  19. Levenshtein, V.: On the redundancy and delay of decodable coding of natural numbers. (Translation from) Problems in Cybernetics, Nauka, Moscow 20 (1968) 173–179

  20. Capocelli, R.M., Giancarlo, R., Taneja, I.: Bounds on the redundancy of Huffman codes. IEEE Transactions on Information Theory 32 (1986) 854–857


Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Giancarlo, R., Sciortino, M. (2003). Optimal Partitions of Strings: A New Class of Burrows-Wheeler Compression Algorithms. In: Baeza-Yates, R., Chávez, E., Crochemore, M. (eds) Combinatorial Pattern Matching. CPM 2003. Lecture Notes in Computer Science, vol 2676. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44888-8_10

  • DOI: https://doi.org/10.1007/3-540-44888-8_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-40311-1

  • Online ISBN: 978-3-540-44888-4

  • eBook Packages: Springer Book Archive
