
New Word-Based Adaptive Dense Compressors

  • Conference paper
Combinatorial Algorithms (IWOCA 2009)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5874)

Abstract

In the last two decades, natural language compression has made great progress. The main step in this evolution was the introduction of word-based compression by Moffat. Word-based statistical compression algorithms can achieve a 35% improvement in compression ratio compared with character-based ones. We present two new word-based statistical compression algorithms based on the dense coding idea: Two Byte Dense Code (TBDC) and Self-Tuning Dense Code (STDC). TBDC uses codewords of at most 2 bytes and must be implemented with a pruning technique. STDC is able to tune its code space during the compression process and thus achieves better compression. Our algorithms improve the compression ratio and also take smaller files into account, which are very often overlooked. We also present a generalized concept of dense coding called Open Dense Code (ODC), which provides a framework for defining these two and many other dense code schemes.
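The abstract builds on the dense coding idea of Brisaboa et al. As a rough illustration only, and not the TBDC, STDC or ODC algorithms of this paper, the Python sketch below implements the semi-static (s,c)-Dense Code: words are ranked by frequency, the 256 byte values are split into s stoppers and c continuers (s + c = 256), and each rank receives the next shortest codeword. The function names and the choice of byte values 0..s-1 as stoppers are assumptions made for this sketch; the adaptive behaviour of the paper's compressors is not modelled.

```python
from collections import Counter


def encode(rank: int, s: int, c: int) -> bytes:
    """(s,c)-Dense Code codeword for a 0-based frequency rank.

    Assumed convention: byte values 0..s-1 are stoppers (they end a
    codeword) and values s..s+c-1 are continuers, with s + c == 256.
    """
    if s < 1 or c < 1 or s + c != 256:
        raise ValueError("need s >= 1, c >= 1 and s + c == 256")
    out = [rank % s]              # last byte of the codeword: a stopper
    x = rank // s
    while x > 0:                  # preceding bytes: continuers
        x -= 1
        out.append(s + x % c)
        x //= c
    return bytes(reversed(out))   # digits were produced least significant first


def decode(code: bytes, s: int, c: int) -> int:
    """Inverse of encode(): recover the rank from a single codeword."""
    if not code or code[-1] >= s:
        raise ValueError("codeword must be non-empty and end with a stopper")
    x = 0
    for b in code[:-1]:
        if b < s:
            raise ValueError("stopper found before the end of the codeword")
        x = x * c + (b - s) + 1
    return x * s + code[-1]


def capacity(s: int, c: int, max_len: int) -> int:
    """Number of codewords of length <= max_len: s * (1 + c + ... + c**(max_len - 1))."""
    return sum(s * c**k for k in range(max_len))


def build_codebook(words, s: int = 128, c: int = 128) -> dict:
    """Semi-static illustration: rank words by frequency, then assign codewords."""
    ranked = [w for w, _ in Counter(words).most_common()]
    return {w: encode(r, s, c) for r, w in enumerate(ranked)}


if __name__ == "__main__":
    book = build_codebook("the cat and the dog and the bird".split())
    print(book["the"], book["and"])   # b'\x00' b'\x01'
    print(capacity(128, 128, 2))      # 16512
```

With s = c = 128, the End-Tagged Dense Code configuration, `capacity(128, 128, 2)` gives 16,512 codewords of one or two bytes, so a vocabulary larger than that cannot be coded within two bytes. This is one way to read the abstract's remark that TBDC needs pruning, although the exact byte layout TBDC uses is not given here.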

This research has been partially supported by the Ministry of Education, Youth and Sports under research program MSM 6840770014, and by the Czech Science Foundation as project No. 201/09/0807.

References

  1. Moffat, A.: Word-based Text Compression. Software - Practice and Experience 19(2), 185–198 (1989)

  2. Brisaboa, N., Iglesias, E., Navarro, G., Paramá, J.: An efficient compression code for text databases. In: Sebastiani, F. (ed.) ECIR 2003. LNCS, vol. 2633, pp. 468–481. Springer, Heidelberg (2003)

  3. Brisaboa, N., Fariña, A., Navarro, G., Esteller, M.F.: (S,C)-dense coding: An optimized compression code for natural language text databases. In: Nascimento, M.A., de Moura, E.S., Oliveira, A.L. (eds.) SPIRE 2003. LNCS, vol. 2857, pp. 122–136. Springer, Heidelberg (2003)

  4. Heaps, H.S.: Information Retrieval: Computational and Theoretical Aspects. Academic Press, London (1978)

  5. Zipf, G.K.: Human Behaviour and the Principle of Least Effort. Addison-Wesley, Reading (1949)

  6. Brisaboa, N.R., Fariña, A., Navarro, G., Paramá, J.R.: New Adaptive Compressors for Natural Language Text. Software - Practice and Experience 38(13), 1429–1450 (2008)

  7. de Moura, E., Navarro, G., Ziviani, N., Baeza-Yates, R.: Fast searching on compressed text allowing errors. In: Proceedings 21st SIGIR, pp. 298–306 (1998)

  8. Moffat, A.: Arithmetic coding revisited. ACM Trans. on Inf. Systems 16, 256–294 (1998)

  9. Vitter, J.S.: Algorithm 673: Dynamic Huffman coding. ACM Transactions on Mathematical Software (TOMS) 15(2), 158–167 (1989)

  10. Ziv, J., Lempel, A.: A Universal Algorithm for Sequential Data Compression. IEEE Transactions on Information Theory 23(3), 337–343 (1977)

  11. Moffat, A.: The Arithmetic Coding Page, http://www.cs.mu.oz.au/~alistair/

  12. Brisaboa, et al.: Family of Dense Compressors, http://vios.dc.fi.udc.es/codes/

  13. Geelnard, M.: Basic Compression Library, http://bcl.comli.eu/

  14. Scott, D.: Vitter Adaptive Compression, http://bijective.dogma.net

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Procházka, P., Holub, J. (2009). New Word-Based Adaptive Dense Compressors. In: Fiala, J., Kratochvíl, J., Miller, M. (eds) Combinatorial Algorithms. IWOCA 2009. Lecture Notes in Computer Science, vol 5874. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-10217-2_41

  • DOI: https://doi.org/10.1007/978-3-642-10217-2_41

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-10216-5

  • Online ISBN: 978-3-642-10217-2

  • eBook Packages: Computer Science (R0)