Abstract
In the last two decades, natural language compression has made great progress. The main step in this evolution was the introduction of word-based compression by Moffat. Word-based statistical compression algorithms can achieve a 35% improvement in compression ratio over character-based ones. We present two new word-based statistical compression algorithms based on the dense coding idea: Two Byte Dense Code (TBDC) and Self-Tuning Dense Code (STDC). TBDC uses codewords of at most 2 bytes and must be implemented with some pruning technique. STDC is able to tune its code space during the compression process and so achieves better compression. Our algorithms improve the compression ratio and are well suited to smaller files, which are very often neglected. We also present a generalized concept of dense coding called Open Dense Code (ODC), which provides a framework for defining these two and many other dense code schemes.
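To make the dense coding idea concrete, the following is a minimal sketch of the (s,c)-dense code from the cited work of Brisaboa et al., on which such schemes build. It is not TBDC or STDC themselves, whose details are not given in this abstract. The function names `sc_encode` and `sc_decode` and the choice s = c = 128 in the usage note are assumptions made for illustration. Words are assumed to be ranked by decreasing frequency, and the code maps a 0-based rank to a byte sequence in which the last byte is a "stopper" (value below s) and all preceding bytes are "continuers" (value s or above).

```python
def sc_encode(rank: int, s: int, c: int) -> bytes:
    """Encode a 0-based word rank as an (s,c)-dense codeword.

    Bytes 0..s-1 are stoppers (they end a codeword);
    bytes s..s+c-1 are continuers. Assumes s + c == 256.
    """
    if rank < 0 or s < 1 or c < 1 or s + c != 256:
        raise ValueError("need rank >= 0, s >= 1, c >= 1, s + c == 256")
    out = [rank % s]              # final byte: a stopper
    x = rank // s
    while x > 0:                  # remaining bytes: continuers
        x -= 1
        out.append(s + x % c)
        x //= c
    return bytes(reversed(out))   # continuers first, stopper last


def sc_decode(code: bytes, s: int, c: int) -> int:
    """Invert sc_encode for a single complete codeword."""
    rank = 0
    for b in code[:-1]:           # every byte but the last is a continuer
        rank = rank * c + (b - s) + 1
    return rank * s + code[-1]    # the last byte is the stopper
```

With s = c = 128, ranks 0 to 127 take one byte and the next 128 × 128 ranks take two, so a code capped at two bytes can address only s + s·c words. Under these assumptions, that limit suggests why a two-byte scheme would need some way to prune or bound its vocabulary.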
This research has been partially supported by the Ministry of Education, Youth and Sports under research program MSM 6840770014, and by the Czech Science Foundation as project No. 201/09/0807.
References
Moffat, A.: Word-based Text Compression. Software - Practice and Experience 19(2), 185–198 (1989)
Brisaboa, N., Iglesias, E., Navarro, G., Paramá, J.: An efficient compression code for text databases. In: Sebastiani, F. (ed.) ECIR 2003. LNCS, vol. 2633, pp. 468–481. Springer, Heidelberg (2003)
Brisaboa, N., Fariña, A., Navarro, G., Esteller, M.F.: (S,C)-dense coding: An optimized compression code for natural language text databases. In: Nascimento, M.A., de Moura, E.S., Oliveira, A.L. (eds.) SPIRE 2003. LNCS, vol. 2857, pp. 122–136. Springer, Heidelberg (2003)
Heaps, H.S.: Information Retrieval: Computational and Theoretical Aspects. Academic Press, London (1978)
Zipf, G.K.: Human Behaviour and the Principle of Least Effort. Addison-Wesley, Reading (1949)
Brisaboa, N.R., Fariña, A., Navarro, G., Paramá, J.R.: New Adaptive Compressors for Natural Language Text. Software - Practice and Experience 38(13), 1429–1450 (2008)
de Moura, E., Navarro, G., Ziviani, N., Baeza-Yates, R.: Fast searching on compressed text allowing errors. In: Proceedings 21st SIGIR, pp. 298–306 (1998)
Moffat, A.: Arithmetic coding revisited. ACM Trans. on Inf. Systems 16, 256–294 (1998)
Vitter, J.S.: Algorithm 673: Dynamic Huffman coding. ACM Transactions on Mathematical Software (TOMS) 15(2), 158–167 (1989)
Ziv, J., Lempel, A.: A Universal Algorithm for Sequential Data Compression. IEEE Transactions on Information Theory 23(3), 337–343 (1977)
Moffat, A.: The Arithmetic Coding Page, http://www.cs.mu.oz.au/~alistair/
Brisaboa, et al.: Family of Dense Compressors, http://vios.dc.fi.udc.es/codes/
Geelnard, M.: Basic Compression Library, http://bcl.comli.eu/
Scott, D.: Vitter Adaptive Compression, http://bijective.dogma.net
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Procházka, P., Holub, J. (2009). New Word-Based Adaptive Dense Compressors. In: Fiala, J., Kratochvíl, J., Miller, M. (eds) Combinatorial Algorithms. IWOCA 2009. Lecture Notes in Computer Science, vol 5874. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-10217-2_41
DOI: https://doi.org/10.1007/978-3-642-10217-2_41
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-10216-5
Online ISBN: 978-3-642-10217-2
eBook Packages: Computer Science, Computer Science (R0)