Abstract
A new data structure is investigated that allows fast decoding of texts encoded with canonical Huffman codes. Its storage requirements are much lower than those of conventional Huffman trees, O(log² n) for trees of depth O(log n), and decoding is faster because some of the bit comparisons needed for decoding can be skipped. Empirical results on large real-life distributions show a reduction of up to 50% and more in the number of bit operations. The basic idea is then generalized, yielding further savings.
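For orientation, the following is a minimal sketch of ordinary bit-by-bit decoding of a canonical Huffman code, the kind of baseline whose bit operations the abstract refers to. It is not the skeleton tree described in the article. The function names (`canonical_codes`, `build_tables`, `decode`) are hypothetical, and the convention that shorter codewords receive numerically smaller values is an assumption; the article may order codewords differently. The sketch also assumes the codeword lengths form a complete prefix code and that the input bit string is valid.

```python
def canonical_codes(lengths):
    """Assign canonical codewords from a {symbol: codeword length} mapping.

    Assumed convention: symbols are sorted by (length, symbol) and
    codewords of equal length are consecutive binary integers.
    """
    codes = {}
    code = 0
    prev_len = 0
    for sym, length in sorted(lengths.items(), key=lambda kv: (kv[1], kv[0])):
        code <<= (length - prev_len)      # pad with zeros when the length grows
        codes[sym] = format(code, "0{}b".format(length))
        code += 1
        prev_len = length
    return codes


def build_tables(lengths):
    """Build the small per-length tables that replace an explicit tree."""
    max_len = max(lengths.values())
    count = [0] * (max_len + 1)           # count[l]: number of codewords of length l
    for length in lengths.values():
        count[length] += 1
    symbols = [s for s, _ in sorted(lengths.items(), key=lambda kv: (kv[1], kv[0]))]
    first_code = [0] * (max_len + 1)      # integer value of the first codeword of length l
    first_index = [0] * (max_len + 1)     # position of that codeword's symbol in `symbols`
    code = 0
    index = 0
    for length in range(1, max_len + 1):
        first_code[length] = code
        first_index[length] = index
        code = (code + count[length]) << 1
        index += count[length]
    return count, first_code, first_index, symbols


def decode(bits, lengths):
    """Decode a string of '0'/'1' characters one bit at a time."""
    count, first_code, first_index, symbols = build_tables(lengths)
    out = []
    value = 0
    length = 0
    for b in bits:
        value = (value << 1) | int(b)
        length += 1
        offset = value - first_code[length]
        if 0 <= offset < count[length]:   # the bits read so far form a codeword
            out.append(symbols[first_index[length] + offset])
            value = 0
            length = 0
    return out


if __name__ == "__main__":
    lengths = {"a": 1, "b": 2, "c": 3, "d": 3}   # illustrative lengths only
    print(canonical_codes(lengths))
    print(decode("010110111", lengths))
```

With the illustrative lengths above, the codewords are a = 0, b = 10, c = 110, d = 111, and the bit string 010110111 decodes to a, b, c, d. Every input bit is inspected once in this baseline; the saving claimed in the abstract comes from avoiding some of those per-bit steps.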
Cite this article
Klein, S.T. Skeleton Trees for the Efficient Decoding of Huffman Encoded Texts. Information Retrieval 3, 7–23 (2000). https://doi.org/10.1023/A:1009910017828
DOI: https://doi.org/10.1023/A:1009910017828