Abstract
Bitmaps are a useful, but storage-voracious, component of many information retrieval systems. Earlier efforts to compress bitmaps were based on models of bit generation, particularly Markov models. While these permitted a considerable reduction in storage, the short memory of Markov models may limit their compression efficiency. In this paper we accept the state orientation of Markov models, but introduce a Bayesian approach to assessing the state; the analysis is based on data accumulating in a growing window. The paper describes the details of the probabilistic assumptions governing the Bayesian analysis, as well as the protocol for controlling the window that receives the data. We find a slight improvement over the best-performing strictly Markov models.
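As a rough illustration of the general idea only, the sketch below estimates the probability of a 1-bit from counts accumulated in a growing window and sums the ideal code length of each bit. It is not the model described in the paper: the abstract does not give the probabilistic assumptions or the window protocol, so the Beta prior (`alpha`, `beta`), the reset rule (`max_window`), and the function name are all assumptions made for this example.

```python
import math


def bayes_code_length(bits, alpha=1.0, beta=1.0, max_window=64):
    """Ideal code length, in bits, of `bits` under a Beta-Bernoulli predictor.

    Assumed for illustration: a Beta(alpha, beta) prior on P(bit = 1) and a
    window that grows until it holds `max_window` bits, then is emptied.
    """
    ones = zeros = 0
    total = 0.0
    for b in bits:
        # Posterior mean of P(bit = 1) given the counts in the current window.
        p_one = (ones + alpha) / (ones + zeros + alpha + beta)
        # An ideal arithmetic coder spends -log2(p) bits on an event of probability p.
        total += -math.log2(p_one if b else 1.0 - p_one)
        if b:
            ones += 1
        else:
            zeros += 1
        # Hypothetical window control: start afresh once the window is full.
        if ones + zeros >= max_window:
            ones = zeros = 0
    return total


if __name__ == "__main__":
    sparse = [0] * 60 + [1] * 4  # a sparse, clustered toy bitmap
    print(f"raw: {len(sparse)} bits, modeled: {bayes_code_length(sparse):.1f} bits")
```

With a uniform prior, the first bit always costs exactly one bit, and a sparse or clustered bitmap costs fewer bits than its raw length because the estimate adapts to the counts seen so far.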
Cite this article
Bookstein, A., Klein, S. & Raita, T. Simple Bayesian Model for Bitmap Compression. Information Retrieval 1, 315–328 (2000). https://doi.org/10.1023/A:1009931317394
DOI: https://doi.org/10.1023/A:1009931317394