Predictive encoding in text compression

https://doi.org/10.1016/0306-4573(89)90003-4

Abstract

In predictive text compression, characters are encoded one at a time on the basis of a few preceding characters. Using this contextual knowledge makes compression more effective than coding each character independently of its neighbors. In the simplest case we merely try to guess the next character and encode the success or failure of the guess. More generally, the preceding substring determines the probability distribution of the successor, which provides the basis for encoding. This article presents three compression methods of increasing power, with special attention to the trade-off between compression gain and processing time. As for speed, hashing turns out to be an ideal technique for maintaining the prediction information. The best gain is achieved by applying optimal arithmetic coding to the successor information extracted from the dependencies between characters.
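The simplest case described in the abstract (guess the next character from its context, then encode success or failure) can be illustrated with a minimal Python sketch. It is not the article's implementation: the context length, table size, hash function, and the token-list output format are all assumptions chosen for readability, and the names `encode`, `decode`, and `_slot` are hypothetical.

```python
from typing import List, Tuple

# All constants below are illustrative assumptions, not values from the article.
TABLE_SIZE = 4096   # number of slots in the prediction table
CONTEXT_LEN = 2     # how many preceding characters form the context

Token = Tuple[bool, str]  # (guess_was_correct, literal_char_if_wrong)


def _slot(context: str) -> int:
    """Map a context string to a table slot with a simple multiplicative hash."""
    h = 0
    for ch in context:
        h = (h * 31 + ord(ch)) % TABLE_SIZE
    return h


def encode(text: str) -> List[Token]:
    """Emit a hit flag when the table predicts the character, else the literal."""
    table = [""] * TABLE_SIZE  # "" means no prediction stored yet
    context = ""
    out: List[Token] = []
    for ch in text:
        s = _slot(context)
        if table[s] == ch:
            out.append((True, ""))    # success: only the flag is needed
        else:
            out.append((False, ch))   # failure: flag plus the actual character
            table[s] = ch             # remember the latest successor
        context = (context + ch)[-CONTEXT_LEN:]
    return out


def decode(tokens: List[Token]) -> str:
    """Rebuild the text by replaying the same table updates as the encoder."""
    table = [""] * TABLE_SIZE
    context = ""
    chars: List[str] = []
    for hit, literal in tokens:
        s = _slot(context)
        if hit:
            ch = table[s]             # encoder and decoder tables are in sync
        else:
            ch = literal
            table[s] = ch
        chars.append(ch)
        context = (context + ch)[-CONTEXT_LEN:]
    return "".join(chars)
```

In a real coder each token would be packed into bits (for example, one flag bit for a hit and a flag bit plus the character code for a miss), so the gain depends on how often the guess succeeds. The more powerful methods the abstract mentions replace the single guess with a probability distribution over successors, which this sketch does not attempt.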

An early version of this work was presented during the New Orleans ACM SIGIR meeting, June 3–5, 1987, and appeared as “Predictive text compression by hashing” on pages 223–233 in Proceedings of the Tenth Annual International ACM SIGIR Conference on Research & Development in Information Retrieval, edited by C.T. Yu and C.J. van Rijsbergen. This final version was submitted March 31, 1988.