Predictive encoding in text compression☆

doi:10.1016/0306-4573(89)90003-4

Information Processing & Management

Volume 25, Issue 2, 1989, Pages 151-160

https://doi.org/10.1016/0306-4573(89)90003-4

Abstract

In predictive text compression, characters are encoded one at a time on the basis of a few preceding characters. Using this contextual knowledge makes compression more effective than coding each character independently of its neighbors. In the simplest case, we merely try to guess the next character and encode whether the guess succeeded or failed. More generally, the preceding substring determines a probability distribution for the next character, which provides the basis for encoding. This article presents three compression methods of increasing power. Special attention is paid to the trade-off between compression gain and processing time. For speed, hashing turns out to be an ideal technique for maintaining the prediction information. The best gain is achieved by applying optimal arithmetic coding to the successor information extracted from the dependencies between characters.
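The simplest scheme mentioned in the abstract (guess the next character from its context, then encode only success or failure) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's method: the context order `ORDER = 2`, the table size `2**TABLE_BITS`, the multiplicative hash in `_slot`, and the output format ('1' for a correct guess, '0' followed by the 8-bit literal for a miss) are all hypothetical choices. The hashed table keeps, for each context, the most recent character that followed it. Collisions simply share a slot, which may cost compression but never correctness, because the decoder rebuilds exactly the same table.

```python
from typing import List

# Hypothetical parameters chosen for illustration only.
ORDER = 2         # number of preceding bytes used as the context
TABLE_BITS = 12   # the prediction table has 2**TABLE_BITS slots


def _slot(context: bytes) -> int:
    """Hash the context bytes into a prediction-table index."""
    h = 0
    for b in context:
        h = (h * 31 + b) & 0xFFFFFFFF
    return h & ((1 << TABLE_BITS) - 1)


def encode(data: bytes) -> str:
    """Encode data as a bit string: '1' = guess hit, '0' + 8 bits = miss plus literal."""
    table = [0] * (1 << TABLE_BITS)  # predicted successor for each hashed context
    bits: List[str] = []
    for i, ch in enumerate(data):
        slot = _slot(data[max(0, i - ORDER):i])
        if table[slot] == ch:
            bits.append("1")                      # success: one flag bit
        else:
            bits.append("0" + format(ch, "08b"))  # failure: flag bit + literal
            table[slot] = ch                      # learn the new successor
    return "".join(bits)


def decode(bits: str) -> bytes:
    """Invert encode() by rebuilding the same prediction table."""
    table = [0] * (1 << TABLE_BITS)
    out = bytearray()
    pos = 0
    while pos < len(bits):
        slot = _slot(bytes(out[max(0, len(out) - ORDER):]))
        if bits[pos] == "1":
            ch = table[slot]                      # correct guess: reuse prediction
            pos += 1
        else:
            ch = int(bits[pos + 1:pos + 9], 2)    # miss: read the literal
            table[slot] = ch
            pos += 9
        out.append(ch)
    return bytes(out)


if __name__ == "__main__":
    text = b"abracadabra " * 20
    coded = encode(text)
    assert decode(coded) == text
    print(f"{len(coded)} bits vs. {8 * len(text)} bits uncompressed")
```

On repetitive input most guesses hit and cost a single bit. The fixed one-bit flag is the weak point of this scheme: a stronger coder would estimate the successor probabilities for each context and code them with arithmetic coding, which is the direction the abstract describes for its more powerful methods.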

☆ An early version of this work was presented at the ACM SIGIR meeting in New Orleans, June 3–5, 1987, and appeared as “Predictive text compression by hashing” on pages 223–233 of the Proceedings of the Tenth Annual International ACM SIGIR Conference on Research & Development in Information Retrieval, edited by C.T. Yu and C.J. van Rijsbergen. This final version was submitted March 31, 1988.
