Abstract
A numerical coding method for reducing the storage space required for text is described. Text may be coded at multiple levels, such as characters, words, fields, or groups of fields, by repeatedly applying the same encoding procedure. At each successive level, the common messages that recur at that level are replaced by numeric codes that require less storage space. The efficiency of the method depends on the homogeneity of the text. The encoding and decoding procedures are described. The common messages are stored so that the address of a message can be computed directly from its code, which simplifies decoding. Empirical data were collected, and significant savings in storage requirements were observed. An analysis of the storage requirements is also discussed.
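The abstract does not give the procedure itself, so the following is only a minimal sketch of the general idea, not the author's algorithm. It assumes that a "common message" is any message seen at least twice, that codes are consecutive integers starting at 0, and that word-level messages are kept in fixed-width slots so that a slot address is simply code × slot width. The names `SLOT_WIDTH`, `common_messages`, `pack_table`, `lookup`, and `decode_record` are hypothetical and chosen for illustration.

```python
from collections import Counter

SLOT_WIDTH = 8  # assumed fixed slot size in bytes (hypothetical, not from the paper)


def common_messages(messages, min_count=2):
    """Return messages seen at least min_count times, most frequent first."""
    counts = Counter(messages)
    common = [m for m, n in counts.items() if n >= min_count]
    # Stable sort: messages with equal counts keep first-seen order.
    return sorted(common, key=lambda m: -counts[m])


def encode(messages, table):
    """Replace each common message by its integer code; keep the rest verbatim."""
    code_of = {m: i for i, m in enumerate(table)}
    return [code_of.get(m, m) for m in messages]


def pack_table(words, width=SLOT_WIDTH):
    """Store each word in its own fixed-width, space-padded slot."""
    if any(len(w.encode("ascii")) > width for w in words):
        raise ValueError("word does not fit in a slot")
    return b"".join(w.encode("ascii").ljust(width) for w in words)


def lookup(packed, code, width=SLOT_WIDTH):
    """Compute the slot address directly from the code, with no search."""
    start = code * width
    return packed[start:start + width].rstrip(b" ").decode("ascii")


# Level 1: words. Integer items are codes, strings are uncoded words.
records = ["new york ny", "new york ny", "boston ma", "new york ny"]
words = [w for r in records for w in r.split()]
word_table = common_messages(words)
packed_words = pack_table(word_table)

# Level 2: the same procedure applied to whole records of word-level codes.
coded_records = [tuple(encode(r.split(), word_table)) for r in records]
record_table = common_messages(coded_records)
compact = encode(coded_records, record_table)


def decode_record(item):
    """Undo level 2 (record code), then level 1 (word codes)."""
    fields = record_table[item] if isinstance(item, int) else item
    return " ".join(
        lookup(packed_words, f) if isinstance(f, int) else f for f in fields
    )


restored = [decode_record(item) for item in compact]
```

With this sample input, `compact` is `[0, 0, ('boston', 'ma'), 0]` and `restored` equals the original `records`. The more homogeneous the input, the more records collapse to a single code, which is consistent with the abstract's remark that efficiency depends on the homogeneity of the text.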
Cite this article
Ting, T.C. Compacting homogeneous text for minimizing storage space. International Journal of Computer and Information Sciences 6, 211–221 (1977). https://doi.org/10.1007/BF01002332