Abstract
We study the compression of a formatted file by means of a minimal spanning tree (MST). The records of the file are treated as the nodes of a weighted undirected graph. Every pair of records is connected by an arc whose weight is the sum of the lengths of the fields in which the two records differ. The file is compressed by constructing an MST of this graph and storing it economically, in a way that preserves all the information in the file. The length of the MST is a useful measure for estimating the power of the compression. In this paper we study upper bounds on this length, especially in the case where the field lengths vary from field to field. The upper bounds are derived by analyzing the so-called Gray-code sequences of the records. Such a sequence can be regarded as a spanning path of the graph, and since a spanning path is itself a spanning tree, its length is an upper bound on the length of the MST. We show how a short spanning path can be constructed in this way. The results are also tested experimentally.
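The quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that each record is a tuple of small non-negative integer field values, that the MST is computed with a plain O(n²) Prim's algorithm, and that "Gray-code sequence" means sorting the records by their position in a reflected mixed-radix Gray code. The function names (`weight`, `mst_length`, `gray_key`, `gray_path_length`) and the way radices are derived from the data are hypothetical choices made for this illustration.

```python
def weight(a, b, field_lengths):
    """Arc weight: sum of the lengths of the fields in which a and b differ."""
    return sum(n for x, y, n in zip(a, b, field_lengths) if x != y)


def mst_length(records, field_lengths):
    """Total weight of an MST of the complete record graph (Prim, O(n^2))."""
    if not records:
        return 0
    n = len(records)
    # best[i] = cheapest known arc from record i to the tree built so far
    best = [weight(records[0], r, field_lengths) for r in records]
    in_tree = [False] * n
    in_tree[0] = True
    total = 0
    for _ in range(n - 1):
        j = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[j] = True
        total += best[j]
        for i in range(n):
            if not in_tree[i]:
                best[i] = min(best[i], weight(records[j], records[i], field_lengths))
    return total


def gray_key(record, radices):
    """Sort key giving the record's position in a reflected mixed-radix Gray code.

    Assumption: every field value v satisfies 0 <= v < radix of that field.
    """
    key, reverse = [], False
    for v, r in zip(record, radices):
        key.append(r - 1 - v if reverse else v)
        reverse ^= bool(v & 1)  # an odd digit reverses the remaining fields
    return tuple(key)


def gray_path_length(records, field_lengths):
    """Length of the spanning path obtained by visiting records in Gray-code order."""
    if not records:
        return 0
    radices = [max(column) + 1 for column in zip(*records)]  # assumed radices
    ordered = sorted(records, key=lambda rec: gray_key(rec, radices))
    return sum(weight(a, b, field_lengths) for a, b in zip(ordered, ordered[1:]))


# Toy file: three fields of lengths 4, 2 and 1.
field_lengths = [4, 2, 1]
records = [(0, 0, 0), (0, 1, 1), (1, 1, 0), (1, 0, 1), (0, 1, 0)]
print(mst_length(records, field_lengths), gray_path_length(records, field_lengths))
```

For this toy file the MST has length 10 and the Gray-code path has length 11, so the path length bounds the MST length from above, as the abstract describes. Placing the longest field first is one arbitrary choice here; the sketch does not attempt to reproduce how the paper selects the field order or handles varying field lengths.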
Cite this article
Ernvall, J., Nevalainen, O. Estimating the length of minimal spanning trees in compression of files. BIT 24, 19–32 (1984). https://doi.org/10.1007/BF01934512
DOI: https://doi.org/10.1007/BF01934512
CR Categories and Subject Descriptions
- E.4. (Coding and Information Theory)
  - Data Compaction and Compression
- H.3.2. (Information Storage and Retrieval)
  - File Organization