Elsevier

Discrete Applied Mathematics

Volume 274, 15 March 2020, Pages 130-140
Direct merging of delta encoded files

https://doi.org/10.1016/j.dam.2018.07.011
Under an Elsevier user license
open archive

Abstract

Delta encoding represents a target file in terms of a source file by replacing common substrings with pointer references. Two similar, yet different, models are introduced and investigated in this paper: the Compressed Transitive Delta Encoding (CTDE) and the Compressed Source Delta Encoding (CSDE) paradigms. In these models we are given two delta files and the goal is to construct a third delta file by working directly on the given compressed forms.

Formally, given a source file S and two differencing files Δ(S,T) and Δ(T,R), where Δ(X,Y) denotes the delta file of the target file Y with respect to the source file X, the objective of the CTDE problem is to be able to attain R. The traditional approach uses S to decompress Δ(S,T) in order to attain T, and then applies Δ(T,R) to T to obtain R. CTDE instead constructs a delta file Δ(S,R) by working directly on the two given delta files Δ(S,T) and Δ(T,R), without any decompression or use of the base file S, thus avoiding the storage of the redundant intermediate file T. An algorithm for solving CTDE is proposed and its compression performance is compared to the traditional “double delta decompression”. Not only does it use constant space, as opposed to the linear memory storage used by the traditional method, but experiments also show that the compression efficiency of the constructed delta file Δ(S,R) is usually better than that of both Δ(S,T) and Δ(T,R).
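To make the idea of transitive merging concrete, the following is a minimal Python sketch, not the algorithm proposed in the paper. It assumes a simplified, hypothetical delta format in which every operation is either ("copy", offset, length), referring to the source file only, or ("lit", text). The function names apply_delta, slice_delta and merge_deltas are illustrative. The sketch does not attempt the constant-space behaviour or the compressed LZ77-style representation described above; it only shows that Δ(S,R) can be derived from Δ(S,T) and Δ(T,R) without materialising S or T.

```python
from typing import List, Tuple

# Hypothetical op format (an assumption for illustration, not the paper's):
#   ("copy", offset, length)  copy `length` characters from the source at `offset`
#   ("lit", text)             insert `text` verbatim
Op = Tuple


def op_len(op: Op) -> int:
    """Number of target characters produced by one operation."""
    return op[2] if op[0] == "copy" else len(op[1])


def apply_delta(source: str, delta: List[Op]) -> str:
    """Traditional decoding: rebuild the target from the source and a delta."""
    out = []
    for op in delta:
        if op[0] == "copy":
            out.append(source[op[1]:op[1] + op[2]])
        else:
            out.append(op[1])
    return "".join(out)


def slice_delta(delta: List[Op], start: int, length: int) -> List[Op]:
    """Ops producing target[start:start+length] of `delta`, without decoding it."""
    result = []
    pos = 0  # position in the (never materialised) target of `delta`
    end = start + length
    for op in delta:
        n = op_len(op)
        lo, hi = max(start, pos), min(end, pos + n)
        if lo < hi:  # this op overlaps the requested range
            skip = lo - pos
            if op[0] == "copy":
                result.append(("copy", op[1] + skip, hi - lo))
            else:
                result.append(("lit", op[1][skip:skip + (hi - lo)]))
        pos += n
        if pos >= end:
            break
    return result


def merge_deltas(delta_st: List[Op], delta_tr: List[Op]) -> List[Op]:
    """Build a delta from S to R using only delta(S,T) and delta(T,R)."""
    merged = []
    for op in delta_tr:
        if op[0] == "copy":
            # A pointer into T is rewritten as pointers into S and/or literals.
            merged.extend(slice_delta(delta_st, op[1], op[2]))
        else:
            merged.append(op)
    return merged
```

As a usage example, with S = "the quick brown fox", Δ(S,T) = [("copy", 0, 4), ("lit", "slow "), ("copy", 10, 9)] and Δ(T,R) = [("copy", 0, 9), ("lit", "red"), ("copy", 14, 4)], applying the merged delta to S gives the same R as decoding T first and then applying Δ(T,R) to it. The sketch rescans Δ(S,T) for every copy operation, which keeps it short at the cost of speed.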

The CSDE problem deals with a source file S and two differencing files Δ(S,T) and Δ(S,R), and the goal is still to be able to attain R. Although it is not always possible to construct the target file R by processing only the two input delta files, empirical experiments show that on typical real-life data, usually about 99% of the file can be constructed using the proposed algorithm for the CSDE problem.
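The abstract does not spell out why reconstruction can be incomplete, so the following Python sketch rests on one possible reading, which is an assumption on our part: pointers in Δ(S,R) are rewritten as pointers into T wherever the referenced part of S was also copied into T, and the rest is marked as unavailable. The op format (("copy", offset, length) and ("lit", text)), the "missing" marker and the function name rebase_on_target are all hypothetical, and this is not the algorithm proposed in the paper.

```python
def rebase_on_target(delta_st, delta_sr):
    """Rewrite copies from S in delta(S,R) as copies from T, using delta(S,T) only.

    Returned ops are ("copy", t_offset, length) into T, ("lit", text), or
    ("missing", s_offset, length) for ranges of S that T never copied.
    """
    # Ranges of S that occur in T: (s_start, s_end, t_start).
    mapped = []
    pos = 0  # position in T while walking delta(S,T)
    for op in delta_st:
        if op[0] == "copy":
            mapped.append((op[1], op[1] + op[2], pos))
            pos += op[2]
        else:
            pos += len(op[1])

    out = []
    for op in delta_sr:
        if op[0] != "copy":
            out.append(op)  # literals carry their own text
            continue
        cur, end = op[1], op[1] + op[2]
        while cur < end:
            hit = next(((s, e, t) for s, e, t in mapped if s <= cur < e), None)
            if hit is None:
                # Skip ahead to the next range of S that T does contain.
                nxt = min((s for s, _, _ in mapped if cur < s < end), default=end)
                out.append(("missing", cur, nxt - cur))
                cur = nxt
            else:
                s, e, t = hit
                stop = min(e, end)
                out.append(("copy", t + (cur - s), stop - cur))
                cur = stop
    return out
```

For example, with Δ(S,T) = [("copy", 0, 4), ("lit", "slow "), ("copy", 10, 9)] and Δ(S,R) = [("copy", 0, 4), ("lit", "red"), ("copy", 9, 10)], every character of R except the one at offset 9 of S can be expressed in terms of T. This illustrates, under the stated assumption, how most but not all of R might be recoverable from the two delta files.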

Keywords

Data compression
Delta encoding
Lempel–Ziv 1977 encoding

This is an extended version of a paper that was presented at the Data Compression Conference DCC’09, Snowbird, Utah (2009) and appeared in its Proceedings, pp. 203–212.