Abstract
Delta algorithms compress data by encoding one file in terms of another. This type of compression is useful in a number of situations: storing multiple versions of data, distributing updates, storing backups, transmitting video sequences, and others. This paper studies the performance parameters of several delta algorithms, using a benchmark of over 1300 pairs of files taken from two successive releases of GNU software. Results indicate that modern delta compression algorithms based on Ziv-Lempel techniques significantly outperform diff, a popular but older delta compressor, in terms of compression ratio. The modern compressors also correlate better with the actual difference between files; one of them is even faster than diff in both compression and decompression speed.
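To make "encoding one file in terms of another" concrete, here is a minimal sketch of a copy/add delta format. It is not the algorithm of diff, vdelta, or any other compressor evaluated in the paper. The instruction names `copy` and `add`, the tuple layout, and the functions `make_delta` and `apply_delta` are hypothetical, chosen only for illustration. Matching is delegated to Python's standard `difflib.SequenceMatcher` rather than to a Ziv-Lempel style matcher.

```python
from difflib import SequenceMatcher


def make_delta(source: bytes, target: bytes) -> list:
    """Describe `target` as a list of instructions relative to `source`.

    ("copy", offset, length) reuses bytes already present in `source`;
    ("add", data) carries literal bytes that only `target` contains.
    """
    ops = []
    matcher = SequenceMatcher(None, source, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2 - i1))
        elif tag in ("replace", "insert"):
            ops.append(("add", target[j1:j2]))
        # "delete" emits nothing: those source bytes are simply not copied.
    return ops


def apply_delta(source: bytes, ops: list) -> bytes:
    """Rebuild the target from `source` and the instruction list."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out += source[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)


if __name__ == "__main__":
    old = b"the quick brown fox jumps over the lazy dog\n"
    new = b"the quick red fox jumps over the lazy dog\n"
    delta = make_delta(old, new)
    assert apply_delta(old, delta) == new
    literal = sum(len(op[1]) for op in delta if op[0] == "add")
    print(f"{literal} literal bytes needed to describe {len(new)} target bytes")
```

When the two versions are similar, most of the target is covered by `copy` instructions, so only a few literal bytes have to be stored or transmitted. A real delta compressor would also encode the instructions themselves compactly, which this sketch does not attempt.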
Copyright information
© 1996 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hunt, J.J., Vo, K.P., Tichy, W.F. (1996). An empirical study of delta algorithms. In: Sommerville, I. (ed.) Software Configuration Management. SCM 1996. Lecture Notes in Computer Science, vol 1167. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0023080
DOI: https://doi.org/10.1007/BFb0023080
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-61964-2
Online ISBN: 978-3-540-49569-7
eBook Packages: Springer Book Archive