Abstract
A number of researchers have studied delta compression techniques for improving the efficiency of web page accesses over slow communication links. Most of these schemes exploit the fact that updated web pages often change only slightly, so that the transmitted deltas are very small. However, these schemes apply only to a minority of page accesses, and they require web or proxy servers to retain potentially many outdated versions of each page for use as reference files in the encoding. Another approach, studied by Chan and Woo [4], encodes a page with respect to similar files from the same web server that are already in the client’s browser cache.
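Here is a minimal sketch of encoding a page against a reference file. It uses a zlib preset dictionary as a stand-in for a dedicated delta compressor such as zdelta (listed in the references). This is an assumed simplification, not the encoder evaluated in the paper. DEFLATE only looks back 32 KB, so only the last 32 KB of a larger reference can help. The function names and sample pages are hypothetical.

```python
import zlib


def compress_plain(target: bytes) -> bytes:
    """Compress a page on its own, as gzip-style transfer coding would."""
    return zlib.compress(target, 9)


def delta_compress(target: bytes, reference: bytes) -> bytes:
    """Encode target relative to a reference file the client already holds.

    The reference is passed as a DEFLATE preset dictionary, so content shared
    with it is sent as back-references instead of literals. Only the last
    32 KB of the reference fall inside DEFLATE's window.
    """
    comp = zlib.compressobj(level=9, zdict=reference)
    return comp.compress(target) + comp.flush()


def delta_decompress(delta: bytes, reference: bytes) -> bytes:
    """Client side: rebuild the page from the delta and the cached reference."""
    decomp = zlib.decompressobj(zdict=reference)
    return decomp.decompress(delta) + decomp.flush()


# Hypothetical old and new versions of a page; only one story changes.
old_page = b"<html><body>" + b"".join(
    b"<p>Story %d: lorem ipsum dolor sit amet.</p>" % i for i in range(200)
) + b"</body></html>"
new_page = old_page.replace(b"Story 7:", b"Story 7 (updated):")

plain = compress_plain(new_page)
delta = delta_compress(new_page, old_page)
assert delta_decompress(delta, old_page) == new_page
print(f"gzip-style: {len(plain)} bytes, delta: {len(delta)} bytes")
```

The same reference must be available at both ends. This is why schemes that use old versions of the same page force the server to keep those versions, while the approach of Chan and Woo draws references from files already in the client's cache.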
Building on the latter approach, we study different delta compression policies for web access. Our emphasis is on policies that are friendly to web and proxy servers: they do not require maintaining multiple older versions of a page, but use only reference files that the client accessed within the last few minutes. We compare several policies for identifying appropriate reference files and evaluate their performance on a set of traces. We show that very simple policies achieve significant benefits over gzip compression on most web accesses and can be implemented efficiently at web or proxy servers. We also study the potential of file synchronization techniques such as rsync [28] for web access.
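The abstract does not say which reference-selection policies are compared. The sketch below shows one hypothetical policy in that spirit, not the authors' method. The server keeps only files the client fetched in the last `window` seconds (300 s is an assumed value). It considers only files from the same host as the requested page, following Chan and Woo's same-server idea. It then ranks candidates by word-shingle resemblance, in the style of Broder's resemblance measure. The names `CachedAccess` and `select_reference`, the 4-word shingles, and the 0.1 threshold are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple
from urllib.parse import urlsplit

Shingle = Tuple[bytes, ...]


@dataclass
class CachedAccess:
    """One page the client fetched recently (hypothetical server-side record)."""
    url: str
    timestamp: float  # seconds
    content: bytes


def shingles(data: bytes, k: int = 4) -> Set[Shingle]:
    """Set of all k-word windows (shingles) in a document."""
    words = data.split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def resemblance(a: Set[Shingle], b: Set[Shingle]) -> float:
    """Jaccard similarity |a & b| / |a | b| of two shingle sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def select_reference(target_url: str, target: bytes, history: List[CachedAccess],
                     now: float, window: float = 300.0, k: int = 4,
                     min_score: float = 0.1) -> Optional[CachedAccess]:
    """Pick the most similar recent file from the same host, or None.

    A candidate must score strictly above min_score; None means the server
    should fall back to plain gzip compression.
    """
    host = urlsplit(target_url).netloc
    target_sh = shingles(target, k)
    best, best_score = None, min_score
    for entry in history:
        if now - entry.timestamp > window:
            continue  # older than the window: the server need not keep it
        if urlsplit(entry.url).netloc != host:
            continue  # only files from the same server are candidates
        score = resemblance(target_sh, shingles(entry.content, k))
        if score > best_score:
            best, best_score = entry, score
    return best


page = b"<html> breaking news about the local election results %s </html>"
history = [
    CachedAccess("http://www.example.com/news/0.html", 500.0, page % b"today"),  # too old
    CachedAccess("http://www.example.com/news/1.html", 1100.0, page % b"tonight"),
    CachedAccess("http://www.example.com/news/2.html", 1200.0,
                 b"<html> breaking news about the local weather forecast tomorrow </html>"),
    CachedAccess("http://other.example.org/a.html", 1250.0, page % b"today"),  # other host
]
ref = select_reference("http://www.example.com/news/3.html", page % b"today",
                       history, now=1300.0)
print(ref.url if ref else "no reference: use gzip")  # prints http://www.example.com/news/1.html
```

Every candidate expires after a few minutes, so the server's per-client state stays bounded. This is the property that makes such a policy server friendly. The chosen file's content would then serve as the reference for a delta compressor.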
CIS Department, Polytechnic University
References
[1] G. Banga, F. Douglis, and M. Rabinovich. Optimistic deltas for WWW latency reduction. In Proc. of the USENIX Annual Technical Conference, pages 289–303, 1997.
[2] A. Broder. On the resemblance and containment of documents. In Compression and Complexity of Sequences, pages 21–29. IEEE Computer Society, 1997.
[3] A. Broder, S. Glassman, M. Manasse, and G. Zweig. Syntactic clustering of the web. In Sixth Int. World Wide Web Conference, 1997.
[4] M. Chan and T. Woo. Cache-based compaction: A new technique for optimizing web transfer. In Proc. of INFOCOM’99, 1999.
[5] G. Cormode, M. Paterson, S. Sahinalp, and U. Vishkin. Communication complexity of document exchange. In Proc. of the ACM-SIAM Symp. on Discrete Algorithms, 2000.
[6] L. Cox, C. Murray, and B. Noble. Pastiche: Making backup cheap and easy. In Proc. of the 5th Symp. on Operating System Design and Implementation, 2002.
[7] M. Delco and M. Ionescu. xProxy: A transparent caching and delta transfer system for web objects. Unpublished manuscript, May 2000.
[8] F. Douglis and A. Iyengar. Application-specific delta-encoding via resemblance detection. In Proc. of the USENIX Annual Technical Conference, June 2003.
[9] B. Housel and D. Lindquist. WebExpress: A system for optimizing web browsing in a wireless environment. In Proc. of the 2nd ACM Conf. on Mobile Computing and Networking, pages 108–116, November 1996.
[10] J. Hunt, K.-P. Vo, and W. Tichy. Delta algorithms: An empirical analysis. ACM Transactions on Software Engineering and Methodology, 7, 1998.
[11] R. Karp and M. Rabin. Efficient randomized pattern-matching algorithms. IBM Journal of Research and Development, 31(2):249–260, 1987.
[12] J. Kieffer and E. Yang. Grammar-based codes: A new class of universal lossless source codes. IEEE Trans. on Information Theory, 46(3):737–754, 2000.
[13] D. Korn and K.-P. Vo. Engineering a differencing and compression data format. In Proc. of the USENIX Annual Technical Conference, pages 219–228, 2002.
[14] J. MacDonald. File system support for delta compression. MS thesis, UC Berkeley, May 2000.
[15] U. Manber and S. Wu. GLIMPSE: A tool to search through entire file systems. In Proc. of the 1994 Winter USENIX Conference, pages 23–32, January 1994.
[16] J. Mogul, B. Krishnamurthy, F. Douglis, A. Feldmann, Y. Goland, A. van Hoff, and D. Hellerstein. Delta encoding in HTTP. IETF RFC 3229, 2002.
[17] J. C. Mogul, F. Douglis, A. Feldmann, and B. Krishnamurthy. Potential benefits of delta-encoding and data compression for HTTP. In Proc. of ACM SIGCOMM, 1997.
[18] A. Muthitacharoen, B. Chen, and D. Mazières. A low-bandwidth network file system. In Proc. of the 18th ACM Symp. on Operating Systems Principles, pages 174–187, 2001.
[19] A. Orlitsky and K. Viswanathan. Practical algorithms for interactive communication. In IEEE Int. Symp. on Information Theory, June 2001.
[20] Z. Ouyang, N. Memon, T. Suel, and D. Trendafilov. Cluster-based delta compression of a collection of files. In Third Int. Conf. on Web Information Systems Engineering, 2002.
[21] S. Rhea, K. Liang, and E. Brewer. Value-based web caching. In Proc. of the 12th Int. World Wide Web Conference, May 2003.
[22] M. Spiliopoulou, B. Mobasher, B. Berendt, and M. Nakagawa. A framework for the evaluation of session reconstruction heuristics in web usage analysis. INFORMS Journal on Computing, 15, 2003.
[23] T. Suel and N. Memon. Algorithms for delta compression and remote file synchronization. In Lossless Compression Handbook. Academic Press, 2002.
[24] T. Suel, P. Noel, and D. Trendafilov. Improved file synchronization techniques for maintaining large replicated collections over slow networks. In Proc. of the Int. Conf. on Data Engineering, March 2004.
[25] D. Trendafilov, N. Memon, and T. Suel. zdelta: A simple delta compression tool. Technical Report TR-CIS-2002-02, Polytechnic University, June 2002.
[26] A. Tridgell. Efficient Algorithms for Sorting and Synchronization. PhD thesis, Australian National University, April 2000.
[27] A. Tridgell, P. Barker, and P. MacKerras. rsync in HTTP. In Conference of Australian Linux Users, 1999.
[28] A. Tridgell and P. MacKerras. The rsync algorithm. Technical Report TR-CS-96-05, Australian National University, June 1996.
[29] S. Williams, M. Abrams, C. Standridge, G. Abdulla, and E. Fox. Removal policies in network caches for World-Wide Web documents. In Proc. of ACM SIGCOMM, 1996.
Copyright information
© 2004 Kluwer Academic Publishers
About this paper
Cite this paper
Savant, A., Suel, T. (2004). Server-Friendly Delta Compression for Efficient Web Access. In: Douglis, F., Davison, B.D. (eds) Web Content Caching and Distribution. Springer, Dordrecht. https://doi.org/10.1007/1-4020-2258-1_22
DOI: https://doi.org/10.1007/1-4020-2258-1_22
Publisher Name: Springer, Dordrecht
Print ISBN: 978-1-4020-2257-9
Online ISBN: 978-1-4020-2258-6
eBook Packages: Springer Book Archive