Server-Friendly Delta Compression for Efficient Web Access

  • Conference paper
Web Content Caching and Distribution

Abstract

A number of researchers have studied delta compression techniques for improving the efficiency of web page accesses over slow communication links. Most of these schemes exploit the fact that updated web pages often change only very slightly, thus resulting in very small sizes for the transmitted deltas. However, these schemes are only applicable to a minority of page accesses, and require web or proxy servers to retain potentially many different outdated versions of pages for use as reference files in the encoding. Another approach, studied by Chan and Woo [4], encodes a page with respect to similar files located on the same web server that are already in the client’s browser cache.
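The idea of encoding a page against a similar file the client already holds can be sketched as follows. This is only an illustration under stated assumptions: it uses zlib's preset-dictionary feature as a stand-in for a real delta compressor, it is not the encoder used by Chan and Woo [4] or by the authors, and the helper names `deflate`, `delta_encode`, and `delta_decode` are hypothetical.

```python
import zlib


def deflate(data: bytes) -> bytes:
    """Compress a page with no reference file (stand-in for the gzip baseline)."""
    return zlib.compress(data, 9)


def delta_encode(target: bytes, reference: bytes) -> bytes:
    """Compress `target` using `reference` as a preset dictionary.

    Assumption: a zlib dictionary approximates a delta compressor here.
    zlib can only look back 32 KB, so only the tail of a long reference
    file is effectively usable.
    """
    comp = zlib.compressobj(level=9, zdict=reference)
    return comp.compress(target) + comp.flush()


def delta_decode(delta: bytes, reference: bytes) -> bytes:
    """Rebuild the page on the client, which must hold the same reference."""
    decomp = zlib.decompressobj(zdict=reference)
    return decomp.decompress(delta) + decomp.flush()


if __name__ == "__main__":
    # Two made-up pages from the same site that share most of their markup.
    shared = " ".join(str(i * 7919 % 10007) for i in range(500)).encode()
    cached_page = b"<html><h1>Page A</h1>" + shared + b"</html>"
    new_page = b"<html><h1>Page B</h1>" + shared + b"<p>new</p></html>"

    delta = delta_encode(new_page, cached_page)
    assert delta_decode(delta, cached_page) == new_page
    print("plain deflate:", len(deflate(new_page)), "bytes")
    print("with reference:", len(delta), "bytes")
```

Both sides must agree on the exact bytes of the reference file, which is why the choice of reference files matters for the server.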

Based on the latter approach, we study different delta compression policies for web access. Our emphasis is on web and proxy server-friendly policies that do not require the maintenance of multiple older versions of a page, but only use reference files accessed by the client within the last few minutes. We compare several policies for identifying appropriate reference files and evaluate their performance on a set of traces. We show that there are very simple policies that achieve significant benefits over gzip compression on most web accesses, and that can be efficiently implemented at web or proxy servers. We also study the potential of file synchronization techniques such as rsync [28] for web access.
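A reference-file policy of this kind might look like the sketch below. The abstract does not specify the policies that were compared, so everything here is an assumption made for illustration: the five-minute window, the rule of preferring the longest shared URL path prefix, and the names `WINDOW_SECONDS`, `common_prefix_len`, and `pick_reference`.

```python
from urllib.parse import urlsplit

# Assumption: "the last few minutes" is modelled as a 5-minute window.
WINDOW_SECONDS = 300


def common_prefix_len(a: str, b: str) -> int:
    """Length of the longest common prefix of two strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def pick_reference(target_url, history, now, window=WINDOW_SECONDS):
    """Pick one reference URL for `target_url`, or None if none qualifies.

    `history` is a list of (url, access_time) pairs for a single client.
    Hypothetical policy: consider only pages from the same host that the
    client fetched within `window` seconds, then prefer the longest shared
    path prefix, breaking ties by most recent access.
    """
    target = urlsplit(target_url)
    best_url, best_key = None, None
    for url, accessed in history:
        if now - accessed > window:
            continue  # too old: the server need not remember it
        parts = urlsplit(url)
        if parts.netloc != target.netloc:
            continue  # different server
        key = (common_prefix_len(parts.path, target.path), accessed)
        if best_key is None or key > best_key:
            best_url, best_key = url, key
    return best_url


if __name__ == "__main__":
    history = [
        ("http://a.example/news/1.html", 100),
        ("http://a.example/sports/2.html", 250),
        ("http://b.example/news/3.html", 290),
    ]
    print(pick_reference("http://a.example/news/9.html", history, now=300))
```

Because the policy only looks at recent accesses by the same client, the server needs a short per-client access log rather than an archive of old page versions.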

CIS Department, Polytechnic University

References

  1. G. Banga, F. Douglis, and M. Rabinovich. Optimistic deltas for WWW latency reduction. In USENIX Annual Technical Conference, pages 289–303, 1997.

  2. A. Broder. On the resemblance and containment of documents. In Compression and Complexity of Sequences, pages 21–29. IEEE Computer Society, 1997.

  3. A. Broder, S. Glassman, M. Manasse, and G. Zweig. Syntactic clustering of the web. In Sixth Int. World Wide Web Conference, 1997.

  4. M. Chan and T. Woo. Cache-based compaction: A new technique for optimizing web transfer. In Proc. of INFOCOM’99, 1999.

  5. G. Cormode, M. Paterson, S. Sahinalp, and U. Vishkin. Communication complexity of document exchange. In Proc. of the ACM-SIAM Symp. on Discrete Algorithms, 2000.

  6. L. Cox, C. Murray, and B. Noble. Pastiche: Making backup cheap and easy. In Proc. of the 5th Symp. on Operating System Design and Implementation, 2002.

  7. M. Delco and M. Ionescu. xProxy: A transparent caching and delta transfer system for web objects. Unpublished manuscript, May 2000.

  8. F. Douglis and A. Iyengar. Application-specific delta-encoding via resemblance detection. In Proc. of the USENIX Annual Technical Conference, June 2003.

  9. B. Housel and D. Lindquist. WebExpress: A system for optimizing web browsing in a wireless environment. In Proc. of the 2nd ACM Conf. on Mobile Computing and Networking, pages 108–116, November 1996.

  10. J. Hunt, K.-P. Vo, and W. Tichy. Delta algorithms: An empirical analysis. ACM Transactions on Software Engineering and Methodology, 7, 1998.

  11. R. Karp and M. Rabin. Efficient randomized pattern-matching algorithms. IBM Journal of Research and Development, 31(2):249–260, 1987.

  12. J. Kieffer and E. Yang. Grammar based codes: A new class of universal lossless source codes. IEEE Trans. on Information Theory, 46(3):737–754, 2000.

  13. D. Korn and K.-P. Vo. Engineering a differencing and compression data format. In Proc. of the USENIX Annual Technical Conference, pages 219–228, 2002.

  14. J. MacDonald. File system support for delta compression. MS Thesis, UC Berkeley, May 2000.

  15. U. Manber and S. Wu. GLIMPSE: A tool to search through entire file systems. In Proc. of the 1994 Winter USENIX Conference, pages 23–32, January 1994.

  16. J. Mogul, B. Krishnamurthy, F. Douglis, A. Feldmann, Y. Goland, A. van Hoff, and D. Hellerstein. Delta Encoding in HTTP. IETF RFC 3229, 2002.

  17. J. C. Mogul, F. Douglis, A. Feldmann, and B. Krishnamurthy. Potential benefits of delta-encoding and data compression for HTTP. In Proc. of ACM SIGCOMM, 1997.

  18. A. Muthitacharoen, B. Chen, and D. Mazières. A low-bandwidth network file system. In Proc. of the 18th ACM Symp. on Operating Systems Principles, pages 174–187, 2001.

  19. A. Orlitsky and K. Viswanathan. Practical algorithms for interactive communication. In IEEE Int. Symp. on Information Theory, June 2001.

  20. Z. Ouyang, N. Memon, T. Suel, and D. Trendafilov. Cluster-based delta compression of a collection of files. In Third Int. Conf. on Web Information Systems Engineering, 2002.

  21. S. Rhea, K. Liang, and E. Brewer. Value-based web caching. In Proc. of the 12th Int. World Wide Web Conference, May 2003.

  22. M. Spiliopoulou, B. Mobasher, B. Berendt, and M. Nakagawa. A framework for the evaluation of session reconstruction heuristics in web usage analysis. INFORMS Journal on Computing, 15, 2003.

  23. T. Suel and N. Memon. Algorithms for delta compression and remote file synchronization. In Lossless Compression Handbook. Academic Press, 2002.

  24. T. Suel, P. Noel, and D. Trendafilov. Improved file synchronization techniques for maintaining large replicated collections over slow networks. In Proc. of the Int. Conf. on Data Engineering, March 2004.

  25. D. Trendafilov, N. Memon, and T. Suel. zdelta: a simple delta compression tool. Technical Report TR-CIS-2002-02, Polytechnic University, June 2002.

  26. A. Tridgell. Efficient Algorithms for Sorting and Synchronization. PhD thesis, Australian National University, April 2000.

  27. A. Tridgell, P. Barker, and P. Mackerras. rsync in http. In Conference of Australian Linux Users, 1999.

  28. A. Tridgell and P. Mackerras. The rsync algorithm. Technical Report TR-CS-96-05, Australian National University, June 1996.

  29. S. Williams, M. Abrams, C. Standridge, G. Abdulla, and E. Fox. Removal policies in network caches for World-Wide Web documents. In Proc. of ACM SIGCOMM, 1996.

Copyright information

© 2004 Kluwer Academic Publishers

About this paper

Cite this paper

Savant, A., Suel, T. (2004). Server-Friendly Delta Compression for Efficient Web Access. In: Douglis, F., Davison, B.D. (eds) Web Content Caching and Distribution. Springer, Dordrecht. https://doi.org/10.1007/1-4020-2258-1_22

  • DOI: https://doi.org/10.1007/1-4020-2258-1_22

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-1-4020-2257-9

  • Online ISBN: 978-1-4020-2258-6

  • eBook Packages: Springer Book Archive
