Caching for Web Searching

  • Conference paper
Algorithm Theory - SWAT 2000 (SWAT 2000)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 1851))

Abstract

We study web caching when the input sequence is a depth first search traversal of some tree. There are at least two good motivations for investigating tree traversal as a search technique on the WWW. First, empirical studies of people browsing and searching the WWW have shown that user access patterns are commonly nearly depth first traversals of some tree. Second, as we show in this paper, the problem of visiting all the pages on some WWW site using anchor clicks (clicks on links) and back button clicks, by far the two most common user actions, reduces to the problem of how best to cache a tree traversal sequence (up to constant factors).
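To make the notion of a tree traversal sequence concrete, here is a minimal Python sketch. It assumes that an anchor click requests the child page and that a back button click requests the parent page again; the paper's formal definition may differ in detail. The function name `dfs_access_sequence` and the example site are hypothetical.

```python
def dfs_access_sequence(tree, root):
    """Return the page-request sequence of a depth first traversal.

    Assumption (not taken from the paper): following a link requests the
    child page, and pressing the back button requests the parent again.
    `tree` maps each page to the list of pages it links to.
    """
    sequence = [root]
    for child in tree.get(root, []):
        sequence.extend(dfs_access_sequence(tree, child))
        sequence.append(root)  # back button click returns to the parent
    return sequence


# Hypothetical site: "home" links to "docs" and "blog"; "docs" links to "api".
site = {"home": ["docs", "blog"], "docs": ["api"]}
print(dfs_access_sequence(site, "home"))
```

Each internal page is requested once per child plus once on arrival, which is why caching pages near the root matters for such sequences.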

We show that for tree traversal sequences the optimal offline strategy can be computed efficiently. In the bit model, where the access time of a page is proportional to its size, we show that the online algorithm LRU is (1 + 1/ε)-competitive against an adversary with unbounded cache as long as LRU has a cache of size at least (1 + ε) times the size of the largest item in the input sequence. In the general model, where pages have arbitrary access times and sizes, we show that in order to be constant competitive, any online algorithm needs a cache large enough to store Ω(log n) pages; here n is the number of distinct pages in the input sequence. We provide a matching upper bound by showing that the online algorithm Landlord is constant competitive against an adversary with an unbounded cache if Landlord has a cache large enough to store the Ω(log n) largest pages. This is further theoretical evidence that Landlord is the “right” algorithm for web caching.
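The two online algorithms named above can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the paper's own formulation: LRU is shown in the bit model (a miss costs the page's size), and Landlord follows the usual credit-based description (each cached page holds a credit, rent is charged in proportion to size when space is needed, and pages with zero credit may be evicted). The function names, the choice to restore full credit on a hit, and the tie-breaking order are assumptions.

```python
from collections import OrderedDict
from fractions import Fraction


def lru_cost(requests, size, capacity):
    """Total fetch cost of LRU in the bit model (a miss costs size[page])."""
    cache = OrderedDict()  # page -> size, least recently used first
    used = 0
    total = 0
    for page in requests:
        if page in cache:
            cache.move_to_end(page)  # hit: mark as most recently used
            continue
        total += size[page]  # bit model: miss cost equals page size
        if size[page] > capacity:
            continue  # too large to cache; serve without storing
        while used + size[page] > capacity:
            _, evicted_size = cache.popitem(last=False)
            used -= evicted_size
        cache[page] = size[page]
        used += size[page]
    return total


def landlord_cost(requests, size, cost, capacity):
    """Total fetch cost of Landlord with arbitrary sizes and costs."""
    credit = {}  # cached page -> remaining credit (exact fractions)
    used = 0
    total = 0
    for page in requests:
        if page in credit:
            credit[page] = Fraction(cost[page])  # hit: restore full credit
            continue
        total += cost[page]
        if size[page] > capacity:
            continue  # too large to cache; serve without storing
        while used + size[page] > capacity:
            # Charge rent proportional to size until some credit reaches zero.
            rent = min(credit[p] / size[p] for p in credit)
            for p in credit:
                credit[p] -= rent * size[p]
            # Evict zero-credit pages, but only as many as are needed.
            for p in [q for q in credit if credit[q] == 0]:
                if used + size[page] <= capacity:
                    break
                used -= size[p]
                del credit[p]
        credit[page] = Fraction(cost[page])
        used += size[page]
    return total


# Traversal of a root "r" with two leaves "x" and "y" (hypothetical data).
requests = ["r", "x", "r", "y", "r"]
print(lru_cost(requests, {"r": 2, "x": 3, "y": 3}, capacity=5))
print(landlord_cost(requests, {"r": 1, "x": 1, "y": 1},
                    {"r": 10, "x": 1, "y": 1}, capacity=2))
```

In the second call the root is expensive to fetch, so Landlord keeps it cached and evicts the cheap leaf instead; exact fractions are used only to avoid floating-point ties when testing for zero credit.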

Supported in part by NSF Grant CCR-9734927 and by AFOSR grant F49620010011.

Supported by the START program Y43-MAT of the Austrian Ministry of Science.

© 2000 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kalyanasundaram, B., Noga, J., Pruhs, K., Woeginger†, G. (2000). Caching for Web Searching. In: Algorithm Theory - SWAT 2000. SWAT 2000. Lecture Notes in Computer Science, vol 1851. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44985-X_14

  • DOI: https://doi.org/10.1007/3-540-44985-X_14

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-67690-4

  • Online ISBN: 978-3-540-44985-0

  • eBook Packages: Springer Book Archive
