
Online File Caching with Rejection Penalties


Abstract

In the file caching problem, the input is a sequence of requests for files out of a slow memory. A file has two attributes, a positive retrieval cost and an integer size. An algorithm is required to maintain a cache of size k, such that the total size of the files stored in the cache never exceeds k. Given a request for a file that is not present in the cache at the time of the request, the file must be brought from the slow memory into the cache, possibly evicting other files from the cache. This incurs a cost equal to the retrieval cost of the requested file. Well-known special cases include paging (all costs and sizes are equal to 1), the cost model, also known as weighted paging (all sizes are equal to 1), the fault model (all costs are equal to 1), and the bit model (the cost of a file is equal to its size). If bypassing is allowed, a miss for a file still results in an access to this file in the slow memory, but its subsequent insertion into the cache is optional.

We study a new online variant of caching, called caching with rejection. In this variant, each request for a file has a rejection penalty associated with it. The penalty of a request is given to the algorithm together with the request. When a file that is not present in the cache is requested, the algorithm must either bring the file into the cache, paying the retrieval cost of the file, or reject the file, paying the rejection penalty of the request. The objective function is the sum of the total rejection penalty and the total retrieval cost. This problem generalizes both caching and caching with bypassing.
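The per-request decision in this model can be sketched in a few lines. The following is a minimal illustrative simulator of the cost accounting; the function name and the naive reject-if-cheaper rule are our own, and this is not one of the competitive algorithms studied in the paper.

```python
def serve(cache, k, name, size, cost, penalty):
    """Serve one request and return the cost it incurs.
    `cache` maps file name -> size; the total cached size never exceeds k.
    Assumes size <= k. The reject-if-cheaper rule is illustrative only."""
    if name in cache:
        return 0                      # hit: no cost
    if penalty <= cost:
        return penalty                # reject: pay the rejection penalty
    while sum(cache.values()) + size > k:
        cache.pop(next(iter(cache)))  # evict arbitrarily to make room
    cache[name] = size
    return cost                       # fetch: pay the retrieval cost
```

The objective is then simply the sum of the values returned over the whole request sequence.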

We design deterministic and randomized algorithms for this problem. The competitive ratio of the randomized algorithm is O(log k), and this is optimal up to a constant factor. In the deterministic case, a k-competitive algorithm for caching and a (k+1)-competitive algorithm for caching with bypassing are known, and these are the best possible competitive ratios. In contrast, we present a lower bound of 2k+1 on the competitive ratio of any deterministic algorithm for the variant with rejection. The lower bound holds already for paging. We design a (2k+2)-competitive algorithm for caching with rejection, as well as a different (2k+1)-competitive algorithm that can be used for paging and for caching in the bit and fault models.


Notes

  1. It is possible to relax this condition and allow non-negative costs instead. This does not change the results, as explained in the Appendix (see Claim 14).

  2. Requests of zero rejection penalty can obviously be always rejected without incurring any cost, and thus we assume that no such requests exist.

  3. Clearly, all upper bounds hold also for the case that every file has a fixed rejection penalty. The lower bound that we will show in this paper has uniform rejection penalties for all requests, and thus it will be valid for this model too. It can be seen that lower bounds following from previous work (since caching with rejection generalizes caching and caching with bypassing) also apply to the model where the rejection penalty is a property of a file.

References

  1. Achlioptas, D., Chrobak, M., Noga, J.: Competitive analysis of randomized paging algorithms. Theor. Comput. Sci. 234(1–2), 203–218 (2000)
  2. Adamaszek, A., Czumaj, A., Englert, M., Räcke, H.: An O(log k)-competitive algorithm for generalized caching. In: Proc. of the 23rd Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2012), pp. 1681–1689 (2012)
  3. Albers, S.: New results on web caching with request reordering. Algorithmica 58(2), 461–477 (2010)
  4. Albers, S., Arora, S., Khanna, S.: Page replacement for general caching problems. In: Proc. of the 10th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 1999), pp. 31–40 (1999)
  5. Bansal, N., Buchbinder, N., Naor, J.: Towards the randomized k-server conjecture: a primal-dual approach. In: Proc. of the 21st Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2010), pp. 40–55 (2010)
  6. Bansal, N., Buchbinder, N., Naor, S.: A primal-dual randomized algorithm for weighted paging. J. ACM 59(4), Article 19 (2012)
  7. Bansal, N., Buchbinder, N., Naor, S.: Randomized competitive algorithms for generalized caching. SIAM J. Comput. 41(2), 391–414 (2012)
  8. Bar-Noy, A., Bar-Yehuda, R., Freund, A., Naor, J., Schieber, B.: A unified approach to approximating resource allocation and scheduling. J. ACM 48(5), 1069–1090 (2001)
  9. Bartal, Y., Leonardi, S., Marchetti-Spaccamela, A., Sgall, J., Stougie, L.: Multiprocessor scheduling with rejection. SIAM J. Discrete Math. 13(1), 64–78 (2000)
  10. Bein, W.W., Larmore, L.L., Noga, J.: Equitable revisited. In: Proc. of the 15th Annual European Symposium on Algorithms (ESA 2007), pp. 419–426 (2007)
  11. Belady, L.A.: A study of replacement algorithms for a virtual-storage computer. IBM Systems Journal 5(2), 78–101 (1966)
  12. Blum, A., Burch, C., Kalai, A.: Finely-competitive paging. In: Proc. of the 40th Annual IEEE Symposium on Foundations of Computer Science (FOCS 1999), pp. 450–458 (1999)
  13. Brehob, M., Enbody, R.J., Torng, E., Wagner, S.: On-line restricted caching. J. Sched. 6(2), 149–166 (2003)
  14. Brodal, G.S., Moruz, G., Negoescu, A.: OnlineMin: a fast strongly competitive randomized paging algorithm. In: Proc. of the 9th International Workshop on Approximation and Online Algorithms (WAOA 2011), pp. 164–175 (2011)
  15. Cao, P., Irani, S.: Cost-aware WWW proxy caching algorithms. In: Proc. of the USENIX Symposium on Internet Technologies and Systems, pp. 193–206 (1997)
  16. Chrobak, M., Karloff, H.J., Payne, T.H., Vishwanathan, S.: New results on server problems. SIAM J. Discrete Math. 4(2), 172–181 (1991)
  17. Chrobak, M., Noga, J.: Competitive algorithms for relaxed list update and multilevel caching. J. Algorithms 34(2), 282–308 (2000)
  18. Chrobak, M., Woeginger, G.J., Makino, K., Xu, H.: Caching is hard—even in the fault model. Algorithmica 63(4), 781–794 (2012)
  19. Cohen, E., Kaplan, H.: Caching documents with variable sizes and fetching costs: an LP-based approach. Algorithmica 32(3), 459–466 (2002)
  20. Dósa, G., He, Y.: Bin packing problems with rejection penalties and their dual problems. Inf. Comput. 204(5), 795–815 (2006)
  21. Epstein, L.: Bin packing with rejection revisited. Algorithmica 56(4), 505–528 (2010)
  22. Epstein, L., Levin, A., Woeginger, G.J.: Graph coloring with rejection. J. Comput. Syst. Sci. 77(2), 439–447 (2011)
  23. Epstein, L., Noga, J., Woeginger, G.J.: On-line scheduling of unit time jobs with rejection: minimizing the total completion time. Oper. Res. Lett. 30(6), 415–420 (2002)
  24. Fiat, A., Karp, R.M., Luby, M., McGeoch, L.A., Sleator, D.D., Young, N.: Competitive paging algorithms. J. Algorithms 12(4), 685–699 (1991)
  25. Goemans, M.X., Williamson, D.P.: A general approximation technique for constrained forest problems. SIAM J. Comput. 24(2), 296–317 (1995)
  26. Irani, S.: Page replacement with multi-size pages and applications to web caching. Algorithmica 33(3), 384–409 (2002)
  27. Karlin, A., Manasse, M., Rudolph, L., Sleator, D.D.: Competitive snoopy caching. Algorithmica 3(1–4), 79–119 (1988)
  28. Karp, R.M.: On-line algorithms versus off-line algorithms: how much is it worth to know the future? In: van Leeuwen, J. (ed.) IFIP Congress (1). IFIP Transactions, vol. A-12, pp. 416–429. North-Holland, Amsterdam (1992)
  29. Koufogiannakis, C., Young, N.E.: Greedy Δ-approximation algorithm for covering with arbitrary constraints and submodular cost. Algorithmica 66(1), 113–152 (2013)
  30. Manasse, M.S., McGeoch, L.A., Sleator, D.D.: Competitive algorithms for server problems. J. Algorithms 11(2), 208–230 (1990)
  31. McGeoch, L.A., Sleator, D.D.: A strongly competitive randomized paging algorithm. Algorithmica 6(6), 816–825 (1991)
  32. Mendel, M., Seiden, S.S.: Online companion caching. Theor. Comput. Sci. 324(2–3), 183–200 (2004)
  33. Nagy-György, J., Imreh, C.: Online scheduling with machine cost and rejection. Discrete Appl. Math. 155(18), 2546–2554 (2007)
  34. Seiden, S.S.: Preemptive multiprocessor scheduling with rejection. Theor. Comput. Sci. 262(1), 437–458 (2001)
  35. Sleator, D.D., Tarjan, R.E.: Amortized efficiency of list update and paging rules. Commun. ACM 28(2), 202–208 (1985)
  36. Young, N.E.: The k-server dual and loose competitiveness for paging. Algorithmica 11(6), 525–541 (1994)
  37. Young, N.E.: On-line file caching. Algorithmica 33(3), 371–383 (2002)

Acknowledgements

The authors would like to thank an anonymous reviewer who suggested the reduction presented in Theorem 6 below, and thus allowed us to simplify the randomized result significantly, making the paper much more elegant.

Author information

Corresponding author

Correspondence to Leah Epstein.

Additional information

An extended abstract of this paper appears in Proc. of ICALP 2011 as “On variants of file caching”. This research was partially supported by the TÁMOP-4.2.2/08/1/2008-0008 program of the Hungarian National Development Agency.

C. Imreh: Supported by the Bolyai Scholarship of the Hungarian Academy of Sciences.

Appendix: Some Properties and a Randomized Algorithm

We will use a reduction between caching with bypassing and caching to obtain a randomized algorithm. In this reduction, given an input for caching with bypassing and a cache of size k, the cache size for the resulting instance is different, and moreover, the instance is modified. We first briefly discuss the relation between the models with and without bypassing for a fixed input. Specifically, we show that the most straightforward reduction between the cases with and without bypassing fails for the bit model and the fault model, while for paging the difference between the optimal costs is much smaller. Moreover, in the case of weighted paging (and therefore also for caching), the ratio between the optimal costs of the two variants is unbounded.

Proposition 1

For paging, \(\sup_{I} \frac{\textsc{opt}_s(I)}{\textsc{opt}_b(I)}=2\). In the bit model and in the fault model, \(\sup_{I} \frac{\textsc{opt}_s(I)}{\textsc{opt}_b(I)}=k+1\). For weighted paging and file caching, \(\frac{\textsc{opt}_s(I)}{\textsc{opt}_b(I)}\) can be arbitrarily large.

Proof

We first show the upper bounds (the upper bound for paging is folklore and was previously mentioned in [5]). Consider an optimal offline algorithm \(\textsc{opt}_b\) with bypassing. We convert it into an algorithm \(\textsc{off}_s\) that never uses bypassing. Each time that \(\textsc{opt}_b\) bypasses a file f, \(\textsc{off}_s\) inserts this file into the cache, temporarily evicting arbitrary files until there is sufficient room for it, and then it removes the file f again, inserting the temporarily evicted files back into the cache. For paging, there is at most one evicted page for every requested page. The cost incurred by \(\textsc{off}_s\) for one requested page is at most 2, i.e., at most twice the cost of \(\textsc{opt}_b\) for a page. In the bit model, the total size of the evicted files is at most k, so the cost of \(\textsc{off}_s\) for f is at most \(\frac{\operatorname{size}(f)+k}{\operatorname{size}(f)} \leq k+1\) times the cost of \(\textsc{opt}_b\) for it. Similarly, in the fault model, at most k files are evicted, so the cost of \(\textsc{off}_s\) for f is at most k+1 while the cost of \(\textsc{opt}_b\) for it is 1.

Next, we show the lower bounds. For the bit model, consider an input that repeats M times a sequence of two files f and g, where \(\operatorname{size}(f)=k\) and \(\operatorname{size}(g)=1\). We get \(\textsc{opt}_b=k+M\), by keeping f in the cache and bypassing g each time it is requested, while \(\textsc{opt}_s=(k+1)M\), since a miss occurs on every file.

For the fault model, consider an input that repeats M times a sequence of k+1 files: \(f_i\), for \(1 \leq i \leq k\), and g, where \(\operatorname{size}(f_i)=1\) and \(\operatorname{size}(g)=k\). We get \(\textsc{opt}_b=k+M\), by keeping the k files \(f_i\) for \(1 \leq i \leq k\) in the cache and bypassing g each time it is requested, while \(\textsc{opt}_s=(k+1)M\), since a miss occurs on every file.

In the case of weighted paging, consider an input that repeats M times a sequence of k+1 files: \(f_i\), for \(1 \leq i \leq k\), and g, where \(\operatorname{cost}(f_i)=N\) and \(\operatorname{cost}(g)=1\). We get \(\textsc{opt}_b=Nk+M\), by keeping the first k files in the cache and bypassing g each time it is requested, while \(\textsc{opt}_s \geq Nk+(M-1)N\), since every subsequence of the form \(g,f_1,f_2,\ldots,f_k\) results in at least one miss on a file of cost N, because at least one of the last k files in the subsequence is not in the cache after g is requested.

In the last case of paging, consider a sequence that consists of M phases, where in phase j there are k+1 requests, for pages \(1,2,\ldots,k,k+j\). An offline algorithm with bypassing keeps the pages \(1,2,\ldots,k\) in the cache, and bypasses all requests for other pages, that is, for any page k+j such that \(1 \leq j \leq M\). The cost of this algorithm is k+M. As for an optimal solution without bypassing, we use the optimal offline algorithm of Belady [11], which, on a miss, evicts the page that will be requested again furthest in the future. Each time that a page k+j for some \(j \geq 1\) is requested, the cache contains pages \(1,2,\ldots,k\), so page k is evicted. Then, when page k is requested, since page k+j will never be requested again, it is evicted. Therefore, for each phase, the cost of the algorithm is at least 2 (in the first phase this holds for a different reason, since the cache is initially empty). We conclude that the total cost of an optimal solution is at least 2M. This gives a ratio of 2 between the optimal costs. □
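Belady's rule is simple to implement, and running it on the phase sequence above reproduces the bound computationally. The sketch below is our own (the quadratic next-use scan is chosen for clarity, not efficiency):

```python
def belady_cost(requests, k):
    """Offline optimal paging (Belady's rule): on a miss with a full cache,
    evict the page whose next request is furthest in the future; pages
    never requested again are evicted first. Returns the number of misses."""
    cache, misses = set(), 0
    for i, p in enumerate(requests):
        if p in cache:
            continue
        misses += 1
        if len(cache) >= k:
            def next_use(q):
                for j in range(i + 1, len(requests)):
                    if requests[j] == q:
                        return j
                return float('inf')   # never requested again
            cache.remove(max(cache, key=next_use))
        cache.add(p)
    return misses

# M phases; phase j requests pages 1, 2, ..., k, k+j
k, M = 4, 50
seq = [p for j in range(1, M + 1) for p in list(range(1, k + 1)) + [k + j]]
assert belady_cost(seq, k) >= 2 * M   # cost without bypassing, as argued above
# with bypassing, keeping 1..k and bypassing each page k+j costs only k + M
```

Tracing the rule as in the proof gives exactly k+1 misses in the first phase and 2 misses in every later phase, i.e., a total of k+2M-1.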

Note that the bounds above hold for arbitrarily large values of \(\textsc{opt}_b\).

Next, we discuss some relations between the different models. The next claim shows that caching with rejection generalizes both caching and caching with bypassing.

Claim 13

For any \(\mathcal{R}\)-competitive algorithm alg for caching with rejection, there exists an \(\mathcal{R}\)-competitive algorithm \(\mathcal{G}(\textsc{alg})\) for caching, and an \(\mathcal{R}\)-competitive algorithm \(\mathcal{G}'(\textsc{alg})\) for caching with bypassing. If alg is deterministic, then so are \(\mathcal{G}(\textsc{alg})\) and \(\mathcal{G}'(\textsc{alg})\).

Proof

First, we show the existence of \(\mathcal{G}'\). Given an input for caching with bypassing, we define an input for caching with rejection that contains the same requests, where the rejection penalty of each request is its cost. The resulting caching problem with rejection is equivalent to the original caching problem with bypassing, and thus the algorithm \(\mathcal{G}'(\textsc{alg})\) only needs to apply this transformation to the input while running alg.

Let λ denote the total cost of all files in the slow memory. Given an input I for caching, the input I′ is defined by the same sequence of file requests, where every request has a rejection penalty of 2λ. This transformation can be computed online. For an algorithm alg, \(\mathcal{G}(\textsc{alg})\) applied to I maintains the same cache contents as alg applied to I′ (for every realization of the random bits). Moreover, \(\mathcal{G}(\textsc{alg})\) acts exactly as alg does whenever alg does not reject the request. In the case that alg rejects a file f, \(\mathcal{G}(\textsc{alg})\) empties the cache, inserts f, removes it, and re-inserts the previous contents of the cache. The cost for serving such a request is less than 2λ. Thus, the cost of \(\mathcal{G}(\textsc{alg})\) for I (for every realization of the random bits) is no larger than the cost of alg for I′. Next, we show \(\textsc{opt}(I)=\textsc{opt}(I')\), where \(\textsc{opt}(I)\) is an optimal solution for the caching problem (and the input I), and \(\textsc{opt}(I')\) is an optimal solution for caching with rejection (and the input I′). Obviously, \(\textsc{opt}(I') \leq \textsc{opt}(I)\), as any solution for I defines a solution for I′ (it never rejects any file, and performs the same sequence of actions). We can also define a solution for I that is based on \(\textsc{opt}(I')\): this solution is simply \(\mathcal{G}(\textsc{opt}(I'))\). Thus, \(\textsc{opt}(I) \leq \textsc{opt}(I')\). □
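Both directions of the claim rest on simple input transformations, which can be written down directly. The sketch below uses names of our own and represents a request as a (name, size, cost) triple; the second transformation assumes the slow memory holds exactly the requested files, so λ is computed from them.

```python
def to_rejection_from_bypassing(requests):
    """Caching with bypassing -> caching with rejection: the rejection
    penalty of each request equals the file's retrieval cost."""
    return [(name, size, cost, cost) for (name, size, cost) in requests]

def to_rejection_from_caching(requests):
    """Plain caching -> caching with rejection: every request gets rejection
    penalty 2*lam. In the paper, lam is the total cost of all files in the
    slow memory; here we sum over the distinct requested files, which
    coincides with that when the slow memory holds exactly those files."""
    lam = sum({name: cost for (name, size, cost) in requests}.values())
    return [(name, size, cost, 2 * lam) for (name, size, cost) in requests]
```

Because the penalty 2λ exceeds the cost of emptying and refilling the whole cache, no reasonable algorithm ever benefits from rejecting in the second instance, which is exactly what the proof exploits.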

Next, we prove that our assumption that all file costs are positive is not restrictive.

Claim 14

For any \(\mathcal{R}\)-competitive algorithm alg for caching (caching with rejection) that does not allow zero costs, there exists an \(\mathcal{R}\)-competitive algorithm \(\mathcal{H}(\textsc{alg})\) for caching (caching with rejection, respectively) that allows zero costs. If alg is deterministic, then so is \(\mathcal{H}(\textsc{alg})\).

Proof

Given an input I, possibly with zero file costs, we define an input \(\tilde{I}\) in which no file of cost zero is requested more than once. To this end, for every file f of cost zero, we create copies \(f_1,f_2,\ldots\), all of cost zero. The process of creating \(\tilde{I}\) from I (online) is as follows. If the next request of I is for a file with a positive cost, then the next request of \(\tilde{I}\) is the same as in I (including the rejection penalty, if I is an input for caching with rejection). Otherwise, if the next request of I is for a zero-cost file g, and this is the ith request for g in I, then the next request of \(\tilde{I}\) is \(g_i\) (and if I is an input for caching with rejection, then it has the same rejection penalty as the current request for g in I).

We prove \(\textsc{opt}(I)=\textsc{opt}(\tilde{I})\). Given an optimal solution for \(\tilde{I}\), this solution immediately implies a solution for I, where any action on a file \(f_i\) of \(\tilde{I}\) is applied to f for I. Given an optimal solution for I, the solution is adapted such that the modified solution always has the same cache contents as \(\textsc{opt}(I)\), where for each zero-cost file it has some instance of this file. Every request of positive cost is served as before (it is either in the cache or not in the cache of both solutions before it is served). Given a request \(f_i\) of zero cost, if \(\textsc{opt}(I)\) does nothing, that is, it already has f in the cache, then the modified solution must have \(f_j\) in the cache for some j<i, and it replaces \(f_j\) with \(f_i\). Otherwise, the modified solution acts as the original solution (if it inserts f into the cache, then the modified solution inserts \(f_i\)). Thus, the required property of the cache contents is maintained. As \(f_i\) has cost zero, the cost of the modified solution equals \(\textsc{opt}(I)\).

Next, we modify \(\tilde{I}\) into an input I′ without zero costs as follows. Given a request for a file \(f_j\) with zero cost, such that this request is the \(\ell\)th request in the input, its cost is defined to be \(\frac{1}{2^{\ell}}\) (the other properties of \(\tilde{I}\) are unchanged).

We define \(\mathcal{H}(\textsc{alg})\) for I based on the actions of alg. Specifically, it constructs I′ in an online fashion, and maintains the same cache contents as alg applied to I′ (for every realization of the random bits), where whenever alg has a file \(f_i\) in the cache, \(\mathcal{H}(\textsc{alg})\) simply has f. We have \(\mathcal{H}(\textsc{alg})(I) \leq \textsc{alg}(I')\), as applying all the actions of alg to I′ serves all the requests of I, and the cost of f in I is no larger than the cost of \(f_i\) in I′.

Consider now an optimal solution for \(\tilde{I}\). Using it as a solution for I′, we obtain a feasible solution whose cost is larger by at most the total increase of all file costs in I′ compared to \(\tilde{I}\), that is, by at most 1 (since \(\sum_{\ell \geq 1} \frac{1}{2^{\ell}}=1\)), and we find \(\textsc{opt}(I') \leq \textsc{opt}(\tilde{I})+1\).

We have (for a given constant C) \(\mathcal{H}(\textsc{alg})(I) \leq \textsc{alg}(I') \leq \mathcal{R}\cdot \textsc{opt}(I')+C \leq \mathcal{R}\cdot \textsc{opt}(\tilde{I})+\mathcal{R}+C=\mathcal{R}\cdot \textsc{opt}(I)+\mathcal{R}+C\). □
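The two steps of this proof (renaming each zero-cost file into single-use copies, then giving the \(\ell\)th request cost \(1/2^{\ell}\)) compose into a single pass over the input; the sketch below, with names of our own, performs both:

```python
from collections import defaultdict
from fractions import Fraction

def remove_zero_costs(requests):
    """Sketch of the Claim 14 transformation: each zero-cost file f is
    renamed to a fresh copy f_1, f_2, ..., and if this is the l-th request
    of the whole input, the copy receives cost 1/2**l, so the total added
    cost stays below 1. A request is a (name, size, cost) triple."""
    copies = defaultdict(int)
    out = []
    for ell, (name, size, cost) in enumerate(requests, start=1):
        if cost == 0:
            copies[name] += 1
            name, cost = f"{name}_{copies[name]}", Fraction(1, 2 ** ell)
        out.append((name, size, cost))
    return out
```

Exact `Fraction` arithmetic is used so that the sum of the added costs can be compared to 1 without floating-point error.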

Finally, we show a reduction that allows us to obtain a randomized algorithm.

Theorem 6

For any \(\mathcal{R}\)-competitive algorithm alg for caching with cache size 2k, there exists an \(\mathcal{R}\)-competitive algorithm Ψ(alg) for caching with bypassing with cache size k. If alg is deterministic, then so is Ψ(alg).

Proof

By Claim 14, we assume that alg acts on inputs that may contain files of cost zero.

The algorithm Ψ(alg) applies a modification to the input I while running alg on the modified input I′. Given the jth request of I, for a file f, it issues a request for f followed by a request for a new file \(x_j\) of size k and cost zero. Such requests are called additional requests. After alg deals with \(x_j\), it must have \(x_j\) in the cache (for every realization of the random bits), and the total size of the other files in its cache is at most k. The state of its cache is defined to be the set of files that it has in the cache, excluding \(x_j\). To deal with this request for f, Ψ(alg) moves to the cache state of alg after it has dealt with \(x_j\) (this is done for every realization of the random bits), and Ψ(alg) bypasses f if it is not in the cache. It is left to show that the two algorithms have the same cost (for their respective inputs). For every request f of I, the contents of the cache of Ψ(alg) and the state of the cache of alg (after the additional request) are the same (for every realization of the random bits). Thus, their costs for f must be the same. Since the cost of alg on the additional requests is zero, the total costs are the same as well.

Moreover, we can show that the costs of optimal solutions (an optimal solution for caching with bypassing, the input I, and a cache of size k, and an optimal solution for caching, the augmented input, and a cache of size 2k) are equal. By the same arguments, Ψ(\(\textsc{opt}\))(I)=\(\textsc{opt}(I')\), and so \(\textsc{opt}(I) \leq \textsc{opt}(I')\). Consider an optimal solution for I, and create a solution for I′ (and a cache of size 2k) as follows. The first k slots of the cache are called reserved and are never used for the additional requests. The other k slots are called additional slots, and they are used for all the additional requests. The solution for I′ imitates the solution for I, and always keeps the files of the solution for I in the reserved slots. Whenever the solution for I bypasses a file, the solution for I′ inserts it into the additional slots (the number of additional slots may be larger than the size of the file). Thus, after an additional request, the caches have the same files (neglecting the additional request, which occupies all the additional slots of the larger cache). We described a solution for I′ of cost \(\textsc{opt}(I)\), and thus \(\textsc{opt}(I') \leq \textsc{opt}(I)\). □
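The input modification used by Ψ(alg) is a single interleaving pass; a sketch with names of our own, representing requests as (name, size, cost) triples:

```python
def augment_for_double_cache(requests, k):
    """Sketch of the Theorem 6 reduction: after the j-th request, insert a
    request for a fresh file x_j of size k and cost zero. Running a caching
    algorithm with cache size 2k on the result simulates, via its cache
    state excluding x_j, caching with bypassing with cache size k."""
    out = []
    for j, req in enumerate(requests, start=1):
        out.append(req)
        out.append((f"x_{j}", k, 0))
    return out
```

Since each \(x_j\) has size k and cost zero, forcing it into the doubled cache costs nothing but leaves at most k units of space for real files, which is what makes the simulation exact.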

Corollary 1

There is a randomized O(log k)-competitive algorithm for caching with rejection.

Proof

Given the randomized O(log k)-competitive algorithm of [2] for caching and the reduction of Theorem 6, we find that there exists a randomized O(log 2k)=O(log k)-competitive algorithm for caching with bypassing. Using the first reduction, Theorem 2, we find that there exists a randomized O(log k)-competitive algorithm for caching with rejection. □

Cite this article

Epstein, L., Imreh, C., Levin, A. et al. Online File Caching with Rejection Penalties. Algorithmica 71, 279–306 (2015). https://doi.org/10.1007/s00453-013-9793-0
