Queueing with redundant requests: exact analysis

Queueing Systems

Abstract

Recent computer systems research has proposed using redundant requests to reduce latency. The idea is to run a request on multiple servers and wait for the first completion (discarding all remaining copies of the request). However, there is no exact analysis of systems with redundancy. This paper presents the first exact analysis of systems with redundancy. We allow for any number of classes of redundant requests, any number of classes of non-redundant requests, any degree of redundancy, and any number of heterogeneous servers. In all cases we derive the limiting distribution of the state of the system. In small (two or three server) systems, we derive simple forms for the distribution of response time of both the redundant classes and non-redundant classes, and we quantify the “gain” to redundant classes and “pain” to non-redundant classes caused by redundancy. We find some surprising results. First, the response time of a fully redundant class follows a simple exponential distribution and that of the non-redundant class follows a generalized hyperexponential. Second, fully redundant classes are “immune” to any pain caused by other classes becoming redundant. We also compare redundancy with other approaches for reducing latency, such as optimal probabilistic splitting of a class among servers (Opt-Split) and join-the-shortest-queue (JSQ) routing of a class. We find that, in many cases, redundancy outperforms JSQ and Opt-Split with respect to overall response time, making it an attractive solution.
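The "wait for the first completion" idea can be illustrated with a minimal sketch. It assumes, purely for illustration, that each copy's service time is exponential and independent across servers, and it ignores queueing entirely; the function names and the rates are hypothetical and are not taken from the paper. Under those assumptions the first completion among servers with rates \(\mu _1,\ldots ,\mu _k\) is exponential with rate \(\mu _1+\cdots +\mu _k\), which the simulation below checks numerically.

```python
import random


def first_completion_time(rates, rng):
    """Time until the fastest of several independent copies finishes.

    Assumption: each copy's service time is Exponential(rate); the
    remaining copies are discarded once the first one completes.
    """
    return min(rng.expovariate(rate) for rate in rates)


def mean_first_completion(rates, samples=100_000, seed=0):
    """Monte Carlo estimate of the mean first-completion time."""
    rng = random.Random(seed)
    total = sum(first_completion_time(rates, rng) for _ in range(samples))
    return total / samples


if __name__ == "__main__":
    rates = [1.0, 2.0]  # hypothetical service rates of two servers
    print("simulated mean:", mean_first_completion(rates))
    # Minimum of independent exponentials is exponential with the summed rate.
    print("theoretical mean:", 1.0 / sum(rates))
```

This only captures the service-time effect of running copies in parallel; the queueing interaction between classes, which is the subject of the paper, is not modeled here.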

Notes

  1. This is counterintuitive because, as we will see in Lemma 3, the distribution of response time for class R does not depend on whether class A is redundant or non-redundant.

  2. A generalized hyperexponential \(H_2(\nu _1,\nu _2,\omega )\) is defined as the weighted mixture of two exponentials with rates \(\nu _1\) and \(\nu _2\), where the first exponential is given weight \(1+\omega \) and the second is given weight \(-\omega \). Note that \(\omega \) can be any real number; it need not be a probability [8].
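Written out, the definition in Note 2 gives the following density, tail, and mean. This is only a restatement of that definition; the random variable \(T\) and the argument \(t\) are symbols introduced here for illustration.

\[
f_T(t) = (1+\omega )\,\nu _1 e^{-\nu _1 t} - \omega \,\nu _2 e^{-\nu _2 t}, \qquad t \ge 0,
\]

\[
\Pr \{T > t\} = (1+\omega )\,e^{-\nu _1 t} - \omega \,e^{-\nu _2 t}, \qquad
\mathbf {E}[T] = \frac{1+\omega }{\nu _1} - \frac{\omega }{\nu _2}.
\]

The weights \(1+\omega \) and \(-\omega \) sum to one, so the density integrates to one for any real \(\omega \); setting \(\omega = 0\) recovers a single exponential with rate \(\nu _1\).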

References

  1. Adan, I., Weiss, G.: A skill based parallel service system under FCFS-ALIS—steady state, overloads, and abandonments. Stoch. Syst. 4(1), 250–299 (2014)

  2. Ananthanarayanan, G., Ghodsi, A., Shenker, S., Stoica, I.: Effective straggler mitigation: attack of the clones. In NSDI, pp. 185–198, April 2013

  3. Ananthanarayanan, G., Hung, M.C.C., Ren, X., Stoica, I., Wierman, A., Yu, M.: Grass: trimming stragglers in approximation analytics. In Proceedings of the 11th USENIX Conference on Networked Systems Design and Implementation, pp. 289–302. USENIX Association, 2014

  4. Baccelli, F., Makowski, A.: Simple computable bounds for the fork-join queue. Technical Report RR-0394, Inria, 1985

  5. Baccelli, F., Makowski, A.M., Shwartz, A.: The fork-join queue and related systems with synchronization constraints: stochastic ordering and computable bounds. Adv. Appl. Prob. 21, 629–660 (1989)

  6. Bassamboo, A., Randhawa, R.S., Mieghem, J.A.V.: A little flexibility is all you need: on the value of flexible resources in queueing systems. Oper. Res. 60, 1423–1435 (2012)

  7. Borst, S., Boxma, O., Uitert, M.V.: The asymptotic workload behavior of two coupled queues. Queueing Syst. 43(1–2), 81–102 (2003)

  8. Botta, R.F., Harris, C.M., Marchal, W.G.: Characterizations of generalized hyperexponential distribution functions. Commun. Stat. Stoch. Models 3(1), 115–148 (1987)

  9. Boxma, O., Koole, G., Liu, Z.: Queueing-theoretic solution methods for models of parallel and distributed systems. In Performance Evaluation of Parallel and Distributed Systems Solution Methods. CWI Tract 105 & 106, pp. 1–24, 1994

  10. Casanova, H.: Benefits and drawbacks of redundant batch requests. J. Grid Comput. 5(2), 235–250 (2007)

  11. Cohen, J.W., Boxma, O.J.: Boundary Value Problems in Queueing System Analysis. North-Holland Publishing Company, Amsterdam (1983)

  12. Dean, J., Barroso, L.A.: The tail at scale. Commun. ACM 56(2), 74–80 (2013)

  13. Fayolle, G., Iasnogorodski, R.: Two coupled processors: the reduction to a Riemann-Hilbert problem. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 47(3), 325–351 (1979)

  14. Flatto, L.: Two parallel queues created by arrivals with two demands II. SIAM J. Appl. Math. 45(5), 861–878 (1985)

  15. Flatto, L., Hahn, S.: Two parallel queues created by arrivals with two demands I. SIAM J. Appl. Math. 44(5), 1041–1053 (1984)

  16. Hall, P.: On representatives of subsets. J. Lond. Math. Soc. 10(1), 26–30 (1935)

  17. Harchol-Balter, M., Li, C., Osogami, T., Scheller-Wolf, A., Squillante, M.: Cycle stealing under immediate dispatch task assignment. In Annual Symposium on Parallel Algorithms and Architectures, pp. 274–285, June 2003

  18. Hooghiemstra, G., Keane, M., de Ree, S.V.: Power series for stationary distributions of coupled processor models. SIAM J. Appl. Math. 48(5), 1159–1166 (1988)

  19. Huang, H., Hung, W., Shin, K.G.: FS2: dynamic data replication in free disk space for improving disk performance and energy consumption. In Proceedings of the SOSP’05, pp. 263–276, December 2005

  20. Joshi, G., Liu, Y., Soljanin, E.: Coding for fast content download. In Allerton Conference’12, pp. 326–333, 2012

  21. Joshi, G., Liu, Y., Soljanin, E.: On the delay-storage trade-off in content download from coded distributed storage systems. IEEE J. Sel. Areas Commun. 32(5), 989–997 (2014)

  22. Keilson, J., Servi, L.: A distributional form of Little’s Law. Oper. Res. Lett. 7(5), 223–227 (1988)

  23. Kim, C., Agrawala, A.K.: Analysis of the fork-join queue. IEEE Trans. Comput. 38(2), 250–255 (1989)

  24. Konheim, A.G., Meilijson, I., Melkman, A.: Processor-sharing of two parallel lines. J. Appl. Prob. 18(4), 952–956 (1981)

  25. Koole, G., Righter, R.: Resource allocation in grid computing. J. Sched. 11, 163–173 (2009)

  26. Nelson, R., Tantawi, A.N.: Approximate analysis of fork/join synchronization in parallel queues. IEEE Trans. Comput. 37(6), 739–743 (1988)

  27. Osogami, T., Harchol-Balter, M., Scheller-Wolf, A.: Analysis of cycle stealing with switching times and thresholds. In SIGMETRICS, pp. 184–195, June 2003

  28. Ren, X., Ananthanarayanan, G., Wierman, A., Yu, M.: Hopper: decentralized speculation-aware cluster scheduling at scale

  29. Shah, N.B., Lee, K., Ramchandran, K.: The MDS queue: analysing latency performance of codes and redundant requests. Technical Report arXiv:1211.5405, November 2012

  30. Shah, N.B., Lee, K., Ramchandran, K.: When do redundant requests reduce latency? Technical Report arXiv:1311.2851, June 2013

  31. Stolyar, A.L., Tezcan, T.: Control of systems with flexible multi-server pools: a shadow routing approach. Queueing Syst. 66, 1–51 (2010)

  32. Tsitsiklis, J., Xu, K.: On the power of (even a little) resource pooling. Stoch. Syst. 2, 1–66 (2012)

  33. Tsitsiklis, J., Xu, K.: Queueing system topologies with limited flexibility. In SIGMETRICS, 2013

  34. Visschers, J., Adan, I., Weiss, G.: A product form solution to a system with multi-type jobs and multi-type servers. Queueing Syst. 70, 269–298 (2012)

  35. Vulimiri, A., Godfrey, P.B., Mittal, R., Sherry, J., Ratnasamy, S., Shenker, S.: Low latency via redundancy. In CoNEXT, pp. 283–294, December 2013

  36. Wang, D., Joshi, G., Wornell, G.: Efficient task replication for fast response times in parallel computation. Technical Report arXiv:1404.1328, April 2014

  37. Xia, C., Liu, Z., Towsley, D., Lelarge, M.: Scalability of fork/join queueing networks with blocking. In SIGMETRICS, pp. 133–144, June 2007

  38. Xu, Y., Bailey, M., Noble, B., Jahanian, F.: Small is better: avoiding latency traps in virtualized data centers. In Proceedings of the 4th annual Symposium on Cloud Computing, p. 7. ACM, 2013

Funding

This material is based upon work supported by the National Science Foundation Graduate Research Fellowship under Grant No. DGE-1252522; was funded by NSF-CMMI-1334194, NSF-CSR-1116282, and NSF-CMMI-1538204, by the Intel Science and Technology Center for Cloud Computing, and by a Google Faculty Research Award 2015/16; and has been supported by the Academy of Finland in FQ4BD and TOP-Energy projects (Grant Nos. 296206 and 268992).

Author information

Corresponding author

Correspondence to Kristen Gardner.

Ethics declarations

Conflict of interest

The authors declare that they have no conflicts of interest.

About this article

Cite this article

Gardner, K., Zbarsky, S., Doroudi, S. et al. Queueing with redundant requests: exact analysis. Queueing Syst 83, 227–259 (2016). https://doi.org/10.1007/s11134-016-9485-y

  • DOI: https://doi.org/10.1007/s11134-016-9485-y
