Abstract
Recent computer systems research has proposed using redundant requests to reduce latency. The idea is to run a request on multiple servers and wait for the first copy to complete, discarding all remaining copies of the request. Until now, however, there has been no exact analysis of systems with redundancy; this paper presents the first. We allow for any number of classes of redundant requests, any number of classes of non-redundant requests, any degree of redundancy, and any number of heterogeneous servers. In all cases we derive the limiting distribution of the state of the system. In small (two or three server) systems, we derive simple forms for the distribution of response time of both the redundant classes and non-redundant classes, and we quantify the “gain” to redundant classes and “pain” to non-redundant classes caused by redundancy. We find some surprising results. First, the response time of a fully redundant class follows a simple exponential distribution and that of the non-redundant class follows a generalized hyperexponential distribution. Second, fully redundant classes are “immune” to any pain caused by other classes becoming redundant. We also compare redundancy with other approaches for reducing latency, such as optimal probabilistic splitting of a class among servers (Opt-Split) and join-the-shortest-queue (JSQ) routing of a class. We find that, in many cases, redundancy outperforms JSQ and Opt-Split with respect to overall response time, making it an attractive solution.
Notes
This is counterintuitive because, as we will see in Lemma 3, the distribution of response time for class R does not depend on whether class A is redundant or non-redundant.
A generalized hyperexponential \(H_2(\nu _1,\nu _2,\omega )\) is defined as the weighted mixture of two exponentials with rates \(\nu _1\) and \(\nu _2\), where the first exponential is given weight \(1+\omega \) and the second is given weight \(-\omega \). Note that \(\omega \) can be any real number; it need not be a probability [8].
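Written out, and reading the weights as applying to the two exponential densities (the usual reading of a mixture, assumed here), a random variable \(T \sim H_2(\nu _1,\nu _2,\omega )\) has the following density, tail, and mean. The symbol \(T\) is introduced only for this illustration.

\[
f_T(t) = (1+\omega )\,\nu _1 e^{-\nu _1 t} - \omega \,\nu _2 e^{-\nu _2 t}, \qquad t \ge 0,
\]
\[
\Pr \{T > t\} = (1+\omega )\,e^{-\nu _1 t} - \omega \,e^{-\nu _2 t}, \qquad \mathbf {E}[T] = \frac{1+\omega }{\nu _1} - \frac{\omega }{\nu _2}.
\]

The weights sum to one, so \(f_T\) integrates to one for any \(\omega \). For \(-1 \le \omega \le 0\) both weights are nonnegative and this is an ordinary two-phase hyperexponential. Outside that range one weight is negative, and \(f_T\) is a valid density only if it remains nonnegative for all \(t \ge 0\).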
References
Adan, I., Weiss, G.: A skill based parallel service system under FCFS-ALIS—steady state, overloads, and abandonments. Stoch. Syst. 4(1), 250–299 (2014)
Ananthanarayanan, G., Ghodsi, A., Shenker, S., Stoica, I.: Effective straggler mitigation: attack of the clones. In NSDI, pp. 185–198, April 2013
Ananthanarayanan, G., Hung, M.C.C., Ren, X., Stoica, I., Wierman, A., Yu, M.: Grass: trimming stragglers in approximation analytics. In Proceedings of the 11th USENIX Conference on Networked Systems Design and Implementation, pp. 289–302. USENIX Association, 2014
Baccelli, F., Makowski, A.: Simple computable bounds for the fork-join queue. Technical Report RR-0394, Inria, 1985
Baccelli, F., Makowski, A.M., Shwartz, A.: The fork-join queue and related systems with synchronization constraints: stochastic ordering and computable bounds. Adv. Appl. Prob. 21, 629–660 (1989)
Bassamboo, A., Randhawa, R.S., Van Mieghem, J.A.: A little flexibility is all you need: on the value of flexible resources in queueing systems. Oper. Res. 60, 1423–1435 (2012)
Borst, S., Boxma, O., van Uitert, M.: The asymptotic workload behavior of two coupled queues. Queueing Syst. 43(1–2), 81–102 (2003)
Botta, R.F., Harris, C.M., Marchal, W.G.: Characterizations of generalized hyperexponential distribution functions. Commun. Stat. Stoch. Models 3(1), 115–148 (1987)
Boxma, O., Koole, G., Liu, Z.: Queueing-theoretic solution methods for models of parallel and distributed systems. In Performance Evaluation of Parallel and Distributed Systems Solution Methods. CWI Tract 105 & 106, pp. 1–24, 1994
Casanova, H.: Benefits and drawbacks of redundant batch requests. J. Grid Comput. 5(2), 235–250 (2007)
Cohen, J.W., Boxma, O.J.: Boundary Value Problems in Queueing System Analysis. North-Holland Publishing Company, Amsterdam (1983)
Dean, J., Barroso, L.A.: The tail at scale. Commun. ACM 56(2), 74–80 (2013)
Fayolle, G., Iasnogorodski, R.: Two coupled processors: the reduction to a Riemann-Hilbert problem. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 47(3), 325–351 (1979)
Flatto, L.: Two parallel queues created by arrivals with two demands II. SIAM J. Appl. Math. 45(5), 1159–1166 (1985)
Flatto, L., Hahn, S.: Two parallel queues created by arrivals with two demands I. SIAM J. Appl. Math. 44(5), 1041–1053 (1984)
Hall, P.: On representatives of subsets. J. Lond. Math. Soc. 10(1), 26–30 (1935)
Harchol-Balter, M., Li, C., Osogami, T., Scheller-Wolf, A., Squillante, M.: Cycle stealing under immediate dispatch task assignment. In Annual Symposium on Parallel Algorithms and Architectures, pp. 274–285, June 2003
Hooghiemstra, G., Keane, M., van de Ree, S.: Power series for stationary distributions of coupled processor models. SIAM J. Appl. Math. 48(5), 861–878 (1988)
Huang, H., Hung, W., Shin, K.G.: FS2: dynamic data replication in free disk space for improving disk performance and energy consumption. In Proceedings of the SOSP’05, pp. 263–276, December 2005
Joshi, G., Liu, Y., Soljanin, E.: Coding for fast content download. In Allerton Conference’12, pp. 326–333, 2012
Joshi, G., Liu, Y., Soljanin, E.: On the delay-storage trade-off in content download from coded distributed storage systems. IEEE J. Sel. Areas Commun. 32(5), 989–997 (2014)
Keilson, J., Servi, L.: A distributional form of Little’s Law. Oper. Res. Lett. 7(5), 223–227 (1988)
Kim, C., Agrawala, A.K.: Analysis of the fork-join queue. IEEE Trans. Comput. 38(2), 250–255 (1989)
Konheim, A.G., Meilijson, I., Melkman, A.: Processor-sharing of two parallel lines. J. Appl. Prob. 18(4), 952–956 (1981)
Koole, G., Righter, R.: Resource allocation in grid computing. J. Sched. 11, 163–173 (2009)
Nelson, R., Tantawi, A.N.: Approximate analysis of fork/join synchronization in parallel queues. IEEE Trans. Comput. 37(6), 739–743 (1988)
Osogami, T., Harchol-Balter, M., Scheller-Wolf, A.: Analysis of cycle stealing with switching times and thresholds. In SIGMETRICS, pp. 184–195, June 2003
Ren, X., Ananthanarayanan, G., Wierman, A., Yu, M.: Hopper: decentralized speculation-aware cluster scheduling at scale
Shah, N.B., Lee, K., Ramchandran, K.: The MDS queue: analysing latency performance of codes and redundant requests. Technical Report arXiv:1211.5405, November 2012
Shah, N.B., Lee, K., Ramchandran, K.: When do redundant requests reduce latency? Technical Report arXiv:1311.2851, June 2013
Stolyar, A.L., Tezcan, T.: Control of systems with flexible multi-server pools: a shadow routing approach. Queueing Syst. 66, 1–51 (2010)
Tsitsiklis, J., Xu, K.: On the power of (even a little) resource pooling. Stoch. Syst. 2, 1–66 (2012)
Tsitsiklis, J., Xu, K.: Queueing system topologies with limited flexibility. In SIGMETRICS, 2013
Visschers, J., Adan, I., Weiss, G.: A product form solution to a system with multi-type jobs and multi-type servers. Queueing Syst. 70, 269–298 (2012)
Vulimiri, A., Godfrey, P.B., Mittal, R., Sherry, J., Ratnasamy, S., Shenker, S.: Low latency via redundancy. In CoNEXT, pp. 283–294, December 2013
Wang, D., Joshi, G., Wornell, G.: Efficient task replication for fast response times in parallel computation. Technical Report arXiv:1404.1328, April 2014
Xia, C., Liu, Z., Towsley, D., Lelarge, M.: Scalability of fork/join queueing networks with blocking. In SIGMETRICS, pp. 133–144, June 2007
Xu, Y., Bailey, M., Noble, B., Jahanian, F.: Small is better: avoiding latency traps in virtualized data centers. In Proceedings of the 4th annual Symposium on Cloud Computing, p. 7. ACM, 2013
Funding
This material is based upon work supported by the National Science Foundation Graduate Research Fellowship under Grant No. DGE-1252522; was funded by NSF-CMMI-1334194, NSF-CSR-1116282, and NSF-CMMI-1538204, by the Intel Science and Technology Center for Cloud Computing, and by a Google Faculty Research Award 2015/16; and has been supported by the Academy of Finland in FQ4BD and TOP-Energy projects (Grant Nos. 296206 and 268992).
Ethics declarations
Conflict of interest
The authors declare that they have no conflicts of interest.
About this article
Cite this article
Gardner, K., Zbarsky, S., Doroudi, S. et al. Queueing with redundant requests: exact analysis. Queueing Syst 83, 227–259 (2016). https://doi.org/10.1007/s11134-016-9485-y
DOI: https://doi.org/10.1007/s11134-016-9485-y