ABSTRACT
Recent computer systems research has proposed using redundant requests to reduce latency. The idea is to run a request on multiple servers and wait for the first completion (discarding all remaining copies of the request). However, there is no exact analysis of systems with redundancy.
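As a rough illustration of the mechanism itself (not of the paper's analysis), the sketch below sends one copy of a request to each of several servers, keeps the first completion, and cancels the rest. The function names, server names, and fixed delays are hypothetical stand-ins; a real system would issue network calls with random delays.

```python
import asyncio

async def query_server(name: str, delay: float) -> str:
    # Hypothetical stand-in for a real request: the delay models
    # queueing plus service time at this server.
    await asyncio.sleep(delay)
    return name

async def redundant_request(delays: dict[str, float]) -> str:
    # Send one copy of the request to every server.
    tasks = [asyncio.create_task(query_server(n, d)) for n, d in delays.items()]
    # Wait only until the first copy completes.
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    # Discard all remaining copies of the request.
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)
    return next(iter(done)).result()

# Assumed delays, chosen far apart so the winner is unambiguous.
winner = asyncio.run(
    redundant_request({"server-1": 0.3, "server-2": 0.01, "server-3": 0.5})
)
print(winner)
```

The request's latency here is the minimum of the three delays, at the cost of briefly occupying all three servers.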
This paper presents the first exact analysis of systems with redundancy. We allow for any number of classes of redundant requests, any number of classes of non-redundant requests, any degree of redundancy, and any number of heterogeneous servers. In all cases we derive the limiting distribution on the state of the system.
In small (two or three server) systems, we derive simple forms for the distribution of response time of both the redundant classes and non-redundant classes, and we quantify the "gain" to redundant classes and "pain" to non-redundant classes caused by redundancy. We find some surprising results. First, the response time of a fully redundant class follows a simple Exponential distribution and that of the non-redundant class follows a Generalized Hyperexponential. Second, fully redundant classes are "immune" to any pain caused by other classes becoming redundant.
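The Exponential form can be checked by simulation in the simplest special case, which is an assumption of this sketch rather than the paper's general model: two servers, every job fully redundant, Poisson arrivals at rate `lam`, FCFS queues, and service times that are Exponential and independent across servers with rates `mu1` and `mu2`. Under those assumptions both servers always work on the same head-of-line job, so the system behaves like a single M/M/1 queue with service rate `mu1 + mu2`, whose response time is Exponential with rate `mu1 + mu2 - lam`. The parameter values below are arbitrary.

```python
import random

def mean_response_time(lam, mu1, mu2, n_jobs=200_000, seed=1):
    """Estimate mean response time when every job is sent to both servers."""
    rng = random.Random(seed)
    wait = 0.0   # time the current job queues before reaching the head
    total = 0.0
    for _ in range(n_jobs):
        # Both copies start together; the first to finish wins and the
        # other copy is cancelled.
        service = min(rng.expovariate(mu1), rng.expovariate(mu2))
        total += wait + service
        # Lindley recursion gives the queueing time of the next arrival.
        interarrival = rng.expovariate(lam)
        wait = max(0.0, wait + service - interarrival)
    return total / n_jobs

lam, mu1, mu2 = 1.0, 1.0, 1.0          # assumed rates, with lam < mu1 + mu2
simulated = mean_response_time(lam, mu1, mu2)
predicted = 1.0 / (mu1 + mu2 - lam)    # mean of Exponential(mu1 + mu2 - lam)
print(f"simulated {simulated:.3f}, predicted {predicted:.3f}")
```

The simulated mean should land close to the predicted value; mixing in non-redundant classes, as the paper does, requires the full analysis and is not covered by this sketch.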
We also compare redundancy with other approaches for reducing latency, such as optimal probabilistic splitting of a class among servers (Opt-Split) and Join-the-Shortest-Queue (JSQ) routing of a class. We find that, in many cases, redundancy outperforms JSQ and Opt-Split with respect to overall response time, making it an attractive solution.