
Reducing Latency via Redundant Requests: Exact Analysis

Authors:
Kristen Gardner, Carnegie Mellon University, Pittsburgh, PA, USA
Samuel Zbarsky, Carnegie Mellon University, Pittsburgh, PA, USA
Sherwin Doroudi, Carnegie Mellon University, Pittsburgh, PA, USA
Mor Harchol-Balter, Carnegie Mellon University, Pittsburgh, PA, USA
Esa Hyytiä, Aalto University, Aalto, Finland

SIGMETRICS '15: Proceedings of the 2015 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, June 2015, pages 347–360. https://doi.org/10.1145/2745844.2745873

Published: 15 June 2015


ABSTRACT

Recent computer systems research has proposed using redundant requests to reduce latency. The idea is to run a request on multiple servers and wait for the first completion, discarding all remaining copies of the request. However, there is no exact analysis of systems with redundancy.
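The core mechanism can be sketched in a few lines of Python. This is an illustration under assumptions not stated above: each copy's service time is drawn independently from an Exponential distribution with a per-server rate, queueing is ignored (a single isolated request), and the names `redundant_service` and `mean_redundant_time` are hypothetical. Because the request completes when its first copy does, its service time is the minimum of the copies' times; for independent Exponential copies that minimum is itself Exponential with rate equal to the sum of the rates, so the simulated mean should be close to 1 divided by that sum.

```python
import random


def redundant_service(rates, rng):
    """Serve one request redundantly on len(rates) servers.

    Each copy's service time is drawn independently from Exponential(rate).
    The first copy to finish wins; the other copies are discarded.
    Returns (index of the winning server, completion time).
    """
    times = [rng.expovariate(r) for r in rates]
    winner = min(range(len(times)), key=times.__getitem__)
    return winner, times[winner]


def mean_redundant_time(rates, n=200_000, seed=1):
    """Monte Carlo estimate of the mean time until the first copy completes."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        _, t = redundant_service(rates, rng)
        total += t
    return total / n


if __name__ == "__main__":
    rates = [1.0, 2.0, 3.0]
    print(f"simulated: {mean_redundant_time(rates):.4f}")
    print(f"1 / sum(rates): {1.0 / sum(rates):.4f}")
```

With rates 1, 2 and 3, the estimate should land near 1/6.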

This paper presents the first exact analysis of systems with redundancy. We allow for any number of classes of redundant requests, any number of classes of non-redundant requests, any degree of redundancy, and any number of heterogeneous servers. In all cases, we derive the limiting distribution of the state of the system.
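The paper's general limiting distribution is not reproduced here. As a hedged illustration of what a limiting distribution on the system state looks like, take the simplest special case, under assumptions made only for this sketch: one class whose requests are copied to all k servers, Poisson arrivals at rate λ, FCFS at every server, and independent Exponential service at rate μ_i on server i. Every server then works on the same head-of-line request, which leaves at total rate μ = μ_1 + ... + μ_k, so the number of requests in the system is a birth-death chain (an M/M/1 queue). The hypothetical `limiting_distribution` function solves its balance equations π_n λ = π_{n+1} μ on a truncated state space.

```python
def limiting_distribution(lam, mus, n_max=200):
    """Truncated limiting distribution of the number of requests in the system.

    Assumes one fully redundant class (every request is copied to every server),
    Poisson(lam) arrivals, FCFS at each server, and independent Exponential(mu_i)
    service at server i. The count is then a birth-death chain with birth rate
    lam and death rate sum(mus); we solve pi[n] * lam = pi[n + 1] * sum(mus)
    for n = 0..n_max - 1 and normalize over states 0..n_max.
    """
    mu = sum(mus)
    if lam >= mu:
        raise ValueError("unstable: need lam < sum(mus)")
    pi = [1.0]
    for _ in range(n_max):
        pi.append(pi[-1] * lam / mu)  # balance across the cut between n and n + 1
    total = sum(pi)
    return [p / total for p in pi]


if __name__ == "__main__":
    pi = limiting_distribution(lam=1.5, mus=[1.0, 1.0])
    mean_n = sum(n * p for n, p in enumerate(pi))
    # rho = 1.5 / 2.0 = 0.75; the M/M/1 law gives pi_0 = 1 - rho and E[N] = rho / (1 - rho).
    print(f"pi_0 = {pi[0]:.4f}, E[N] = {mean_n:.4f}")
```

With λ = 1.5 and two unit-rate servers, ρ = 0.75, so π_0 comes out at about 0.25 and the mean number in system at about 3, matching the geometric M/M/1 law (1 - ρ)ρ^n up to a negligible truncation error.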

In small (two- or three-server) systems, we derive simple forms for the distribution of response time of both the redundant and the non-redundant classes, and we quantify the "gain" to redundant classes and the "pain" to non-redundant classes caused by redundancy. We find some surprising results. First, the response time of a fully redundant class follows a simple Exponential distribution, and that of the non-redundant class follows a Generalized Hyperexponential distribution. Second, fully redundant classes are "immune" to any pain caused by other classes becoming redundant.
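For readers unfamiliar with the Generalized Hyperexponential, a minimal sketch of its CDF follows. The CDF has the form F(t) = 1 - Σ_i w_i e^{-r_i t}, where the weights sum to 1 but, unlike in an ordinary hyperexponential mixture, individual weights may be negative; the Exponential is the one-term case. The function names and example parameters are illustrative only and are not the parameters derived in the paper.

```python
import math


def gh_cdf(t, weights, rates):
    """CDF of a Generalized Hyperexponential: F(t) = 1 - sum_i w_i * exp(-r_i * t).

    The weights must sum to 1 (so that F(0) = 0 and F(t) -> 1), but individual
    weights may be negative. One weight of 1 gives an ordinary Exponential.
    """
    if not math.isclose(sum(weights), 1.0, abs_tol=1e-12):
        raise ValueError("weights must sum to 1")
    if t <= 0:
        return 0.0
    return 1.0 - sum(w * math.exp(-r * t) for w, r in zip(weights, rates))


def gh_mean(weights, rates):
    """Mean, computed as the integral of 1 - F(t), i.e. sum_i w_i / r_i."""
    return sum(w / r for w, r in zip(weights, rates))


if __name__ == "__main__":
    w, r = [2.0, -1.0], [1.0, 2.0]
    print(gh_cdf(1.0, w, r), (1.0 - math.exp(-1.0)) ** 2)
    print(gh_mean(w, r))
```

For instance, weights (2, -1) and rates (1, 2) give F(t) = (1 - e^{-t})^2, the CDF of the maximum of two independent Exp(1) variables, with mean 1.5; a negative weight is therefore perfectly legitimate.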

We also compare redundancy with other approaches for reducing latency, such as optimal probabilistic splitting of a class among servers (Opt-Split) and Join-the-Shortest-Queue (JSQ) routing of a class. We find that, in many cases, redundancy outperforms JSQ and Opt-Split with respect to overall response time, making it an attractive solution.
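A back-of-the-envelope comparison of redundancy and Opt-Split is possible in the simplest setting, again under assumptions made only for this sketch: a single class, two servers, Poisson arrivals at rate λ, FCFS, and independent Exponential service at rates μ_1 and μ_2. With full redundancy the system behaves as an M/M/1 queue with rate μ_1 + μ_2, so E[T] = 1/(μ_1 + μ_2 - λ). Under probabilistic splitting with probability p to server 1, each server is an independent M/M/1 queue and E[T](p) = p/(μ_1 - pλ) + (1 - p)/(μ_2 - (1 - p)λ), which is convex in p, so a ternary search finds the optimal split. JSQ has no comparably simple closed form and is left out, and this single-class setting says nothing about the pain to non-redundant classes. All function names are hypothetical.

```python
import math


def redundant_mean_response(lam, mu1, mu2):
    """Mean response time with full redundancy: an M/M/1 queue with rate mu1 + mu2."""
    if lam >= mu1 + mu2:
        raise ValueError("unstable: need lam < mu1 + mu2")
    return 1.0 / (mu1 + mu2 - lam)


def split_mean_response(p, lam, mu1, mu2):
    """Mean response time when each arrival joins server 1 w.p. p, else server 2."""
    lam1, lam2 = p * lam, (1.0 - p) * lam
    if lam1 >= mu1 or lam2 >= mu2:
        return math.inf  # one of the two M/M/1 queues would be unstable
    return p / (mu1 - lam1) + (1.0 - p) / (mu2 - lam2)


def opt_split(lam, mu1, mu2, iters=200):
    """Ternary search for the p minimizing split_mean_response (convex in p)."""
    if not 0.0 < lam < mu1 + mu2:
        raise ValueError("need 0 < lam < mu1 + mu2")
    lo = max(0.0, 1.0 - mu2 / lam)  # below this, server 2 is overloaded
    hi = min(1.0, mu1 / lam)        # above this, server 1 is overloaded
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if split_mean_response(m1, lam, mu1, mu2) < split_mean_response(m2, lam, mu1, mu2):
            hi = m2
        else:
            lo = m1
    p = (lo + hi) / 2.0
    return p, split_mean_response(p, lam, mu1, mu2)


if __name__ == "__main__":
    lam, mu1, mu2 = 1.0, 1.0, 1.0
    p, t_split = opt_split(lam, mu1, mu2)
    print(f"Opt-Split: p = {p:.4f}, E[T] = {t_split:.4f}")
    print(f"Redundancy: E[T] = {redundant_mean_response(lam, mu1, mu2):.4f}")
```

With λ = 1 and μ_1 = μ_2 = 1, the optimal split is p = 0.5 with E[T] = 2, while redundancy gives E[T] = 1 in this idealized setting.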


Index Terms

- Mathematics of computing → Probability and statistics
- Theory of computation → Theory and algorithms for application domains → Machine learning theory → Markov decision processes


Published in
SIGMETRICS '15: Proceedings of the 2015 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, June 2015, 488 pages. ISBN: 9781450334860. DOI: 10.1145/2745844
General Chairs: Bill Lin (University of California, San Diego), Jun (Jim) Xu (Georgia Tech)
Program Chairs: Sudipta Sengupta (Microsoft Research), Devavrat Shah (Massachusetts Institute of Technology)

ACM SIGMETRICS Performance Evaluation Review, Volume 43, Issue 1, June 2015, 468 pages. ISSN: 0163-5999. DOI: 10.1145/2796314
Editors: Derek Eager (University of Saskatchewan), Carey Williamson (University of Calgary)

Copyright © 2015 ACM
Publisher: Association for Computing Machinery, New York, NY, United States
Author Tags
markov chain analysis
redundancy

Acceptance Rates
SIGMETRICS '15 paper acceptance rate: 32 of 239 submissions, 13%. Overall acceptance rate: 459 of 2,691 submissions, 17%.
