Hopper: Decentralized Speculation-aware Cluster Scheduling at Scale
SIGCOMM '15: Proceedings of the 2015 ACM Conference on Special Interest Group on Data Communication

Abstract
As clusters continue to grow in size and complexity, providing scalable and predictable performance is an increasingly important challenge. A crucial roadblock to achieving predictable performance is stragglers, i.e., tasks that take significantly longer than expected to run. Speculative execution has been widely adopted to mitigate the impact of stragglers; however, speculation mechanisms are designed and operated independently of job scheduling, even though scheduling a speculative copy of a task directly reduces the resources available to other jobs. In this work, we present Hopper, a speculation-aware job scheduler, i.e., one that integrates the tradeoffs associated with speculation into its scheduling decisions. We implement both centralized and decentralized prototypes of the Hopper scheduler and show that coordinating scheduling and speculation yields 50% improvements over state-of-the-art centralized schedulers, and 66% over state-of-the-art decentralized schedulers and speculation strategies.
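The abstract's central observation — that launching a speculative copy of a straggling task consumes a slot that could otherwise serve another job, so speculation and scheduling must be decided jointly — can be illustrated with a toy allocator. This is a hedged sketch only, not Hopper's actual algorithm: the `Job` fields, the `spec_weight` discount, and the proportional-share rule are all invented for illustration.

```python
# Illustrative sketch of speculation-aware slot allocation (NOT Hopper's
# algorithm). A speculation-oblivious scheduler would size each job by its
# remaining tasks alone; here, each job's "virtual size" also charges a
# discounted demand for speculative copies of its stragglers, making the
# speculation-vs-scheduling tradeoff explicit in one allocation step.
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    remaining_tasks: int       # unscheduled tasks still waiting for a slot
    straggling_tasks: int = 0  # running tasks progressing far slower than expected

def assign_slots(jobs, total_slots, spec_weight=0.5):
    """Split total_slots across jobs in proportion to virtual size:
    remaining work plus spec_weight extra demand per straggler."""
    virtual = {j.name: j.remaining_tasks + spec_weight * j.straggling_tasks
               for j in jobs}
    total = sum(virtual.values()) or 1
    # Proportional share, rounded down; leftover slots go to the largest job.
    alloc = {name: int(total_slots * v / total) for name, v in virtual.items()}
    leftover = total_slots - sum(alloc.values())
    if leftover:
        alloc[max(virtual, key=virtual.get)] += leftover
    return alloc
```

For example, a job with 8 waiting tasks and 4 stragglers competes for slots as if it had 10 tasks, so it receives capacity for speculation at the direct expense of other jobs' shares — the coupling the paper argues must be managed by the scheduler rather than ignored.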
References
- Apache Thrift. https://thrift.apache.org/.
- Cloudera Impala. http://www.cloudera.com/content/cloudera/en/products-and-services/cdh/impala.html.
- Hadoop. http://hadoop.apache.org.
- Hadoop Capacity Scheduler. http://hadoop.apache.org/docs/r1.2.1/capacity_scheduler.html.
- Hadoop Distributed File System. http://hadoop.apache.org/hdfs.
- Hadoop Slowstart. https://issues.apache.org/jira/browse/MAPREDUCE-1184/.
- Hive. http://wiki.apache.org/hadoop/Hive.
- Hopper Technical Report. https://sites.google.com/site/sigcommhoppertechreport/.
- Sparrow. https://github.com/radlab/sparrow.
- The Next Generation of Apache Hadoop MapReduce. http://developer.yahoo.com/blogs/hadoop/posts/2011/02/mapreduce-nextgen/.
- G. Ananthanarayanan, S. Agarwal, S. Kandula, A. Greenberg, I. Stoica, D. Harlan, and E. Harris. Scarlett: Coping with Skewed Content Popularity in MapReduce Clusters. In ACM EuroSys, 2011.
- G. Ananthanarayanan, A. Ghodsi, S. Shenker, and I. Stoica. Effective Straggler Mitigation: Attack of the Clones. In USENIX NSDI, 2013.
- G. Ananthanarayanan, A. Ghodsi, A. Wang, D. Borthakur, S. Kandula, S. Shenker, and I. Stoica. PACMan: Coordinated Memory Caching for Parallel Jobs. In USENIX NSDI, 2012.
- G. Ananthanarayanan, M. Hung, X. Ren, I. Stoica, A. Wierman, and M. Yu. GRASS: Trimming Stragglers in Approximation Analytics. In USENIX NSDI, 2014.
- G. Ananthanarayanan, S. Kandula, A. Greenberg, I. Stoica, E. Harris, and B. Saha. Reining in the Outliers in Map-Reduce Clusters Using Mantri. In USENIX OSDI, 2010.
- E. Bortnikov, A. Frank, E. Hillel, and S. Rao. Predicting Execution Bottlenecks in Map-Reduce Clusters. In USENIX HotCloud, 2012.
- E. Boutin, J. Ekanayake, W. Lin, B. Shi, J. Zhou, Z. Qian, M. Wu, and L. Zhou. Apollo: Scalable and Coordinated Scheduling for Cloud-Scale Computing. In USENIX OSDI, 2014.
- M. Bramson, Y. Lu, and B. Prabhakar. Randomized Load Balancing with General Service Time Distributions. In ACM SIGMETRICS, pages 275--286, 2010.
- R. Chaiken, B. Jenkins, P. Larson, B. Ramsey, D. Shakib, S. Weaver, and J. Zhou. SCOPE: Easy and Efficient Parallel Processing of Massive Data Sets. Proceedings of the VLDB Endowment, (2), 2008.
- H. Chen, J. Marden, and A. Wierman. On the Impact of Heterogeneity and Back-end Scheduling in Load Balancing Designs. In IEEE INFOCOM, 2009.
- J. Dean. Achieving Rapid Response Times in Large Online Services. In Berkeley AMPLab Cloud Seminar, 2012.
- J. Dean and L. Barroso. The Tail at Scale. Communications of the ACM, (2), 2013.
- J. Dean and S. Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. Communications of the ACM, 2008.
- F. Dogar, T. Karagiannis, H. Ballani, and A. Rowstron. Decentralized Task-aware Scheduling for Data Center Networks. In ACM SIGCOMM, 2014.
- A. Ghodsi, M. Zaharia, B. Hindman, A. Konwinski, S. Shenker, and I. Stoica. Dominant Resource Fairness: Fair Allocation of Multiple Resource Types. In USENIX NSDI, 2011.
- R. Grandl, G. Ananthanarayanan, S. Kandula, S. Rao, and A. Akella. Multi-Resource Packing for Cluster Schedulers. In ACM SIGCOMM, 2014.
- M. Harchol-Balter, B. Schroeder, N. Bansal, and M. Agrawal. Size-based Scheduling to Improve Web Performance. ACM Transactions on Computer Systems (TOCS), 21(2):207--233, 2003.
- B. Hindman, A. Konwinski, M. Zaharia, A. Ghodsi, A. Joseph, R. Katz, S. Shenker, and I. Stoica. Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center. In USENIX NSDI, 2011.
- M. Isard, V. Prabhakaran, J. Currey, U. Wieder, K. Talwar, and A. Goldberg. Quincy: Fair Scheduling for Distributed Computing Clusters. In ACM SOSP, 2009.
- M. Lin, L. Zhang, A. Wierman, and J. Tan. Joint Optimization of Overlapping Phases in MapReduce. Performance Evaluation, 2013.
- S. Melnik, A. Gubarev, J. J. Long, G. Romer, S. Shivakumar, M. Tolton, and T. Vassilakis. Dremel: Interactive Analysis of Web-Scale Datasets. In VLDB, 2010.
- B. Moseley, A. Dasgupta, R. Kumar, and T. Sarlós. On Scheduling in Map-Reduce and Flow-Shops. In ACM SPAA, 2011.
- K. Ousterhout, A. Panda, J. Rosen, S. Venkataraman, R. Xin, S. Ratnasamy, S. Shenker, and I. Stoica. The Case for Tiny Tasks in Compute Clusters. In USENIX HotOS, 2013.
- K. Ousterhout, R. Rasti, S. Ratnasamy, S. Shenker, and B. Chun. Making Sense of Performance in Data Analytics Frameworks. In USENIX NSDI, 2015.
- K. Ousterhout, P. Wendell, M. Zaharia, and I. Stoica. Sparrow: Distributed, Low Latency Scheduling. In ACM SOSP, 2013.
- K. Pruhs, J. Sgall, and E. Torng. Online Scheduling. In Handbook of Scheduling: Algorithms, Models, and Performance Analysis, 2004.
- A. Richa, M. Mitzenmacher, and R. Sitaraman. The Power of Two Random Choices: A Survey of Techniques and Results. Combinatorial Optimization, 2001.
- L. Schrage. A Proof of the Optimality of the Shortest Remaining Processing Time Discipline. Operations Research, 16(3):687--690, 1968.
- B. Sharma, V. Chudnovsky, J. L. Hellerstein, R. Rifaat, and C. R. Das. Modeling and Synthesizing Task Placement Constraints in Google Compute Clusters. In ACM SoCC, 2011.
- J. Tan, X. Meng, and L. Zhang. Delay Tails in MapReduce Scheduling. ACM SIGMETRICS Performance Evaluation Review, 2012.
- Y. Wang, J. Tan, W. Yu, L. Zhang, and X. Meng. Preemptive ReduceTask Scheduling for Fast and Fair Job Completion. In USENIX ICAC, 2013.
- A. Wierman. Fairness and Scheduling in Single Server Queues. Surveys in Operations Research and Management Science, 16(1):39--48, 2011.
- A. Wierman and M. Harchol-Balter. Classifying Scheduling Policies with Respect to Unfairness in an M/GI/1. In ACM SIGMETRICS, pages 238--249, 2003.
- J. Wolf, D. Rajan, K. Hildrum, R. Khandekar, V. Kumar, S. Parekh, K. Wu, and A. Balmin. FLEX: A Slot Allocation Scheduling Optimizer for MapReduce Workloads. In Middleware, 2010.
- N. Yadwadkar, G. Ananthanarayanan, and R. Katz. Wrangler: Predictable and Faster Jobs Using Fewer Resources. In ACM SoCC, 2014.
- M. Zaharia, D. Borthakur, J. S. Sarma, K. Elmeleegy, S. Shenker, and I. Stoica. Job Scheduling for Multi-User MapReduce Clusters. Technical Report UCB/EECS-2009-55, UC Berkeley, 2009.
- M. Zaharia, D. Borthakur, J. S. Sarma, K. Elmeleegy, S. Shenker, and I. Stoica. Delay Scheduling: A Simple Technique for Achieving Locality and Fairness in Cluster Scheduling. In ACM EuroSys, 2010.
- M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma, M. McCauley, M. Franklin, S. Shenker, and I. Stoica. Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing. In USENIX NSDI, 2012.
- M. Zaharia, A. Konwinski, A. D. Joseph, R. Katz, and I. Stoica. Improving MapReduce Performance in Heterogeneous Environments. In USENIX OSDI, 2008.