Hopper: Decentralized Speculation-aware Cluster Scheduling at Scale
SIGCOMM '15: Proceedings of the 2015 ACM Conference on Special Interest Group on Data Communication

Abstract
As clusters continue to grow in size and complexity, providing scalable and predictable performance is an increasingly important challenge. A crucial roadblock to achieving predictable performance is stragglers, i.e., tasks that take significantly longer than expected to run. Speculative execution has been widely adopted to mitigate the impact of stragglers; however, speculation mechanisms are designed and operated independently of job scheduling, even though scheduling a speculative copy of a task directly reduces the resources available to other jobs. In this work, we present Hopper, a speculation-aware job scheduler, i.e., one that integrates the tradeoffs associated with speculation into its scheduling decisions. We implement both centralized and decentralized prototypes of the Hopper scheduler and show that coordinating scheduling and speculation yields 50% improvements over state-of-the-art centralized schedulers, and 66% over state-of-the-art decentralized schedulers and speculation strategies.
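The abstract's central observation — that launching a speculative copy of a straggling task consumes a slot that could otherwise serve another job, so speculation and scheduling must be decided jointly — can be illustrated with a toy allocator. This is a hedged sketch only, not Hopper's actual algorithm: the `Job` fields, the `spec_weight` discount, and the proportional-share rule are all invented for illustration.

```python
# Illustrative sketch of speculation-aware slot allocation (NOT Hopper's
# algorithm). A speculation-oblivious scheduler would size each job by its
# remaining tasks alone; here, each job's "virtual size" also charges a
# discounted demand for speculative copies of its stragglers, making the
# speculation-vs-scheduling tradeoff explicit in one allocation step.
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    remaining_tasks: int       # unscheduled tasks still waiting for a slot
    straggling_tasks: int = 0  # running tasks progressing far slower than expected

def assign_slots(jobs, total_slots, spec_weight=0.5):
    """Split total_slots across jobs in proportion to virtual size:
    remaining work plus spec_weight extra demand per straggler."""
    virtual = {j.name: j.remaining_tasks + spec_weight * j.straggling_tasks
               for j in jobs}
    total = sum(virtual.values()) or 1
    # Proportional share, rounded down; leftover slots go to the largest job.
    alloc = {name: int(total_slots * v / total) for name, v in virtual.items()}
    leftover = total_slots - sum(alloc.values())
    if leftover:
        alloc[max(virtual, key=virtual.get)] += leftover
    return alloc
```

For example, a job with 8 waiting tasks and 4 stragglers competes for slots as if it had 10 tasks, so it receives capacity for speculation at the direct expense of other jobs' shares — the coupling the paper argues must be managed by the scheduler rather than ignored.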
References
- Apache Thrift. https://thrift.apache.org/.
- Cloudera Impala. http://www.cloudera.com/content/cloudera/en/products-and-services/cdh/impala.html.
- Hadoop. http://hadoop.apache.org.
- Hadoop Capacity Scheduler. http://hadoop.apache.org/docs/r1.2.1/capacity_scheduler.html.
- Hadoop Distributed File System. http://hadoop.apache.org/hdfs.
- Hadoop Slowstart. https://issues.apache.org/jira/browse/MAPREDUCE-1184/.
- Hive. http://wiki.apache.org/hadoop/Hive.
- Hopper Technical Report. https://sites.google.com/site/sigcommhoppertechreport/.
- Sparrow. https://github.com/radlab/sparrow.
- The Next Generation of Apache Hadoop MapReduce. http://developer.yahoo.com/blogs/hadoop/posts/2011/02/mapreduce-nextgen/.
- G. Ananthanarayanan, S. Agarwal, S. Kandula, A. Greenberg, I. Stoica, D. Harlan, and E. Harris. Scarlett: Coping with Skewed Content Popularity in MapReduce Clusters. In ACM EuroSys, 2011.
- G. Ananthanarayanan, A. Ghodsi, S. Shenker, and I. Stoica. Effective Straggler Mitigation: Attack of the Clones. In USENIX NSDI, 2013.
- G. Ananthanarayanan, A. Ghodsi, A. Wang, D. Borthakur, S. Kandula, S. Shenker, and I. Stoica. PACMan: Coordinated Memory Caching for Parallel Jobs. In USENIX NSDI, 2012.
- G. Ananthanarayanan, M. Hung, X. Ren, I. Stoica, A. Wierman, and M. Yu. GRASS: Trimming Stragglers in Approximation Analytics. In USENIX NSDI, 2014.
- G. Ananthanarayanan, S. Kandula, A. Greenberg, I. Stoica, E. Harris, and B. Saha. Reining in the Outliers in Map-Reduce Clusters Using Mantri. In USENIX OSDI, 2010.
- E. Bortnikov, A. Frank, E. Hillel, and S. Rao. Predicting Execution Bottlenecks in Map-Reduce Clusters. In USENIX HotCloud, 2012.
- E. Boutin, J. Ekanayake, W. Lin, B. Shi, J. Zhou, Z. Qian, M. Wu, and L. Zhou. Apollo: Scalable and Coordinated Scheduling for Cloud-Scale Computing. In USENIX OSDI, 2014.
- M. Bramson, Y. Lu, and B. Prabhakar. Randomized Load Balancing with General Service Time Distributions. In ACM SIGMETRICS, pages 275--286, 2010.
- R. Chaiken, B. Jenkins, P. Larson, B. Ramsey, D. Shakib, S. Weaver, and J. Zhou. SCOPE: Easy and Efficient Parallel Processing of Massive Data Sets. Proceedings of the VLDB Endowment, (2), 2008.
- H. Chen, J. Marden, and A. Wierman. On the Impact of Heterogeneity and Back-end Scheduling in Load Balancing Designs. In IEEE INFOCOM, 2009.
- J. Dean. Achieving Rapid Response Times in Large Online Services. In Berkeley AMPLab Cloud Seminar, 2012.
- J. Dean and L. Barroso. The Tail at Scale. Communications of the ACM, (2), 2013.
- J. Dean and S. Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. Communications of the ACM, 2008.
- F. Dogar, T. Karagiannis, H. Ballani, and A. Rowstron. Decentralized Task-aware Scheduling for Data Center Networks. In ACM SIGCOMM, 2014.
- A. Ghodsi, M. Zaharia, B. Hindman, A. Konwinski, S. Shenker, and I. Stoica. Dominant Resource Fairness: Fair Allocation of Multiple Resource Types. In USENIX NSDI, 2011.
- R. Grandl, G. Ananthanarayanan, S. Kandula, S. Rao, and A. Akella. Multi-Resource Packing for Cluster Schedulers. In ACM SIGCOMM, 2014.
- M. Harchol-Balter, B. Schroeder, N. Bansal, and M. Agrawal. Size-based Scheduling to Improve Web Performance. ACM Transactions on Computer Systems (TOCS), 21(2):207--233, 2003.
- B. Hindman, A. Konwinski, M. Zaharia, A. Ghodsi, A. Joseph, R. Katz, S. Shenker, and I. Stoica. Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center. In USENIX NSDI, 2011.
- M. Isard, V. Prabhakaran, J. Currey, U. Wieder, K. Talwar, and A. Goldberg. Quincy: Fair Scheduling for Distributed Computing Clusters. In ACM SOSP, 2009.
- M. Lin, L. Zhang, A. Wierman, and J. Tan. Joint Optimization of Overlapping Phases in MapReduce. Performance Evaluation, 2013.
- S. Melnik, A. Gubarev, J. J. Long, G. Romer, S. Shivakumar, M. Tolton, and T. Vassilakis. Dremel: Interactive Analysis of Web-Scale Datasets. In VLDB, 2010.
- B. Moseley, A. Dasgupta, R. Kumar, and T. Sarlós. On Scheduling in Map-Reduce and Flow-Shops. In ACM SPAA, 2011.
- K. Ousterhout, A. Panda, J. Rosen, S. Venkataraman, R. Xin, S. Ratnasamy, S. Shenker, and I. Stoica. The Case for Tiny Tasks in Compute Clusters. In USENIX HotOS, 2013.
- K. Ousterhout, R. Rasti, S. Ratnasamy, S. Shenker, and B. Chun. Making Sense of Performance in Data Analytics Frameworks. In USENIX NSDI, 2015.
- K. Ousterhout, P. Wendell, M. Zaharia, and I. Stoica. Sparrow: Distributed, Low Latency Scheduling. In ACM SOSP, 2013.
- K. Pruhs, J. Sgall, and E. Torng. Online Scheduling. In Handbook of Scheduling: Algorithms, Models, and Performance Analysis, 2004.
- A. Richa, M. Mitzenmacher, and R. Sitaraman. The Power of Two Random Choices: A Survey of Techniques and Results. Combinatorial Optimization, 2001.
- L. Schrage. A Proof of the Optimality of the Shortest Remaining Processing Time Discipline. Operations Research, 16(3):687--690, 1968.
- B. Sharma, V. Chudnovsky, J. L. Hellerstein, R. Rifaat, and C. R. Das. Modeling and Synthesizing Task Placement Constraints in Google Compute Clusters. In ACM SoCC, 2011.
- J. Tan, X. Meng, and L. Zhang. Delay Tails in MapReduce Scheduling. ACM SIGMETRICS Performance Evaluation Review, 2012.
- Y. Wang, J. Tan, W. Yu, L. Zhang, and X. Meng. Preemptive ReduceTask Scheduling for Fast and Fair Job Completion. In USENIX ICAC, 2013.
- A. Wierman. Fairness and Scheduling in Single Server Queues. Surveys in Operations Research and Management Science, 16(1):39--48, 2011.
- A. Wierman and M. Harchol-Balter. Classifying Scheduling Policies with Respect to Unfairness in an M/GI/1. In ACM SIGMETRICS, pages 238--249, 2003.
- J. Wolf, D. Rajan, K. Hildrum, R. Khandekar, V. Kumar, S. Parekh, K. Wu, and A. Balmin. FLEX: A Slot Allocation Scheduling Optimizer for MapReduce Workloads. In Middleware, 2010.
- N. Yadwadkar, G. Ananthanarayanan, and R. Katz. Wrangler: Predictable and Faster Jobs Using Fewer Resources. In ACM SoCC, 2014.
- M. Zaharia, D. Borthakur, J. S. Sarma, K. Elmeleegy, S. Shenker, and I. Stoica. Job Scheduling for Multi-User MapReduce Clusters. Technical Report UCB/EECS-2009-55, UC Berkeley, 2009.
- M. Zaharia, D. Borthakur, J. S. Sarma, K. Elmeleegy, S. Shenker, and I. Stoica. Delay Scheduling: A Simple Technique for Achieving Locality and Fairness in Cluster Scheduling. In ACM EuroSys, 2010.
- M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma, M. McCauley, M. Franklin, S. Shenker, and I. Stoica. Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing. In USENIX NSDI, 2012.
- M. Zaharia, A. Konwinski, A. D. Joseph, R. Katz, and I. Stoica. Improving MapReduce Performance in Heterogeneous Environments. In USENIX OSDI, 2008.