3Sigma: distribution-based cluster scheduling for runtime uncertainty

Authors:

Alexey Tumanov,

Michael A. Kozuch,

Gregory R. Ganger

EuroSys '18: Proceedings of the Thirteenth EuroSys Conference

Article No.: 2, Pages 1 - 17

https://doi.org/10.1145/3190508.3190515

Published: 23 April 2018

Abstract

The 3Sigma cluster scheduling system uses job runtime histories in a new way. Knowing how long each job will execute enables a scheduler to more effectively pack jobs with diverse time concerns (e.g., deadline vs. the-sooner-the-better) and placement preferences on heterogeneous cluster resources. But existing schedulers use single-point estimates (e.g., mean or median of a relevant subset of historical runtimes), and we show that they are fragile in the face of real-world estimate error profiles. In particular, analysis of job traces from three different large-scale cluster environments shows that, while the runtimes of many jobs can be predicted well, even state-of-the-art predictors have wide error profiles, with 8-23% of predictions off by a factor of two or more. Instead of reducing relevant history to a single point, 3Sigma schedules jobs based on full distributions of relevant runtime histories and explicitly creates plans that mitigate the effects of anticipated runtime uncertainty. Experiments with workloads derived from the same traces show that 3Sigma greatly outperforms a state-of-the-art scheduler that uses point estimates from a state-of-the-art predictor; in fact, the performance of 3Sigma approaches the end-to-end performance of a scheduler based on a hypothetical, perfect runtime predictor. 3Sigma reduces SLO miss rate, increases cluster goodput, and improves or matches latency for best-effort jobs.
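
The contrast between a single-point estimate and a full runtime distribution can be illustrated with a minimal Python sketch. This is not 3Sigma's implementation: the function names, the sample runtime history, and the 20-minute budget are all hypothetical, chosen only to show how a median can hide deadline risk that the empirical distribution exposes, and how the "off by a factor of two or more" error measure might be computed.

```python
import statistics

# Hypothetical runtimes (minutes) of past runs of similar jobs; illustrative only.
history = [10, 11, 12, 12, 13, 40, 45, 50]


def point_estimate_meets(history, budget):
    """Point-estimate view: compare the median historical runtime to the budget."""
    return statistics.median(history) <= budget


def prob_finish_within(history, budget):
    """Distribution view: empirical probability that the runtime is <= budget."""
    return sum(1 for r in history if r <= budget) / len(history)


def fraction_off_by_factor(predicted, actual, factor=2.0):
    """Fraction of predictions that over- or under-estimate by at least `factor`."""
    off = sum(
        1
        for p, a in zip(predicted, actual)
        if a / p >= factor or a / p <= 1.0 / factor
    )
    return off / len(actual)


budget = 20  # minutes left before a hypothetical deadline
median = statistics.median(history)  # 12.5 for the sample history

print(point_estimate_meets(history, budget))   # True
print(prob_finish_within(history, budget))     # 0.625
print(fraction_off_by_factor([median] * len(history), history))  # 0.375
```

In this toy history the median suggests the job fits comfortably within the budget, yet three of the eight past runs would have missed it. A scheduler that sees the whole distribution can weigh that risk when placing the job; one that sees only the median cannot.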

Index Terms

3Sigma: distribution-based cluster scheduling for runtime uncertainty
1. Computing methodologies
  1. Artificial intelligence
    1. Planning and scheduling
      1. Planning under uncertainty
2. Software and its engineering
  1. Software organization and properties
    1. Contextual software domains
      1. Operating systems
        1. Process management
          1. Scheduling
    2. Software system structures
      1. Distributed systems organizing principles
        1. Cloud computing

Published In

EuroSys '18: Proceedings of the Thirteenth EuroSys Conference

April 2018

631 pages

ISBN:9781450355841

DOI:10.1145/3190508

General Chair:
Rui Oliveira,
Program Chairs:
Pascal Felber,
Y. Charlie Hu

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGOPS: ACM Special Interest Group on Operating Systems

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

Scotiabank
Amazon Web Services
Ericsson
Google
NSF CISE Expeditions
VMware
DHS
GE
Ant Financial
Microsoft
Huawei
IBM
Alibaba
Splunk
Samsung Scholarship
CapitalOne
Intel

Conference

EuroSys '18

EuroSys '18: Thirteenth EuroSys Conference 2018

April 23 - 26, 2018

Porto, Portugal

Acceptance Rates

EuroSys '18 Paper Acceptance Rate 43 of 262 submissions, 16%;

Overall Acceptance Rate 241 of 1,308 submissions, 18%


