DOI: 10.1145/3190508.3190515
research-article

3Sigma: distribution-based cluster scheduling for runtime uncertainty

Published: 23 April 2018

Abstract

The 3Sigma cluster scheduling system uses job runtime histories in a new way. Knowing how long each job will execute enables a scheduler to more effectively pack jobs with diverse time concerns (e.g., deadline vs. the-sooner-the-better) and placement preferences on heterogeneous cluster resources. But existing schedulers use single-point estimates (e.g., the mean or median of a relevant subset of historical runtimes), and we show that such schedulers are fragile in the face of real-world estimate error profiles. In particular, analysis of job traces from three different large-scale cluster environments shows that, while the runtimes of many jobs can be predicted well, even state-of-the-art predictors have wide error profiles, with 8–23% of predictions off by a factor of two or more. Instead of reducing relevant history to a single point, 3Sigma schedules jobs based on full distributions of relevant runtime histories and explicitly creates plans that mitigate the effects of anticipated runtime uncertainty. Experiments with workloads derived from the same traces show that 3Sigma greatly outperforms a state-of-the-art scheduler that uses point estimates from a state-of-the-art predictor; in fact, the performance of 3Sigma approaches the end-to-end performance of a scheduler based on a hypothetical, perfect runtime predictor. 3Sigma reduces SLO miss rate, increases cluster goodput, and improves or matches latency for best-effort jobs.
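The contrast the abstract draws between single-point estimates and full runtime distributions can be illustrated with a toy Python sketch. It is not 3Sigma's algorithm: the runtime history, the deadline, and the function names (`prob_meets_deadline`, `point_estimate_ok`, `fraction_off_by_factor`) are hypothetical. The sketch only shows that a median-based check can accept a start time that the empirical distribution rates as risky, and how a "factor of two or more" error rate can be measured.

```python
from statistics import median


def prob_meets_deadline(history, start, deadline):
    """Empirical P(start + runtime <= deadline), using every historical runtime."""
    hits = sum(1 for r in history if start + r <= deadline)
    return hits / len(history)


def point_estimate_ok(history, start, deadline):
    """Point-estimate check: treat the median runtime as if it were exact."""
    return start + median(history) <= deadline


def fraction_off_by_factor(predictions, actuals, factor=2.0):
    """Share of predictions off by `factor` or more in either direction.

    Assumes all predicted and actual runtimes are positive.
    """
    misses = sum(
        1 for p, a in zip(predictions, actuals) if max(p / a, a / p) >= factor
    )
    return misses / len(actuals)


# Hypothetical runtime history (minutes) for one recurring job:
# mostly about 10 minutes, with two long-tail runs.
history = [9, 10, 10, 11, 10, 9, 12, 10, 35, 40]
deadline = 30

print(point_estimate_ok(history, 20, deadline))    # True: 20 + 10.0 <= 30
print(prob_meets_deadline(history, 20, deadline))  # 0.6
print(prob_meets_deadline(history, 0, deadline))   # 0.8: the 35 and 40 runs miss
# Predicting the median for every run misses by 2x or more on the two tail runs.
print(fraction_off_by_factor([median(history)] * len(history), history))  # 0.2
```

The point-estimate check accepts both start times equally, while the empirical distribution rates starting at minute 20 as noticeably riskier than starting at minute 0. A distribution-aware scheduler can weigh such probabilities against other placements and start times instead of trusting a single number.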

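Cited By

  • (2024) When will my ML job finish? toward providing completion time estimates through predictability-centric scheduling. Proceedings of the 18th USENIX Conference on Operating Systems Design and Implementation, 487–505. DOI: 10.5555/3691938.3691964. Online publication date: 10-Jul-2024.
  • (2024) Windowed Hamming Distance-Based Intrusion Detection for the CAN Bus. Applied Sciences 14:7, 2805. DOI: 10.3390/app14072805. Online publication date: 27-Mar-2024.
  • (2024) Scheduling for Reduced Tail Task Latencies in Highly Utilized Datacenters. Proceedings of the 2024 ACM Symposium on Cloud Computing, 302–321. DOI: 10.1145/3698038.3698522. Online publication date: 20-Nov-2024.
  • (2024) Dexter: A Performance-Cost Efficient Resource Allocation Manager for Serverless Data Analytics. Proceedings of the 25th International Middleware Conference, 117–130. DOI: 10.1145/3652892.3700753. Online publication date: 2-Dec-2024.
  • (2024) Dynamic Contribution-Matrix ResNet-Based Retrieval Algorithm for Ocean Surface High Wind Speed From Spaceborne Microwave Radiometer. IEEE Transactions on Geoscience and Remote Sensing 62, 1–14. DOI: 10.1109/TGRS.2024.3468406. Online publication date: 2024.
  • (2024) UniSched: A Unified Scheduler for Deep Learning Training Jobs With Different User Demands. IEEE Transactions on Computers 73:6, 1500–1515. DOI: 10.1109/TC.2024.3371794. Online publication date: Jun-2024.
  • (2024) Tangram: High-Resolution Video Analytics on Serverless Platform with SLO-Aware Batching. 2024 IEEE 44th International Conference on Distributed Computing Systems (ICDCS), 645–655. DOI: 10.1109/ICDCS60910.2024.00066. Online publication date: 23-Jul-2024.
  • (2024) Ursa: Lightweight Resource Management for Cloud-Native Microservices. 2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 954–969. DOI: 10.1109/HPCA57654.2024.00077. Online publication date: 2-Mar-2024.
  • (2023) Fast DRL-based scheduler configuration tuning for reducing tail latency in edge-cloud jobs. Journal of Cloud Computing 12:1. DOI: 10.1186/s13677-023-00465-z. Online publication date: 17-Jun-2023.
  • (2023) Deep Learning Workload Scheduling in GPU Datacenters: A Survey. ACM Computing Surveys. DOI: 10.1145/3638757. Online publication date: 27-Dec-2023.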

Information

Published In

EuroSys '18: Proceedings of the Thirteenth EuroSys Conference
April 2018
631 pages
ISBN:9781450355841
DOI:10.1145/3190508
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

  • Scotiabank
  • Amazon Web Services
  • Ericsson
  • Google
  • NSF CISE Expeditions
  • VMware
  • DHS
  • GE
  • Ant Financial
  • Microsoft
  • Huawei
  • IBM
  • Alibaba
  • Splunk
  • Samsung Scholarship
  • CapitalOne
  • Intel

Conference

EuroSys '18
EuroSys '18: Thirteenth EuroSys Conference 2018
April 23–26, 2018
Porto, Portugal

Acceptance Rates

EuroSys '18 Paper Acceptance Rate 43 of 262 submissions, 16%;
Overall Acceptance Rate 241 of 1,308 submissions, 18%
