ABSTRACT
In today's datacenters, job heterogeneity makes it difficult for schedulers to simultaneously meet latency requirements and maintain high resource utilization. State-of-the-art datacenter schedulers, whether centralized, distributed, or hybrid, fail to ensure low latency for short jobs in large-scale, highly loaded systems. The key issues are limited scalability in centralized schedulers, and ineffective and inefficient probing and resource sharing in both distributed and hybrid schedulers.
In this paper, we propose Pigeon, a distributed, hierarchical job scheduler based on a two-layer design. Pigeon divides workers into groups, each managed by a separate master. Upon a job arrival, a distributed scheduler directly distributes the job's tasks evenly among the masters with minimal job-processing overhead, thereby preserving the highest possible scalability. Meanwhile, each master manages and distributes all the tasks it receives centrally, oblivious of the job context, allowing full sharing of the worker pool at the group level to maximize multiplexing gain. To minimize the chance of head-of-line blocking for short jobs and to avoid starvation for long jobs, each master employs two weighted fair queues, one for tasks from short jobs and one for tasks from long jobs, and a small portion of the workers is reserved for short jobs. Evaluation via theoretical analysis, trace-driven simulations, and a prototype implementation shows that Pigeon significantly outperforms Sparrow, a representative distributed scheduler, and Eagle, a hybrid scheduler.
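The two-layer design described above can be illustrated with a minimal sketch. It is not the authors' implementation: the names `spread_tasks`, `Master`, `submit`, and `worker_freed`, the round-robin split, the weight value, and the simple weighted round-robin used in place of a full weighted fair queueing discipline are all assumptions made for illustration.

```python
from collections import deque


def spread_tasks(tasks, num_masters, start=0):
    """Deal a job's tasks round-robin across masters (assumed policy),
    so per-master counts differ by at most one."""
    buckets = [[] for _ in range(num_masters)]
    for i, task in enumerate(tasks):
        buckets[(start + i) % num_masters].append(task)
    return buckets


class Master:
    """Hypothetical group master with a short queue, a long queue,
    and a few workers reserved for tasks of short jobs."""

    def __init__(self, general_workers, reserved_workers, short_weight=3):
        self.idle_general = general_workers    # idle workers open to any task
        self.idle_reserved = reserved_workers  # idle workers for short tasks only
        self.short_weight = short_weight       # assumed: short tasks served per long task
        self.short_q = deque()
        self.long_q = deque()
        self.served_short = 0  # short tasks taken by general workers since last long task

    def submit(self, task, is_short):
        """Start the task on an idle worker if one fits, else queue it.
        Returns the pool name used, or None if the task was queued."""
        if self.idle_general > 0:
            self.idle_general -= 1
            return "general"
        if is_short and self.idle_reserved > 0:
            self.idle_reserved -= 1
            return "reserved"
        (self.short_q if is_short else self.long_q).append(task)
        return None

    def worker_freed(self, pool):
        """Choose the next task for a worker that just finished.
        Returns the task, or None if the worker goes idle."""
        if pool == "reserved":
            # Reserved workers never run long tasks.
            if self.short_q:
                return self.short_q.popleft()
            self.idle_reserved += 1
            return None
        # General worker: weighted round-robin between the two queues,
        # so long tasks are not starved by a steady stream of short ones.
        take_long = self.long_q and (
            not self.short_q or self.served_short >= self.short_weight
        )
        if take_long:
            self.served_short = 0
            return self.long_q.popleft()
        if self.short_q:
            self.served_short += 1
            return self.short_q.popleft()
        self.idle_general += 1
        return None
```

In this sketch the job-level scheduler only calls `spread_tasks` and keeps no per-worker state, while each `Master` decides placement without knowing which job a task belongs to. The reserved pool bounds how long a short task can wait behind long tasks, and the `short_weight` counter guarantees that a queued long task is eventually served.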
Index Terms
- Pigeon: an Effective Distributed, Hierarchical Datacenter Job Scheduler