DOI: 10.1145/2814576.2814807

Improving Preemptive Scheduling with Application-Transparent Checkpointing in Shared Clusters

Published: 24 November 2015

Abstract

Modern data center clusters are shifting from dedicated single-framework clusters to shared clusters. In such shared environments, cluster schedulers typically implement preemption by simply killing jobs in order to enforce resource priorities and fairness during peak utilization. This can waste significant resources and delay job response times.
In this paper, we propose using suspend-resume mechanisms to mitigate the overhead of preemption in cluster scheduling. Instead of killing preempted jobs or tasks, our approach uses a system-level, application-transparent checkpointing mechanism to save the progress of jobs for resumption at a later time when resources become available. To reduce the preemption overhead and improve job response times, our approach uses adaptive preemption to dynamically select the appropriate preemption mechanism (e.g., kill vs. suspend, local vs. remote restore) according to the progress of a task and its suspend-resume overhead. By leveraging fast storage technologies, such as non-volatile memory (NVM), our approach can further reduce the preemption penalty to provide better QoS and resource efficiency. We implement the proposed approach and conduct extensive experiments through Google cluster trace-driven simulations and application workloads on a Hadoop cluster. The results demonstrate that our approach can significantly reduce resource and power usage and improve application performance over existing approaches. In particular, our implementation on the next-generation Hadoop YARN platform achieves up to a 67% reduction in resource wastage, a 30% improvement in overall job response times, and a 34% reduction in energy consumption over the current YARN scheduler.
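The adaptive preemption decision summarized above chooses between killing a task and suspending it via a checkpoint, and between a local and a remote restore, based on the task's progress and its suspend-resume cost. The sketch below illustrates one plausible form of that decision under stated assumptions; it is not the paper's implementation, and the class name, method, parameter names, and units (workDoneSec, checkpointCostSec, localRestoreLikely) are illustrative only.

    // Hypothetical sketch (not the paper's code): pick a preemption action for a
    // task from its progress and its estimated suspend-resume cost. All names,
    // units, and thresholds are illustrative assumptions.
    public class AdaptivePreemption {

        enum Action { KILL, SUSPEND_LOCAL, SUSPEND_REMOTE }

        // workDoneSec: useful work the task has already completed (seconds)
        // checkpointCostSec: estimated cost to checkpoint and later restore (seconds)
        // localRestoreLikely: whether the task is expected to resume on the same node
        static Action decide(double workDoneSec, double checkpointCostSec,
                             boolean localRestoreLikely) {
            // Little progress to preserve: re-running from scratch is cheaper than
            // paying the checkpoint/restore overhead, so simply kill the task.
            if (workDoneSec <= checkpointCostSec) {
                return Action.KILL;
            }
            // Otherwise keep the progress. Prefer a local restore (checkpoint kept
            // on node-local storage such as NVM) when the node is expected to free
            // up soon; fall back to restoring from remote/shared storage.
            return localRestoreLikely ? Action.SUSPEND_LOCAL : Action.SUSPEND_REMOTE;
        }

        public static void main(String[] args) {
            System.out.println(decide(5.0, 20.0, true));    // KILL
            System.out.println(decide(600.0, 20.0, true));  // SUSPEND_LOCAL
            System.out.println(decide(600.0, 20.0, false)); // SUSPEND_REMOTE
        }
    }

In the actual system, the checkpoint cost and restore location would presumably be estimated from runtime measurements rather than passed in as constants; that dynamic selection is what the abstract calls adaptive preemption.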


    Published In

    Middleware '15: Proceedings of the 16th Annual Middleware Conference
    November 2015
    295 pages
    ISBN:9781450336185
    DOI:10.1145/2814576
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 24 November 2015


    Author Tags

    1. Cloud computing
    2. Cluster resource management
    3. Scheduling

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    Middleware '15
    Sponsor:
    • ACM
    • USENIX Assoc
    • IFIP
    Middleware '15: 16th International Middleware Conference
    December 7 - 11, 2015
    Vancouver, BC, Canada

    Acceptance Rates

    Middleware '15 paper acceptance rate: 23 of 118 submissions (19%)
    Overall acceptance rate: 203 of 948 submissions (21%)

    Cited By

    • (2024) Towards Efficient End-to-End Encryption for Container Checkpointing Systems. Proceedings of the 15th ACM SIGOPS Asia-Pacific Workshop on Systems, pages 60-66. DOI: 10.1145/3678015.3680477. Online publication date: 4-Sep-2024.
    • (2021) Toward a Smart Cloud: A Review of Fault-Tolerance Methods in Cloud Systems. IEEE Transactions on Services Computing, 14(2):589-605. DOI: 10.1109/TSC.2018.2816644. Online publication date: 1-Mar-2021.
    • (2020) Preemptive and Low Latency Datacenter Scheduling via Lightweight Containers. IEEE Transactions on Parallel and Distributed Systems, 31(12):2749-2762. DOI: 10.1109/TPDS.2019.2957754. Online publication date: 1-Dec-2020.
    • (2019) Fast in-memory CRIU for docker containers. Proceedings of the International Symposium on Memory Systems, pages 53-65. DOI: 10.1145/3357526.3357542. Online publication date: 30-Sep-2019.
    • (2019) Improving Short Job Latency Performance in Hybrid Job Schedulers with Dice. Proceedings of the 48th International Conference on Parallel Processing, pages 1-10. DOI: 10.1145/3337821.3337851. Online publication date: 5-Aug-2019.
    • (2018) CloneHadoop: Process Cloning to Reduce Hadoop's Long Tail. 2018 IEEE/ACM 5th International Conference on Big Data Computing Applications and Technologies (BDCAT), pages 11-20. DOI: 10.1109/BDCAT.2018.00011. Online publication date: Dec-2018.
    • (2017) Energy-Efficient Dynamic Scheduling of Deadline-Constrained MapReduce Workflows. 2017 IEEE 13th International Conference on e-Science (e-Science), pages 393-402. DOI: 10.1109/eScience.2017.18. Online publication date: Oct-2017.
    • (2017) Energy-efficient mapping of large-scale workflows under deadline constraints in big data computing systems. Future Generation Computer Systems. DOI: 10.1016/j.future.2017.07.050. Online publication date: Aug-2017.
    • (2016) ALMA. Proceedings of the 17th International Middleware Conference, pages 1-14. DOI: 10.1145/2988336.2988341. Online publication date: 28-Nov-2016.
    • (2016) CHIME: A Checkpoint-Based Approach to Improving the Performance of Shared Clusters. 2016 IEEE 22nd International Conference on Parallel and Distributed Systems (ICPADS), pages 1007-1014. DOI: 10.1109/ICPADS.2016.0134. Online publication date: Dec-2016.
