DOI: 10.1145/2814576.2814807

Improving Preemptive Scheduling with Application-Transparent Checkpointing in Shared Clusters

Published: 24 November 2015

Abstract

Modern data center clusters are shifting from dedicated single-framework clusters to shared clusters. In such shared environments, cluster schedulers typically implement preemption by simply killing jobs in order to enforce resource priorities and fairness during peak utilization. This can waste significant resources and delay job response times.
In this paper, we propose using suspend-resume mechanisms to mitigate the overhead of preemption in cluster scheduling. Instead of killing preempted jobs or tasks, our approach uses a system-level, application-transparent checkpointing mechanism to save the progress of jobs for resumption at a later time when resources become available. To reduce the preemption overhead and improve job response times, our approach uses adaptive preemption to dynamically select the appropriate preemption mechanism (e.g., kill vs. suspend, local vs. remote restore) according to the progress of a task and its suspend-resume overhead. By leveraging fast storage technologies, such as non-volatile memory (NVM), our approach can further reduce the preemption penalty to provide better QoS and resource efficiency. We implement the proposed approach and conduct extensive experiments through Google cluster trace-driven simulations and application workloads on a Hadoop cluster. The results demonstrate that our approach can significantly reduce resource and power usage and improve application performance over existing approaches. In particular, our implementation on the next-generation Hadoop YARN platform achieves up to a 67% reduction in resource wastage, a 30% improvement in overall job response times, and a 34% reduction in energy consumption over the current YARN scheduler.
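The adaptive preemption decision summarized above chooses between killing a task and suspending it via a checkpoint, and between a local and a remote restore, based on the task's progress and its suspend-resume cost. The sketch below illustrates one plausible form of that decision under stated assumptions; it is not the paper's implementation, and the class name, method, parameter names, and units (workDoneSec, checkpointCostSec, localRestoreLikely) are illustrative only.

    // Hypothetical sketch (not the paper's code): pick a preemption action for a
    // task from its progress and its estimated suspend-resume cost. All names,
    // units, and thresholds are illustrative assumptions.
    public class AdaptivePreemption {

        enum Action { KILL, SUSPEND_LOCAL, SUSPEND_REMOTE }

        // workDoneSec: useful work the task has already completed (seconds)
        // checkpointCostSec: estimated cost to checkpoint and later restore (seconds)
        // localRestoreLikely: whether the task is expected to resume on the same node
        static Action decide(double workDoneSec, double checkpointCostSec,
                             boolean localRestoreLikely) {
            // Little progress to preserve: re-running from scratch is cheaper than
            // paying the checkpoint/restore overhead, so simply kill the task.
            if (workDoneSec <= checkpointCostSec) {
                return Action.KILL;
            }
            // Otherwise keep the progress. Prefer a local restore (checkpoint kept
            // on node-local storage such as NVM) when the node is expected to free
            // up soon; fall back to restoring from remote/shared storage.
            return localRestoreLikely ? Action.SUSPEND_LOCAL : Action.SUSPEND_REMOTE;
        }

        public static void main(String[] args) {
            System.out.println(decide(5.0, 20.0, true));    // KILL
            System.out.println(decide(600.0, 20.0, true));  // SUSPEND_LOCAL
            System.out.println(decide(600.0, 20.0, false)); // SUSPEND_REMOTE
        }
    }

In the actual system, the checkpoint cost and restore location would presumably be estimated from runtime measurements rather than passed in as constants; that dynamic selection is what the abstract calls adaptive preemption.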


    Published In

    Middleware '15: Proceedings of the 16th Annual Middleware Conference
    November 2015
    295 pages
    ISBN:9781450336185
    DOI:10.1145/2814576
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 24 November 2015


    Author Tags

    1. Cloud computing
    2. Cluster resource management
    3. Scheduling

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    Middleware '15
    Sponsor:
    • ACM
    • USENIX Assoc
    • IFIP
    Middleware '15: 16th International Middleware Conference
    December 7 - 11, 2015
    Vancouver, BC, Canada

    Acceptance Rates

    Middleware '15 paper acceptance rate: 23 of 118 submissions (19%)
    Overall acceptance rate: 203 of 948 submissions (21%)

    Cited By

    • (2024) Towards Efficient End-to-End Encryption for Container Checkpointing Systems. Proceedings of the 15th ACM SIGOPS Asia-Pacific Workshop on Systems, pages 60-66. DOI: 10.1145/3678015.3680477. Online publication date: 4-Sep-2024.
    • (2021) Toward a Smart Cloud: A Review of Fault-Tolerance Methods in Cloud Systems. IEEE Transactions on Services Computing, 14(2):589-605. DOI: 10.1109/TSC.2018.2816644. Online publication date: 1-Mar-2021.
    • (2020) Preemptive and Low Latency Datacenter Scheduling via Lightweight Containers. IEEE Transactions on Parallel and Distributed Systems, 31(12):2749-2762. DOI: 10.1109/TPDS.2019.2957754. Online publication date: 1-Dec-2020.
    • (2019) Fast in-memory CRIU for docker containers. Proceedings of the International Symposium on Memory Systems, pages 53-65. DOI: 10.1145/3357526.3357542. Online publication date: 30-Sep-2019.
    • (2019) Improving Short Job Latency Performance in Hybrid Job Schedulers with Dice. Proceedings of the 48th International Conference on Parallel Processing, pages 1-10. DOI: 10.1145/3337821.3337851. Online publication date: 5-Aug-2019.
    • (2018) CloneHadoop: Process Cloning to Reduce Hadoop's Long Tail. 2018 IEEE/ACM 5th International Conference on Big Data Computing Applications and Technologies (BDCAT), pages 11-20. DOI: 10.1109/BDCAT.2018.00011. Online publication date: Dec-2018.
    • (2017) Energy-Efficient Dynamic Scheduling of Deadline-Constrained MapReduce Workflows. 2017 IEEE 13th International Conference on e-Science (e-Science), pages 393-402. DOI: 10.1109/eScience.2017.18. Online publication date: Oct-2017.
    • (2017) Energy-efficient mapping of large-scale workflows under deadline constraints in big data computing systems. Future Generation Computer Systems. DOI: 10.1016/j.future.2017.07.050. Online publication date: Aug-2017.
    • (2016) ALMA. Proceedings of the 17th International Middleware Conference, pages 1-14. DOI: 10.1145/2988336.2988341. Online publication date: 28-Nov-2016.
    • (2016) CHIME: A Checkpoint-Based Approach to Improving the Performance of Shared Clusters. 2016 IEEE 22nd International Conference on Parallel and Distributed Systems (ICPADS), pages 1007-1014. DOI: 10.1109/ICPADS.2016.0134. Online publication date: Dec-2016.
