Abstract
Modern data centers serve workloads that can exploit parallelism. When a job parallelizes across multiple servers, it completes more quickly. However, it is unclear how to share a limited number of servers among many parallelizable jobs.
In this paper we consider a typical scenario where a data center composed of N servers is tasked with completing a set of M parallelizable jobs. Typically, M is much smaller than N. In our scenario, each job consists of some amount of inherent work, which we refer to as the job's size. We assume that job sizes are known to the system up front, and that each job can utilize any number of servers at any moment in time. These assumptions are reasonable for many parallelizable workloads, such as training neural networks using TensorFlow [2]. Our goal in this paper is to allocate servers to jobs so as to minimize the mean slowdown across all jobs, where the slowdown of a job is the job's completion time divided by its running time if given exclusive access to all N servers. Slowdown measures how much a job was delayed by the other jobs in the system, and is often the metric of interest in the theoretical parallel scheduling literature (where it is also called stretch), as well as in the HPC community (where it is called expansion factor).
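The slowdown definition above can be sketched in a few lines. This is a minimal illustration, not code from the paper: `speedup` stands in for whatever speedup function governs how a job's running time shrinks with more servers (the linear function used in the example is a hypothetical, idealized choice).

```python
def mean_slowdown(sizes, completion_times, n_servers, speedup):
    """Mean slowdown across jobs.

    A job's slowdown is its completion time under the scheduler divided
    by its running time if it had exclusive access to all n_servers.
    """
    slowdowns = []
    for size, completed_at in zip(sizes, completion_times):
        # Isolated running time: inherent work divided by the speedup
        # the job gets from all n_servers.
        isolated = size / speedup(n_servers)
        slowdowns.append(completed_at / isolated)
    return sum(slowdowns) / len(slowdowns)

# Example with a hypothetical perfectly linear speedup, s(k) = k:
# on 4 servers, a size-4 job runs alone in 1.0 and a size-8 job in 2.0,
# so completion times of 2.0 and 3.0 give slowdowns 2.0 and 1.5.
print(mean_slowdown([4.0, 8.0], [2.0, 3.0], 4, lambda k: k))  # 1.75
```

A job that is never delayed by others has slowdown 1, so mean slowdown is at least 1 under any allocation.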
- B. Berg, J. P. Dorsman, and M. Harchol-Balter. Towards optimality in parallel scheduling. ACM POMACS, 1(2), 2018.
- S. Lin, M. Paolieri, C. Chou, and L. Golubchik. A model-based approach to streamlining distributed training for asynchronous SGD. In MASCOTS. IEEE, 2018.
- D. R. Smith. A new proof of the optimality of the shortest remaining processing time discipline. Operations Research, 26(1):197--199, 1978.
- A. Wierman, M. Harchol-Balter, and T. Osogami. Nearly insensitive bounds on SMART scheduling. SIGMETRICS, 33(1):205--216, 2005.
Index Terms
- heSRPT: Parallel Scheduling to Minimize Mean Slowdown