research-article

Communication-aware Job Scheduling using SLURM

Authors:
Priya Mishra, Indian Institute of Technology Kanpur, India
Tushar Agrawal, Indian Institute of Technology Kanpur, India
Preeti Malakar, Indian Institute of Technology Kanpur, India

ICPP Workshops '20: Workshop Proceedings of the 49th International Conference on Parallel Processing, August 2020, Article No. 20, pages 1–10, https://doi.org/10.1145/3409390.3409410

Published: 17 August 2020

ABSTRACT

Job schedulers play an important role in selecting optimal resources for submitted jobs. However, most current job schedulers do not consider job-specific characteristics, such as communication patterns, during resource allocation. This often leads to sub-optimal node allocations. We propose three node allocation algorithms that take a job's communication behavior into account to improve the performance of communication-intensive jobs. We develop our algorithms for tree-based network topologies. The proposed algorithms aim to minimize network contention by allocating nodes on the least contended switches. We also show that allocating nodes in powers of two reduces inter-switch communication for MPI communication, which further improves performance. We implement and evaluate our algorithms using SLURM, a widely used job scheduler. We show that the proposed algorithms can reduce the execution times of communication-intensive jobs by 9% (326 hours) on average. The average wait time of jobs is reduced by 31% across three supercomputer job logs.
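The two ideas in the abstract, preferring the least contended switches and allocating nodes in powers of two, can be illustrated with a small Python sketch. This is a hypothetical illustration, not the authors' three algorithms and not SLURM's node-selection code. It assumes a two-level tree in which each leaf switch reports a free node count and a contention score (a hypothetical integer, for example the number of communication-intensive jobs already placed under that switch). It also assumes one MPI rank per node, ranks numbered switch by switch, and a recursive-doubling exchange in which rank r talks to r XOR 2^k at step k, the pattern MPICH uses for some collectives on power-of-two process counts. The names `allocate`, `largest_pow2_at_most`, `rank_layout`, and `cross_switch_messages` are made up for this example.

```python
from typing import Dict, List, Optional


def largest_pow2_at_most(x: int) -> int:
    """Largest power of two that is <= x (0 when x <= 0)."""
    return 1 << (x.bit_length() - 1) if x > 0 else 0


def allocate(free: Dict[str, int], contention: Dict[str, int], n: int) -> Optional[Dict[str, int]]:
    """Hypothetical greedy allocator: visit leaf switches from least to most
    contended and take the largest power-of-two block of free nodes that fits.
    Repeats passes until n nodes are placed; returns None if n nodes are not free."""
    if n <= 0 or sum(free.values()) < n:
        return None
    left = dict(free)
    # Least contended first; among equals, prefer switches with more free nodes.
    order = sorted(free, key=lambda s: (contention.get(s, 0), -free[s], s))
    alloc: Dict[str, int] = {}
    remaining = n
    while remaining > 0:
        for sw in order:
            take = largest_pow2_at_most(min(left[sw], remaining))
            if take:
                alloc[sw] = alloc.get(sw, 0) + take
                left[sw] -= take
                remaining -= take
            if remaining == 0:
                break
    return alloc


def rank_layout(alloc: Dict[str, int]) -> List[str]:
    """Switch hosting each rank, assuming ranks are numbered switch by switch."""
    return [sw for sw, k in alloc.items() for _ in range(k)]


def cross_switch_messages(layout: List[str]) -> int:
    """Messages that leave a switch during recursive doubling: at each step
    every rank r sends one message to r XOR step. Needs a power-of-two rank count."""
    p = len(layout)
    if p == 0 or p & (p - 1):
        raise ValueError("recursive doubling needs a power-of-two rank count")
    count = 0
    step = 1
    while step < p:
        count += sum(layout[r] != layout[r ^ step] for r in range(p))
        step <<= 1
    return count


if __name__ == "__main__":
    free = {"s0": 5, "s1": 8, "s2": 4}
    contention = {"s0": 0, "s1": 3, "s2": 1}
    alloc = allocate(free, contention, 8)
    print(alloc)  # {'s0': 4, 's2': 4}
    print(cross_switch_messages(rank_layout(alloc)))  # 8
    print(cross_switch_messages(rank_layout({"s0": 5, "s2": 3})))  # 10
```

With 8 nodes requested and switches s0 (5 free, contention 0), s2 (4 free, contention 1), and s1 (8 free, contention 3), the sketch takes 4 nodes from s0 and 4 from s2. Under the stated assumptions, that layout sends 8 inter-switch messages in one recursive-doubling exchange, whereas filling s0 first (5 + 3) sends 10, because the misaligned split adds crossings in the first two steps that the aligned 4 + 4 split avoids.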

Published in: ICPP Workshops '20: Workshop Proceedings of the 49th International Conference on Parallel Processing, August 2020, 186 pages, ISBN: 9781450388689, DOI: 10.1145/3409390

Copyright © 2020 ACM
Publisher: Association for Computing Machinery, New York, NY, United States
Author Tags: SLURM, communication-aware, job scheduling, job-aware
Qualifiers: research-article, Research, Refereed limited
Acceptance Rates: Overall acceptance rate 91 of 313 submissions, 29%