Communication-aware Job Scheduling using SLURM

Published: 17 August 2020

ABSTRACT

Job schedulers play an important role in selecting optimal resources for submitted jobs. However, most current job schedulers do not consider job-specific characteristics such as communication patterns during resource allocation, which often leads to sub-optimal node allocations. We propose three node allocation algorithms that take a job's communication behavior into account to improve the performance of communication-intensive jobs. We develop our algorithms for tree-based network topologies. The proposed algorithms aim to minimize network contention by allocating nodes on the least contended switches. We also show that allocating nodes in powers of two decreases inter-switch traffic for MPI communication, which further improves performance. We implement and evaluate our algorithms in SLURM, a widely used job scheduler. The proposed algorithms reduce the execution times of communication-intensive jobs by 9% (326 hours) on average, and reduce the average wait time of jobs by 31% across three supercomputer job logs.
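To make the allocation strategy concrete, here is a minimal sketch of how a least-contended-switch policy with power-of-two chunking might look for a single level of leaf switches. It is an illustration under assumptions of our own, not the paper's SLURM implementation: the Switch record, its contention score, and the two-pass greedy loop are all hypothetical.

```python
# Minimal sketch (NOT the paper's SLURM plugin): allocate a job's nodes
# from the least-contended leaf switches, preferring power-of-two chunks
# so that halving-based MPI collectives tend to stay within one switch.
# Switch, free_nodes, and contention are hypothetical names.
from dataclasses import dataclass
from typing import List


@dataclass
class Switch:
    name: str
    free_nodes: List[str]
    contention: int = 0  # assumed metric: jobs already communicating through this switch


def pow2_floor(n: int) -> int:
    """Largest power of two <= n (requires n >= 1)."""
    return 1 << (n.bit_length() - 1)


def allocate(switches: List[Switch], requested: int) -> List[str]:
    allocation: List[str] = []
    quietest_first = sorted(switches, key=lambda s: s.contention)

    # Pass 1: take a power-of-two block from each switch, quietest first.
    for sw in quietest_first:
        need = requested - len(allocation)
        if need <= 0:
            break
        take = min(need, len(sw.free_nodes))
        if take >= 2:
            take = pow2_floor(take)
        allocation.extend(sw.free_nodes[:take])
        del sw.free_nodes[:take]

    # Pass 2: fill any remainder node by node, still preferring quiet switches.
    for sw in quietest_first:
        while len(allocation) < requested and sw.free_nodes:
            allocation.append(sw.free_nodes.pop(0))

    if len(allocation) < requested:
        raise RuntimeError("not enough free nodes for this job")
    return allocation


# Example: a 7-node job draws a 4-node block from the idle switch s1,
# a 2-node block from s2, and one leftover node from the busiest switch.
fabric = [
    Switch("s0", [f"n{i}" for i in range(0, 8)], contention=2),
    Switch("s1", [f"n{i}" for i in range(8, 16)], contention=0),
    Switch("s2", [f"n{i}" for i in range(16, 24)], contention=1),
]
print(allocate(fabric, 7))  # ['n8', 'n9', 'n10', 'n11', 'n16', 'n17', 'n0']
```

The power-of-two preference reflects the abstract's observation: common MPI collective algorithms (for example, recursive doubling in MPICH) repeatedly pair processes within power-of-two groups, so a power-of-two block of nodes on one switch keeps many of those exchanges off the inter-switch links.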


Published in

ICPP Workshops '20: Workshop Proceedings of the 49th International Conference on Parallel Processing
August 2020, 186 pages
ISBN: 9781450388689
DOI: 10.1145/3409390

Copyright © 2020 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Overall acceptance rate: 91 of 313 submissions, 29%
