Partitioned Parallel Job Scheduling for Extreme Scale Computing

Brelsford, David; Chochia, George; Falk, Nathan; Marthi, Kailash; Sure, Ravindra; Bobroff, Norman; Fong, Liana; Seelam, Seetharami

doi:10.1007/978-3-642-35867-8_9

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7698)

Included in the following conference series:

Workshop on Job Scheduling Strategies for Parallel Processing

Abstract

Recent success in building extreme-scale computing systems poses new challenges for job scheduler design, which must support clusters that can execute millions of concurrent tasks. We show that for these extreme-scale clusters the resource demand at a centralized scheduler can exceed its capacity or limit its ability to perform well. This paper introduces partitioned scheduling, a hybrid centralized and distributed approach in which compute nodes are assigned to the job centrally, while the assignment of tasks to local node resources is performed subsequently at the assigned job nodes. This reduces the memory and processing growth at the central scheduler and improves the scaling behavior of scheduling time by enabling operations to be done in parallel at the job nodes. When local resource assignments must be distributed to all other job nodes, the partitioned approach trades central processing for increased network communication. We therefore introduce features, such as pipelining, that improve communication by leveraging the high-speed cluster network. The new system is evaluated for jobs with up to 50K tasks on clusters with 496 nodes and 128 tasks per node. The partitioned scheduling approach is demonstrated to reduce processor and memory usage at the central processor and to improve job scheduling and job dispatching times by up to an order of magnitude.
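The two-phase split described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the contiguous block split of task IDs, the round-robin CPU slot binding, and the use of a thread pool to stand in for work done in parallel at the job nodes are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def assign_nodes_centrally(free_nodes, num_tasks, tasks_per_node):
    """Central phase (hypothetical): pick enough nodes for the job.

    Only node-level decisions are made here; no per-task state is built.
    """
    needed = -(-num_tasks // tasks_per_node)  # ceiling division
    if needed > len(free_nodes):
        raise ValueError("not enough free nodes for this job")
    return free_nodes[:needed]


def assign_tasks_locally(node, task_ids, cpus_per_node):
    """Local phase (hypothetical): runs at one job node.

    Binds each task placed on this node to a local CPU slot, round-robin.
    """
    return {task: (node, slot % cpus_per_node)
            for slot, task in enumerate(task_ids)}


def schedule_job(free_nodes, num_tasks, tasks_per_node, cpus_per_node):
    nodes = assign_nodes_centrally(free_nodes, num_tasks, tasks_per_node)

    # Assumed policy: split task IDs into contiguous blocks, one per node.
    blocks = [
        list(range(i * tasks_per_node,
                   min((i + 1) * tasks_per_node, num_tasks)))
        for i in range(len(nodes))
    ]

    # The thread pool stands in for the job nodes working in parallel.
    with ThreadPoolExecutor() as pool:
        local_maps = list(pool.map(
            lambda pair: assign_tasks_locally(pair[0], pair[1], cpus_per_node),
            zip(nodes, blocks),
        ))

    # Merging stands in for distributing local assignments to the other
    # job nodes, which the abstract says costs network communication.
    placement = {}
    for local_map in local_maps:
        placement.update(local_map)
    return nodes, placement


if __name__ == "__main__":
    nodes, placement = schedule_job(["n0", "n1", "n2", "n3"], 10, 4, 4)
    print(nodes)
    print(placement)
```

In this sketch the central step touches only the node list, so its work grows with the number of nodes rather than the number of tasks, while the per-task bindings are computed independently for each node.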

Author information

Authors and Affiliations

IBM Systems and Technology Group, USA
David Brelsford, George Chochia, Nathan Falk, Kailash Marthi & Ravindra Sure
IBM T.J. Watson Research Center, USA
Norman Bobroff, Liana Fong & Seetharami Seelam

Editor information

Editors and Affiliations

Google, 1600 Amphitheater Parkway, 94043, Mountain View, CA, USA
Walfredo Cirne
Mathematics and Computer Science Division, Argonne National Laboratory, Bldg 240, 60439, Argonne, IL, USA
Narayan Desai
Facebook Inc., 1601 Willow Road, 94025, Menlo Park, CA, USA
Eitan Frachtenberg
Robotics Research Institute, TU Dortmund, Otto-Hahn-Str. 8, 44227, Dortmund, Germany
Uwe Schwiegelshohn

About this paper

Cite this paper

Brelsford, D. et al. (2013). Partitioned Parallel Job Scheduling for Extreme Scale Computing. In: Cirne, W., Desai, N., Frachtenberg, E., Schwiegelshohn, U. (eds) Job Scheduling Strategies for Parallel Processing. JSSPP 2012. Lecture Notes in Computer Science, vol 7698. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35867-8_9

DOI: https://doi.org/10.1007/978-3-642-35867-8_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35866-1
Online ISBN: 978-3-642-35867-8
eBook Packages: Computer Science, Computer Science (R0)
