Multi-domain job coscheduling for leadership computing systems

Tang, Wei; Desai, Narayan; Vishwanath, Venkatram; Buettner, Daniel; Lan, Zhiling

doi:10.1007/s11227-012-0741-6

Published: 18 January 2012

Volume 63, pages 367–384, (2013)

The Journal of Supercomputing

Abstract

Current supercomputing centers usually deploy a large-scale compute system together with an associated data analysis or visualization system. Multiple scenarios have driven the demand that some associated jobs co-execute on different machines. We propose a multi-domain coscheduling mechanism, providing the ability to coordinate execution between jobs on multiple resource management domains without manual intervention. We have evaluated our mechanism based on real job traces from Intrepid and Eureka, the production Blue Gene/P system and a cluster with the largest GPU installation, deployed at Argonne National Laboratory. The experimental results show that coscheduling can be achieved with limited impact on system performance under varying workloads.

Author information

Authors and Affiliations

Illinois Institute of Technology, Chicago, IL, USA
Wei Tang & Zhiling Lan
Argonne National Laboratory, Argonne, IL, USA
Narayan Desai, Venkatram Vishwanath & Daniel Buettner

Corresponding author

Correspondence to Wei Tang.

About this article

Cite this article

Tang, W., Desai, N., Vishwanath, V. et al. Multi-domain job coscheduling for leadership computing systems. J Supercomput 63, 367–384 (2013). https://doi.org/10.1007/s11227-012-0741-6

Issue Date: February 2013
DOI: https://doi.org/10.1007/s11227-012-0741-6
