ScELA: Scalable and Extensible Launching Architecture for Clusters

Sridhar, Jaidev K.; Koop, Matthew J.; Perkins, Jonathan L.; Panda, Dhabaleswar K.

doi:10.1007/978-3-540-89894-8_30

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5374)

Included in the following conference series:

International Conference on High-Performance Computing

Abstract

As cluster sizes head into the tens of thousands, current job launch mechanisms fail to scale because they are limited by resource constraints and performance bottlenecks. The job launch process consists of two phases: spawning processes on processors, and exchanging information between processes for job initialization. Implementations of various programming models follow distinct protocols for the information exchange phase. We present the design of a scalable, extensible, and high-performance job launch architecture for very large scale parallel computing. We present implementations of this architecture that achieve a speedup of more than 700% in launching a simple Hello World MPI application on 10,240 processor cores, and that scale to more than 3 times the number of processor cores compared to prior solutions.

This research is supported in part by U.S. Department of Energy grants #DE-FC02-06ER25749 and #DE-FC02-06ER25755; National Science Foundation grants #CNS-0403342 and #CCF-0702675; a grant from the Wright Center for Innovation, #WCI04-010-OSU-0; grants from Intel, Mellanox, Cisco, and Sun Microsystems; and equipment donations from Intel, Mellanox, AMD, Advanced Clustering, Appro, QLogic, and Sun Microsystems.

Author information

Authors and Affiliations

Network-Based Computing Laboratory, The Ohio State University, 2015 Neil Ave., Columbus, OH 43210, USA
Jaidev K. Sridhar, Matthew J. Koop, Jonathan L. Perkins & Dhabaleswar K. Panda

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, The Ohio State University, 2015 Neil Avenue, Columbus, OH 43210, USA
Ponnuswamy Sadayappan
Department of Electrical and Computer Engineering, Rutgers, the State University of New Jersey, 94 Brett Road, Piscataway, NJ 08854, USA
Manish Parashar
Hewlett-Packard ISO, Sy 192, Whitefield Road, Mahadevapura Post, Bangalore 560048, India
Ramamurthy Badrinath
Department of Electrical Engineering, University of Southern California, Los Angeles, CA 90089-2562, USA
Viktor K. Prasanna

Cite this paper

Sridhar, J.K., Koop, M.J., Perkins, J.L., Panda, D.K. (2008). ScELA: Scalable and Extensible Launching Architecture for Clusters. In: Sadayappan, P., Parashar, M., Badrinath, R., Prasanna, V.K. (eds) High Performance Computing - HiPC 2008. HiPC 2008. Lecture Notes in Computer Science, vol 5374. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89894-8_30

DOI: https://doi.org/10.1007/978-3-540-89894-8_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-89893-1
Online ISBN: 978-3-540-89894-8
eBook Packages: Computer Science, Computer Science (R0)
