Cooperative server clustering for a scalable GAS model on petascale Cray XT5 systems

  • Special Issue Paper
  • Published in: Computer Science - Research and Development

Abstract

Global Address Space (GAS) programming models are attractive because they retain the easy-to-use addressing model that is characteristic of shared-memory-style load and store operations. The scalability of GAS models depends directly on the design and implementation of their runtime libraries on the target platforms. In this paper, we examine the memory requirements of a popular GAS runtime library, the Aggregate Remote Memory Copy Interface (ARMCI), on petascale Cray XT5 systems. We then describe a new technique, cooperative server clustering, that enhances the memory scalability of ARMCI communication servers. In cooperative server clustering, ARMCI servers are organized into clusters and cooperatively process incoming communication requests among themselves. A request intervention scheme is also designed to expedite the return of responses to the initiating processes. Our experimental results demonstrate that, with very little impact on ARMCI communication latency and bandwidth, cooperative server clustering significantly reduces the memory requirement of ARMCI communication servers, thereby enabling highly scalable scientific applications. In particular, it reduces the total execution time of the scientific application NWChem by 45% on 2400 processes.
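The abstract only outlines the mechanism, so the short C sketch below is one way to picture it, not the paper's implementation: communication servers are grouped into fixed-size clusters, a request arriving at a server's cluster is dispatched to one of its peers, and the handling server returns the response directly to the initiating process (the role played by request intervention). The constants NUM_SERVERS and SERVERS_PER_CLUSTER, all function names, and the round-robin dispatch policy are illustrative assumptions.

/*
 * Hypothetical sketch only: a toy model of grouping communication servers
 * into cooperative clusters. All names, constants, and the round-robin
 * dispatch policy below are assumptions for illustration; this is not the
 * ARMCI implementation described in the paper.
 */
#include <stdio.h>

#define NUM_SERVERS         16  /* assumed: one communication server per node */
#define SERVERS_PER_CLUSTER  4  /* assumed: tunable cluster size              */

/* Cluster that a given server belongs to. */
static int server_cluster(int server)
{
    return server / SERVERS_PER_CLUSTER;
}

/* Choose the server within the home cluster that handles a client's request.
 * Spreading clients over the cluster means each server keeps per-client
 * state for roughly 1/SERVERS_PER_CLUSTER of the clients, which is where a
 * memory saving could come from. */
static int pick_handler(int client_rank, int home_server)
{
    int first = server_cluster(home_server) * SERVERS_PER_CLUSTER;
    return first + client_rank % SERVERS_PER_CLUSTER;
}

int main(void)
{
    printf("%d servers organized into %d clusters of %d\n",
           NUM_SERVERS, NUM_SERVERS / SERVERS_PER_CLUSTER, SERVERS_PER_CLUSTER);

    /* Simulate a few client requests whose "home" server is server 5. */
    int home_server = 5;
    for (int client = 0; client < 8; client++) {
        int handler = pick_handler(client, home_server);
        /* "Request intervention" as sketched here: the handling server
         * replies directly to the initiating client rather than routing the
         * response back through the home server. */
        printf("client %d -> cluster %d, request handled by server %d, "
               "response returned directly to client %d\n",
               client, server_cluster(home_server), handler, client);
    }
    return 0;
}

The point of such a layout, under these assumptions, is that per-client server state scales with the share of clients assigned to each server within a cluster rather than with the total process count, which matches the memory-scalability goal stated in the abstract.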



Author information

Corresponding author

Correspondence to Weikuan Yu.

Additional information

This work was funded in part by a UT-Battelle grant (UT-B-4000087151) to Auburn University, and in part by the National Center for Computational Sciences. This research used resources of the National Center for Computational Sciences at Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725. This research was also supported by an allocation of advanced computing resources provided by the National Science Foundation. Some of the computations were performed on Kraken (a Cray XT5) at the National Institute for Computational Sciences (http://www.nics.tennessee.edu/).


About this article

Cite this article

Yu, W., Que, X., Tipparaju, V. et al. Cooperative server clustering for a scalable GAS model on petascale Cray XT5 systems. Comput Sci Res Dev 25, 57–64 (2010). https://doi.org/10.1007/s00450-010-0104-6
