Cooperative server clustering for a scalable GAS model on petascale Cray XT5 systems

  • Special Issue Paper
  • Published in: Computer Science - Research and Development

Abstract

Global Address Space (GAS) programming models are attractive because they retain the easy-to-use addressing model that is characteristic of shared-memory-style load and store operations. The scalability of GAS models depends directly on the design and implementation of their runtime libraries on the target platforms. In this paper, we examine the memory requirements of a popular GAS runtime library, the Aggregate Remote Memory Copy Interface (ARMCI), on petascale Cray XT5 systems. We then describe a new technique, cooperative server clustering, that enhances the memory scalability of ARMCI communication servers. In cooperative server clustering, ARMCI servers are organized into clusters and cooperatively process incoming communication requests among themselves. A request intervention scheme is also designed to expedite the return of responses to the initiating processes. Our experimental results demonstrate that, with very little impact on ARMCI communication latency and bandwidth, cooperative server clustering significantly reduces the memory requirement of ARMCI communication servers, thereby enabling highly scalable scientific applications. In particular, it reduces the total execution time of the scientific application NWChem by 45% on 2400 processes.
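The abstract only outlines the mechanism, so the short C sketch below is one way to picture it, not the paper's implementation: communication servers are grouped into fixed-size clusters, a request arriving at a server's cluster is dispatched to one of its peers, and the handling server returns the response directly to the initiating process (the role played by request intervention). The constants NUM_SERVERS and SERVERS_PER_CLUSTER, all function names, and the round-robin dispatch policy are illustrative assumptions.

/*
 * Hypothetical sketch only: a toy model of grouping communication servers
 * into cooperative clusters. All names, constants, and the round-robin
 * dispatch policy below are assumptions for illustration; this is not the
 * ARMCI implementation described in the paper.
 */
#include <stdio.h>

#define NUM_SERVERS         16  /* assumed: one communication server per node */
#define SERVERS_PER_CLUSTER  4  /* assumed: tunable cluster size              */

/* Cluster that a given server belongs to. */
static int server_cluster(int server)
{
    return server / SERVERS_PER_CLUSTER;
}

/* Choose the server within the home cluster that handles a client's request.
 * Spreading clients over the cluster means each server keeps per-client
 * state for roughly 1/SERVERS_PER_CLUSTER of the clients, which is where a
 * memory saving could come from. */
static int pick_handler(int client_rank, int home_server)
{
    int first = server_cluster(home_server) * SERVERS_PER_CLUSTER;
    return first + client_rank % SERVERS_PER_CLUSTER;
}

int main(void)
{
    printf("%d servers organized into %d clusters of %d\n",
           NUM_SERVERS, NUM_SERVERS / SERVERS_PER_CLUSTER, SERVERS_PER_CLUSTER);

    /* Simulate a few client requests whose "home" server is server 5. */
    int home_server = 5;
    for (int client = 0; client < 8; client++) {
        int handler = pick_handler(client, home_server);
        /* "Request intervention" as sketched here: the handling server
         * replies directly to the initiating client rather than routing the
         * response back through the home server. */
        printf("client %d -> cluster %d, request handled by server %d, "
               "response returned directly to client %d\n",
               client, server_cluster(home_server), handler, client);
    }
    return 0;
}

The point of such a layout, under these assumptions, is that per-client server state scales with the share of clients assigned to each server within a cluster rather than with the total process count, which matches the memory-scalability goal stated in the abstract.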



Author information

Corresponding author

Correspondence to Weikuan Yu.

Additional information

This work was funded in part by a UT-Battelle grant (UT-B-4000087151) to Auburn University, and in part by the National Center for Computational Sciences. This research used resources of the National Center for Computational Sciences at Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725. This research was also supported by an allocation of advanced computing resources provided by the National Science Foundation. Some of the computations were performed on Kraken (a Cray XT5) at the National Institute for Computational Sciences (http://www.nics.tennessee.edu/).


About this article

Cite this article

Yu, W., Que, X., Tipparaju, V. et al. Cooperative server clustering for a scalable GAS model on petascale Cray XT5 systems. Comput Sci Res Dev 25, 57–64 (2010). https://doi.org/10.1007/s00450-010-0104-6
