Runtime Techniques to Enable a Highly-Scalable Global Address Space Model for Petascale Computing

Tipparaju, Vinod; Apra, Edoardo; Yu, Weikuan; Que, Xinyu; Vetter, Jeffrey S.

doi:10.1007/s10766-012-0214-9

Published: 02 September 2012

Volume 40, pages 633–655, (2012)

International Journal of Parallel Programming

Vinod Tipparaju¹, Edoardo Apra¹, Weikuan Yu², Xinyu Que² & Jeffrey S. Vetter¹

Abstract

Over the past decade, the trajectory to the petascale has been built on increased complexity and scale of the underlying parallel architectures. Meanwhile, software developers have struggled to provide tools that maintain the productivity of computational science teams using these new systems. In this regard, Global Address Space (GAS) programming models provide a straightforward, easy-to-use addressing model, which can lead to improved productivity. However, the scalability of GAS depends directly on the design and implementation of the runtime system on the target petascale distributed-memory architecture. In this paper, we describe the design, implementation, and optimization of the Aggregate Remote Memory Copy Interface (ARMCI) runtime library on the Cray XT5 2.3 PetaFLOPs computer at Oak Ridge National Laboratory. We optimized our implementation with the flow intimation technique that we introduce in this paper. Our optimized ARMCI implementation improves the scalability of both the Global Arrays programming model and a real-world chemistry application, NWChem, from small jobs up through 180,000 cores.

Author information

Authors and Affiliations

Computer Science and Mathematics Division, Oak Ridge National Laboratory, Oak Ridge, TN, 37831, USA
Vinod Tipparaju, Edoardo Apra & Jeffrey S. Vetter
Department of Computer Science, Auburn University, Auburn, AL, 36849, USA
Weikuan Yu & Xinyu Que

Corresponding author

Correspondence to Vinod Tipparaju.

Additional information

This paper was authored by at least one employee of UT-Battelle, LLC, under contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the paper for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this contribution, or allow others to do so, for United States Government purposes.

About this article

Cite this article

Tipparaju, V., Apra, E., Yu, W. et al. Runtime Techniques to Enable a Highly-Scalable Global Address Space Model for Petascale Computing. Int J Parallel Prog 40, 633–655 (2012). https://doi.org/10.1007/s10766-012-0214-9

Received: 01 December 2010
Accepted: 02 August 2012
Published: 02 September 2012
Issue Date: December 2012
DOI: https://doi.org/10.1007/s10766-012-0214-9
