Exploring OpenSHMEM Model to Program GPU-based Extreme-Scale Systems

  • Conference paper
  • Published in: OpenSHMEM and Related Technologies. Experiences, Implementations, and Technologies (OpenSHMEM 2014)

Abstract

Extreme-scale systems with compute accelerators such as Graphics Processing Units (GPUs) have become popular for executing scientific applications. These systems are typically programmed using MPI and CUDA (for NVIDIA-based GPUs). However, the MPI+CUDA approach has several drawbacks. The orchestration required between the compute and communication phases of application execution, and the constraint that communication can only be initiated from serial portions running on the Central Processing Unit (CPU), lead to scaling bottlenecks. To address these drawbacks, we explore the viability of using OpenSHMEM for programming these systems. In this paper, we first make a case for supporting GPU-initiated communication and for the suitability of the OpenSHMEM programming model. Second, we present NVSHMEM, a prototype implementation of the proposed programming approach; port the Stencil and Transpose benchmarks, which are representative of many scientific applications, from the MPI+CUDA model to OpenSHMEM; and evaluate the design and implementation of NVSHMEM. Finally, we discuss the opportunities and challenges of using OpenSHMEM to program these systems, and propose extensions to OpenSHMEM to realize the full potential of this programming approach.
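
To make the idea of GPU-initiated communication concrete, the sketch below issues a halo-style exchange directly from a CUDA kernel instead of returning control to the CPU between the compute and communication phases. It is a minimal illustration written against the API of the publicly released NVSHMEM library (nvshmem_init, nvshmem_malloc, nvshmem_float_p, nvshmem_quiet); the prototype evaluated in this paper may expose different interfaces, so this should be read as an assumed example of the programming style rather than the authors' implementation.

#include <cuda_runtime.h>
#include <nvshmem.h>

// Push this PE's boundary elements directly into a neighbouring PE's
// symmetric "halo" buffer from device code; no CPU-side communication call is needed.
__global__ void exchange_halo(float *halo, const float *boundary, int n, int peer)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        nvshmem_float_p(&halo[i], boundary[i], peer);  // one-sided put to the peer PE
    }
    nvshmem_quiet();  // complete outstanding puts before the kernel exits
}

int main(void)
{
    nvshmem_init();
    int mype = nvshmem_my_pe();
    int npes = nvshmem_n_pes();
    int peer = (mype + 1) % npes;  // illustrative ring of PEs
    const int n = 1024;

    // Symmetric-heap allocations are remotely accessible by every PE.
    float *halo     = (float *) nvshmem_malloc(n * sizeof(float));
    float *boundary = (float *) nvshmem_malloc(n * sizeof(float));
    cudaMemset(boundary, 0, n * sizeof(float));  // placeholder boundary data

    exchange_halo<<<(n + 255) / 256, 256>>>(halo, boundary, n, peer);
    cudaDeviceSynchronize();
    nvshmem_barrier_all();  // host-side barrier: all puts visible on all PEs

    nvshmem_free(halo);
    nvshmem_free(boundary);
    nvshmem_finalize();
    return 0;
}

In a Stencil benchmark structured this way, the exchange can be fused into the compute kernel itself, avoiding the CPU-side orchestration that the abstract identifies as a scaling bottleneck.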

This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).

Acknowledgments

The work at NVIDIA is funded by the U.S. Department of Energy under subcontract 7078610 with Lawrence Berkeley National Laboratory. The work at Oak Ridge National Laboratory (ORNL) is supported by the United States Department of Defense and used the resources of the Extreme Scale Systems Center located at ORNL. In addition, the authors would like to thank Stephen Poole (DoD) for his review of this work and for the many technical discussions that helped shape the ideas presented in this paper.

Author information

Corresponding author

Correspondence to Manjunath Gorentla Venkata.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Potluri, S. et al. (2015). Exploring OpenSHMEM Model to Program GPU-based Extreme-Scale Systems. In: Gorentla Venkata, M., Shamis, P., Imam, N., Lopez, M. (eds) OpenSHMEM and Related Technologies. Experiences, Implementations, and Technologies. OpenSHMEM 2014. Lecture Notes in Computer Science, vol 9397. Springer, Cham. https://doi.org/10.1007/978-3-319-26428-8_2

  • DOI: https://doi.org/10.1007/978-3-319-26428-8_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-26427-1

  • Online ISBN: 978-3-319-26428-8

  • eBook Packages: Computer Science, Computer Science (R0)
