Exploring OpenSHMEM Model to Program GPU-based Extreme-Scale Systems

  • Conference paper
  • Published in: OpenSHMEM and Related Technologies. Experiences, Implementations, and Technologies (OpenSHMEM 2014)

Abstract

Extreme-scale systems with compute accelerators such as Graphics Processing Units (GPUs) have become popular for executing scientific applications. These systems are typically programmed using MPI and CUDA (for NVIDIA-based GPUs). However, the MPI+CUDA approach has several drawbacks. The orchestration required between the compute and communication phases of application execution, and the constraint that communication can only be initiated from serial portions running on the Central Processing Unit (CPU), lead to scaling bottlenecks. To address these drawbacks, we explore the viability of using OpenSHMEM for programming these systems. In this paper, we first make a case for supporting GPU-initiated communication and for the suitability of the OpenSHMEM programming model. Second, we present NVSHMEM, a prototype implementation of the proposed programming approach; port the Stencil and Transpose benchmarks, which are representative of many scientific applications, from the MPI+CUDA model to OpenSHMEM; and evaluate the design and implementation of NVSHMEM. Finally, we discuss the opportunities and challenges of using OpenSHMEM to program these systems, and propose extensions to OpenSHMEM to realize the full potential of this programming approach.
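
To make the idea of GPU-initiated communication concrete, the sketch below issues a halo-style exchange directly from a CUDA kernel instead of returning control to the CPU between the compute and communication phases. It is a minimal illustration written against the API of the publicly released NVSHMEM library (nvshmem_init, nvshmem_malloc, nvshmem_float_p, nvshmem_quiet); the prototype evaluated in this paper may expose different interfaces, so this should be read as an assumed example of the programming style rather than the authors' implementation.

#include <cuda_runtime.h>
#include <nvshmem.h>

// Push this PE's boundary elements directly into a neighbouring PE's
// symmetric "halo" buffer from device code; no CPU-side communication call is needed.
__global__ void exchange_halo(float *halo, const float *boundary, int n, int peer)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        nvshmem_float_p(&halo[i], boundary[i], peer);  // one-sided put to the peer PE
    }
    nvshmem_quiet();  // complete outstanding puts before the kernel exits
}

int main(void)
{
    nvshmem_init();
    int mype = nvshmem_my_pe();
    int npes = nvshmem_n_pes();
    int peer = (mype + 1) % npes;  // illustrative ring of PEs
    const int n = 1024;

    // Symmetric-heap allocations are remotely accessible by every PE.
    float *halo     = (float *) nvshmem_malloc(n * sizeof(float));
    float *boundary = (float *) nvshmem_malloc(n * sizeof(float));
    cudaMemset(boundary, 0, n * sizeof(float));  // placeholder boundary data

    exchange_halo<<<(n + 255) / 256, 256>>>(halo, boundary, n, peer);
    cudaDeviceSynchronize();
    nvshmem_barrier_all();  // host-side barrier: all puts visible on all PEs

    nvshmem_free(halo);
    nvshmem_free(boundary);
    nvshmem_finalize();
    return 0;
}

In a Stencil benchmark structured this way, the exchange can be fused into the compute kernel itself, avoiding the CPU-side orchestration that the abstract identifies as a scaling bottleneck.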

This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).

Acknowledgments

The work at NVIDIA is funded by the U.S. Department of Energy under subcontract 7078610 with Lawrence Berkeley National Laboratory. The work at Oak Ridge National Laboratory (ORNL) is supported by the United States Department of Defense and used the resources of the Extreme Scale Systems Center located at ORNL. In addition, the authors would like to thank Stephen Poole (DoD) for his review of this work and for the many technical discussions that helped shape the ideas presented in this paper.

Author information

Corresponding author

Correspondence to Manjunath Gorentla Venkata.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Potluri, S. et al. (2015). Exploring OpenSHMEM Model to Program GPU-based Extreme-Scale Systems. In: Gorentla Venkata, M., Shamis, P., Imam, N., Lopez, M. (eds) OpenSHMEM and Related Technologies. Experiences, Implementations, and Technologies. OpenSHMEM 2014. Lecture Notes in Computer Science, vol 9397. Springer, Cham. https://doi.org/10.1007/978-3-319-26428-8_2

  • DOI: https://doi.org/10.1007/978-3-319-26428-8_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-26427-1

  • Online ISBN: 978-3-319-26428-8

  • eBook Packages: Computer Science, Computer Science (R0)
