
CUDAGA: A Portable Parallel Programming Model for GPU Cluster

  • Conference paper
  • Cloud Computing and Big Data (CloudCom-Asia 2015)

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 9106)


Abstract

GPU clusters are important for high performance computing because of their high performance/cost ratio. However, it is still hard for application developers to write parallel code for GPUs. MPI is the most widely used tool for parallel programming, and it requires developers to specify data locality and communication explicitly. Moreover, data transmission between CPU and GPU must also be handled with CUDA code. CUDAGA, a new parallel programming model for GPU clusters with CUDA, is presented to provide portable interfaces for communication on GPUs. GA (Global Arrays), a portable shared-memory programming model for distributed-memory computers, serves as the base to facilitate parallel programming and to maintain transparent global arrays on GPUs. Experiments show that CUDAGA reduces the difficulty of parallel programming while ensuring better performance for some specific applications.
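The CUDAGA interfaces themselves are not shown on this page. As a point of reference for the shared-memory style it extends, the sketch below uses the standard Global Arrays C API (NGA_Create, NGA_Put, NGA_Get), which the abstract names as the base of CUDAGA. It is illustrative only: it requires a Global Arrays and MPI installation to build, and the GPU-side extensions that CUDAGA adds are an assumption not depicted here.

```c
/* Sketch of the Global Arrays programming style that CUDAGA builds on.
   Uses the standard GA C API; requires the Global Arrays library and MPI.
   Illustrative only -- CUDAGA's GPU-specific interfaces are not shown. */
#include <stdio.h>
#include <mpi.h>
#include "ga.h"
#include "macdecls.h"

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    GA_Initialize();

    int dims[1]  = {1024};
    int chunk[1] = {-1};              /* let GA choose the data distribution */
    int g_a = NGA_Create(C_DBL, 1, dims, "a", chunk);

    /* Any process may read or write any part of the global array;
       GA resolves locality and performs the communication transparently. */
    if (GA_Nodeid() == 0) {
        double val = 3.14;
        int lo[1] = {0}, hi[1] = {0}, ld[1] = {1};
        NGA_Put(g_a, lo, hi, &val, ld);
    }
    GA_Sync();                        /* make the update globally visible */

    double out;
    int lo[1] = {0}, hi[1] = {0}, ld[1] = {1};
    NGA_Get(g_a, lo, hi, &out, ld);
    printf("process %d read %f\n", GA_Nodeid(), out);

    GA_Destroy(g_a);
    GA_Terminate();
    MPI_Finalize();
    return 0;
}
```

In the plain MPI+CUDA approach that the abstract contrasts with, the programmer would instead pair every such remote access with explicit MPI sends/receives plus cudaMemcpy calls between host and device; hiding that pattern behind a global-array put/get interface is the stated goal of CUDAGA.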



Acknowledgment

This work is supported by the National 973 Key Basic Research Plan of China (No. 2013CB2282036), the Major Subject of the State Grid Corporation of China (No. SGCC-MPLG001(001-031)-2012), the National 863 Basic Research Program of China (No. 2011AA05A118), the National Natural Science Foundation of China (No. 61133008), the National Science and Technology Pillar Program (No. 2012BAH14F02) and the independent innovation project of Huazhong University of Science and Technology.

Author information

Corresponding author

Correspondence to Ran Zheng.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Chen, Y., Jin, H., Xu, D., Zheng, R., Liu, H., Zeng, J. (2015). CUDAGA: A Portable Parallel Programming Model for GPU Cluster. In: Qiang, W., Zheng, X., Hsu, CH. (eds) Cloud Computing and Big Data. CloudCom-Asia 2015. Lecture Notes in Computer Science, vol 9106. Springer, Cham. https://doi.org/10.1007/978-3-319-28430-9_16

  • DOI: https://doi.org/10.1007/978-3-319-28430-9_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-28429-3

  • Online ISBN: 978-3-319-28430-9

  • eBook Packages: Computer Science (R0)
