
A compound OpenMP/MPI program development toolkit for hybrid CPU/GPU clusters

The Journal of Supercomputing

Abstract

In this paper, we propose a program development toolkit called OMPICUDA for hybrid CPU/GPU clusters. With this toolkit, users can develop applications using a familiar programming model, i.e., compound OpenMP and MPI, instead of mixing CUDA with MPI or SDSM. In addition, they can select the type of resource used to execute each parallel region of the same program through an extended device directive, according to the properties of that region. Finally, the toolkit provides a set of data-partition interfaces that allow users to achieve load balance at the application level, regardless of the type of resource on which their programs execute.


(Figures 1–15 are omitted from this preview.)



Acknowledgements

We would like to thank the National Science Council of the Republic of China for its support under grant NSC 99-2221-E-151-055-MY3.


Corresponding author

Correspondence to Tyng-Yeu Liang.


Cite this article

Li, HF., Liang, TY. & Chiu, JY. A compound OpenMP/MPI program development toolkit for hybrid CPU/GPU clusters. J Supercomput 66, 381–405 (2013). https://doi.org/10.1007/s11227-013-0912-0
