
A compound OpenMP/MPI program development toolkit for hybrid CPU/GPU clusters

The Journal of Supercomputing

Abstract

In this paper, we propose a program development toolkit called OMPICUDA for hybrid CPU/GPU clusters. With this toolkit, users can develop applications using a familiar programming model, i.e., compound OpenMP and MPI, instead of mixing CUDA with MPI or SDSM. In addition, they can select the type of resource used to execute each parallel region of the same program through an extended device directive, according to the properties of that region. Finally, the toolkit provides a set of data-partition interfaces that allow users to achieve load balance at the application level, regardless of the type of resource on which their programs execute.


(Figures 1–15 are omitted from this preview.)



Acknowledgements

We would like to thank the National Science Council of the Republic of China for its support under grant NSC 99-2221-E-151-055-MY3.


Corresponding author

Correspondence to Tyng-Yeu Liang.


Cite this article

Li, HF., Liang, TY. & Chiu, JY. A compound OpenMP/MPI program development toolkit for hybrid CPU/GPU clusters. J Supercomput 66, 381–405 (2013). https://doi.org/10.1007/s11227-013-0912-0
