Abstract
Modern petascale and future exascale systems are massively heterogeneous architectures. Developing productive intra-node programming models is crucial to addressing the programming challenges these systems pose. We introduce OpenMC, a directive-based intra-node programming model, and show that it can achieve ease of programming, high performance, and the degree of portability desired for heterogeneous nodes, especially those in TianHe supercomputers. Existing models are geared towards offloading computations to accelerators (typically a single one). OpenMC instead aims to exploit more uniformly and fully the potential offered by the multiple CPUs and accelerators in a compute node. It does so by providing a unified abstraction of hardware resources as workers and by facilitating the exploitation of asynchronous task parallelism on these workers. We present an overview of OpenMC, a prototype implementation, and results from initial comparisons with OpenMP and hand-written code on six applications developed for two types of TianHe compute nodes.
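The abstract does not give OpenMC's directive syntax, and the sketch below does not attempt to reproduce it. It is a minimal Python illustration, under our own assumptions, of the two ideas named above: every hardware resource in a node is exposed through the same worker interface, and work is issued to workers as asynchronous tasks that are joined only when their results are needed. `Worker`, `submit`, `parallel_sum`, and the device names `cpu0`, `cpu1`, and `acc0` are hypothetical, and every "worker" here is simply a host thread rather than a real CPU socket or accelerator.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


class Worker:
    """Hypothetical uniform handle for one compute resource (CPU socket or accelerator).

    In this sketch each worker is backed by a single host thread, so tasks sent
    to the same worker run in order while different workers run concurrently.
    A real accelerator worker would also launch kernels and move data; that is
    omitted here.
    """

    def __init__(self, name: str) -> None:
        self.name = name
        self._executor = ThreadPoolExecutor(max_workers=1, thread_name_prefix=name)

    def submit(self, fn: Callable[..., T], *args) -> "Future[T]":
        # Asynchronous: returns immediately with a future for the eventual result.
        return self._executor.submit(fn, *args)

    def shutdown(self) -> None:
        # Wait for queued tasks to finish, then release the backing thread.
        self._executor.shutdown(wait=True)


def parallel_sum(workers: Sequence[Worker], data: Sequence[int]) -> int:
    """Split `data` into one chunk per worker, run the chunks asynchronously, then join."""
    chunk = -(-len(data) // len(workers))  # ceiling division
    futures: List["Future[int]"] = [
        w.submit(sum, data[i * chunk:(i + 1) * chunk]) for i, w in enumerate(workers)
    ]
    # Synchronize only at the point where the partial results are needed.
    return sum(f.result() for f in futures)


if __name__ == "__main__":
    # Hypothetical node: two CPU workers and one accelerator worker.
    node = [Worker("cpu0"), Worker("cpu1"), Worker("acc0")]
    try:
        print(parallel_sum(node, list(range(1, 1001))))  # prints 500500
    finally:
        for w in node:
            w.shutdown()
```

The host code treats all three workers identically and defers synchronization until the end, which is the style of uniform, asynchronous use of CPUs and accelerators the abstract describes. CPython threads will not actually speed up this pure-Python sum; the sketch shows the structure, not the performance.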
Additional information
This work is supported by the National High Technology Research and Development 863 Program of China under Grant No. 2012AA01A301, and the National Natural Science Foundation of China under Grant No. 61170049.
Electronic supplementary material
ESM 1 (PDF, 76 KB)
About this article
Cite this article
Liao, XK., Yang, CQ., Tang, T. et al. OpenMC: Towards Simplifying Programming for TianHe Supercomputers. J. Comput. Sci. Technol. 29, 532–546 (2014). https://doi.org/10.1007/s11390-014-1447-4
DOI: https://doi.org/10.1007/s11390-014-1447-4