OpenMC: Towards Simplifying Programming for TianHe Supercomputers

Liao, Xiang-Ke; Yung, Can-Qun; Tang, Tao; Yi, Hui-Zhan; Wang, Feng; Wu, Qiang; Xue, Jingling

doi:10.1007/s11390-014-1447-4

Regular Paper
Published: 17 May 2014

Volume 29, pages 532–546 (2014)

Journal of Computer Science and Technology

Xiang-Ke Liao¹,
Can-Qun Yung¹,
Tao Tang¹,
Hui-Zhan Yi¹,
Feng Wang¹,
Qiang Wu¹ &
Jingling Xue²

Abstract

Modern petascale and future exascale systems are massively heterogeneous architectures. Developing productive intra-node programming models is crucial to addressing the programming challenge they pose. We introduce a directive-based intra-node programming model, OpenMC, and show that it can achieve ease of programming, high performance, and the degree of portability desired for heterogeneous nodes, especially those in TianHe supercomputers. While existing models are geared towards offloading computations to accelerators (typically one), OpenMC aims to exploit more uniformly and adequately the potential offered by multiple CPUs and accelerators in a compute node. OpenMC achieves this by providing a unified abstraction of hardware resources as workers and by facilitating the exploitation of asynchronous task parallelism on the workers. We present an overview of OpenMC, a prototype implementation, and results from initial comparisons with OpenMP and hand-written code in developing six applications on two types of nodes from TianHe supercomputers.
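The "workers" idea in the abstract can be illustrated with a small sketch. This is not OpenMC syntax (the directives themselves are not shown on this page) and it does not reflect the authors' implementation. It is a plain-Python analogy in which every name (`Worker`, `partial_sum`, `run_on_workers`) is hypothetical, and a single-threaded executor stands in for each CPU group or accelerator. It only shows the general pattern of treating heterogeneous resources through one uniform handle and launching tasks on them asynchronously before joining.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class Worker:
    """Hypothetical uniform handle for a compute resource.

    'kind' is only a label here; in this sketch a CPU worker and an
    accelerator worker behave identically.
    """
    name: str
    kind: str  # e.g. "cpu" or "accelerator"
    _pool: ThreadPoolExecutor = field(init=False, repr=False)

    def __post_init__(self):
        # One thread per worker stands in for that worker's task queue.
        self._pool = ThreadPoolExecutor(max_workers=1)

    def submit(self, fn, *args) -> Future:
        # Asynchronous launch: returns immediately with a Future.
        return self._pool.submit(fn, *args)

    def shutdown(self):
        self._pool.shutdown(wait=True)


def partial_sum(chunk):
    """Example task: sum of squares over one chunk of the data."""
    return sum(x * x for x in chunk)


def run_on_workers(data, workers):
    """Give each worker one strided chunk, launch all tasks, then join."""
    n = len(workers)
    chunks = [data[i::n] for i in range(n)]
    futures = [w.submit(partial_sum, c) for w, c in zip(workers, chunks)]
    # Joining on the futures is the only synchronization point.
    return sum(f.result() for f in futures)


if __name__ == "__main__":
    workers = [
        Worker("cpu0", "cpu"),
        Worker("cpu1", "cpu"),
        Worker("acc0", "accelerator"),
    ]
    try:
        print(run_on_workers(list(range(10)), workers))
    finally:
        for w in workers:
            w.shutdown()
```

The caller never branches on the kind of worker: the same `submit` call is used for every resource, which is the sense in which the abstraction is uniform. A real system would additionally have to handle data movement and device-specific code generation, which this sketch leaves out.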

Author information

Authors and Affiliations

School of Computer Science, National University of Defense Technology, Changsha, 410073, China
Xiang-Ke Liao, Can-Qun Yung, Tao Tang, Hui-Zhan Yi, Feng Wang & Qiang Wu
School of Computer Science and Engineering, University of New South Wales, Sydney, Australia
Jingling Xue

Corresponding author

Correspondence to Xiang-Ke Liao.

Additional information

This work is supported by the National High Technology Research and Development 863 Program of China under Grant No. 2012AA01A301, and the National Natural Science Foundation of China under Grant No. 61170049.

Electronic supplementary material

Below is the link to the electronic supplementary material.

ESM 1

(PDF 76 kb)

About this article

Cite this article

Liao, XK., Yung, CQ., Tang, T. et al. OpenMC: Towards Simplifying Programming for TianHe Supercomputers. J. Comput. Sci. Technol. 29, 532–546 (2014). https://doi.org/10.1007/s11390-014-1447-4

Received: 29 August 2013
Revised: 21 January 2014
Published: 17 May 2014
Issue Date: May 2014
DOI: https://doi.org/10.1007/s11390-014-1447-4
