
OpenMC: Towards Simplifying Programming for TianHe Supercomputers

Journal of Computer Science and Technology

Abstract

Modern petascale and future exascale systems are massively heterogeneous architectures. Developing productive intra-node programming models is crucial to addressing their programming challenges. We introduce a directive-based intra-node programming model, OpenMC, and show that this new model can achieve ease of programming, high performance, and the degree of portability desired for heterogeneous nodes, especially those in TianHe supercomputers. While existing models are geared towards offloading computations to accelerators (typically one), OpenMC aims to exploit the potential offered by the multiple CPUs and accelerators in a compute node more uniformly and adequately. OpenMC achieves this by providing a unified abstraction of hardware resources as workers and by facilitating the exploitation of asynchronous task parallelism on those workers. We present an overview of OpenMC, a prototype implementation, and results from initial comparisons with OpenMP and hand-written code in developing six applications on two types of nodes from TianHe supercomputers.
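
To make the programming pattern concrete, here is a minimal sketch of the kind of asynchronous, multi-accelerator task parallelism that OpenMC is designed to simplify. It is written in standard OpenMP 4.5 rather than in OpenMC's own directives (whose syntax is not reproduced here); the array size, the kernel body, and the even per-device chunking are illustrative assumptions only. Each accelerator visible on the node receives one chunk of the work as an asynchronous target task, and the host waits for all chunks at the end.

    /* Hypothetical illustration in OpenMP 4.5, not OpenMC syntax. */
    #include <omp.h>
    #include <stdlib.h>

    #define N (1 << 20)

    static void scale_chunk(float *a, float *b, int n, int dev)
    {
        /* Offload this chunk to device `dev`; `nowait` turns the target region
         * into an asynchronous task, so chunks for different devices overlap. */
        #pragma omp target map(to: a[0:n]) map(from: b[0:n]) device(dev) nowait
        #pragma omp teams distribute parallel for
        for (int i = 0; i < n; ++i)
            b[i] = 2.0f * a[i];
    }

    int main(void)
    {
        float *a = malloc(N * sizeof *a);
        float *b = malloc(N * sizeof *b);
        for (int i = 0; i < N; ++i)
            a[i] = (float)i;

        int ndev  = omp_get_num_devices();      /* accelerators on this node  */
        int chunk = N / (ndev > 0 ? ndev : 1);  /* remainder handling omitted */

        for (int d = 0; d < ndev; ++d)          /* launch one async chunk per device */
            scale_chunk(a + (long)d * chunk, b + (long)d * chunk, chunk, d);

        #pragma omp taskwait                    /* wait for every target task */

        free(a);
        free(b);
        return 0;
    }

In OpenMC, the same pattern is expressed against a uniform set of workers covering both the CPUs and the accelerators of a node, rather than against explicitly numbered devices, which is what lets the model exploit all of a node's resources more uniformly.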



Author information

Corresponding author

Correspondence to Xiang-Ke Liao.

Additional information

This work is supported by the National High Technology Research and Development 863 Program of China under Grant No. 2012AA01A301 and the National Natural Science Foundation of China under Grant No. 61170049.

Electronic supplementary material

ESM 1 (PDF 76 kb)


About this article


Cite this article

Liao, XK., Yang, CQ., Tang, T. et al. OpenMC: Towards Simplifying Programming for TianHe Supercomputers. J. Comput. Sci. Technol. 29, 532–546 (2014). https://doi.org/10.1007/s11390-014-1447-4

