Paradigmatic shifts for exascale supercomputing

Abstract

As the next generation of supercomputers reaches the exascale, the dominant design parameter governing performance will shift from hardware to software. Intelligent use of memory access, vectorization, and intranode threading will become critical to the performance of scientific applications and numerical calculations on exascale supercomputers. Although challenges remain in effectively programming the heterogeneous devices likely to be used in future supercomputers, new languages and tools provide a pathway for application developers to tackle this new frontier. These languages include open programming standards such as OpenCL and OpenACC, as well as widely adopted languages such as CUDA; high-quality libraries such as CUDPP and Thrust are also important. This article surveys a purposely diverse set of proof-of-concept applications developed at Los Alamos National Laboratory. We find that the capability of accelerator computing hardware and languages has moved beyond regular-grid finite-difference calculations and molecular dynamics codes. More advanced applications requiring dynamic memory allocation, such as cell-based adaptive mesh refinement, can now be addressed, and with more effort even unstructured mesh codes can be moved to the GPU.
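To make the accelerator-offload model the abstract refers to concrete, the following is a minimal CUDA sketch, not taken from the article; the kernel name, grid size, and coefficient are illustrative assumptions. It offloads one class of calculation the abstract describes as already well within reach of accelerators: an explicit finite-difference update on a regular grid.

#include <cstdio>
#include <vector>
#include <algorithm>
#include <cuda_runtime.h>

// One explicit diffusion step on a regular 1D grid:
// u_new[i] = u[i] + c * (u[i-1] - 2*u[i] + u[i+1])
__global__ void diffuse_step(const double *u, double *u_new, double c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i > 0 && i < n - 1)
        u_new[i] = u[i] + c * (u[i - 1] - 2.0 * u[i] + u[i + 1]);
}

int main()
{
    const int n = 1 << 20;            // number of grid cells (illustrative)
    const double c = 0.1;             // diffusion coefficient * dt / dx^2 (illustrative)
    std::vector<double> h_u(n, 0.0);
    h_u[n / 2] = 1.0;                 // single hot cell in the middle of the grid

    double *d_u, *d_u_new;
    cudaMalloc(&d_u, n * sizeof(double));
    cudaMalloc(&d_u_new, n * sizeof(double));
    cudaMemcpy(d_u, h_u.data(), n * sizeof(double), cudaMemcpyHostToDevice);
    cudaMemcpy(d_u_new, h_u.data(), n * sizeof(double), cudaMemcpyHostToDevice); // keeps boundary cells fixed

    const int block = 256;
    const int grid = (n + block - 1) / block;
    for (int step = 0; step < 100; ++step) {
        diffuse_step<<<grid, block>>>(d_u, d_u_new, c, n);
        std::swap(d_u, d_u_new);      // ping-pong the buffers between steps
    }

    cudaMemcpy(h_u.data(), d_u, n * sizeof(double), cudaMemcpyDeviceToHost);
    std::printf("u at the hot cell after 100 steps: %g\n", h_u[n / 2]);

    cudaFree(d_u);
    cudaFree(d_u_new);
    return 0;
}

Even in a toy sketch like this, the abstract's central point is visible: performance is governed less by the arithmetic than by how data is placed and moved, with a single host-to-device transfer up front, coalesced memory access inside the kernel, and all intermediate steps kept resident on the device.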

Acknowledgements

The authors would like to thank Ben Bergen and Marcus Daniels for the use of Darwin, the LANL CCS GPU cluster.

The authors are also grateful to the LANL CCS/X-Division Exascale working group led by Tim Kelley, and to Scott Runnels for organizing the LANL X-Division Summer Workshop exascale group, at which much of the foundational work for this article was performed. These groups encouraged the work on applications across different computational domains.

This work was supported by Los Alamos National Laboratory. Los Alamos National Laboratory is operated by Los Alamos National Security, LLC, for the National Nuclear Security Administration of the US Department of Energy under contract DE-AC52-06NA25396.

Author information

Corresponding author

Correspondence to Neal E. Davis.

About this article

Cite this article

Davis, N.E., Robey, R.W., Ferenbaugh, C.R. et al. Paradigmatic shifts for exascale supercomputing. J Supercomput 62, 1023–1044 (2012). https://doi.org/10.1007/s11227-012-0789-3
