Abstract
As the next generation of supercomputers reaches the exascale, the dominant design parameter governing performance will shift from hardware to software. Intelligent use of memory access patterns, vectorization, and intranode threading will become critical to the performance of scientific applications and numerical calculations on exascale supercomputers. Although challenges remain in effectively programming the heterogeneous devices likely to be used in future supercomputers, new languages and tools are providing a pathway for application developers to tackle this new frontier. These languages include open programming standards such as OpenCL and OpenACC as well as widely adopted languages such as CUDA; high-quality libraries such as CUDPP and Thrust are also important. This article surveys a deliberately diverse set of proof-of-concept applications developed at Los Alamos National Laboratory. We find that the capability of accelerator hardware and programming languages has moved beyond regular-grid finite difference calculations and molecular dynamics codes. More advanced applications requiring dynamic memory allocation, such as cell-based adaptive mesh refinement, can now be addressed, and with more effort even unstructured mesh codes can be moved to the GPU.
Acknowledgements
The authors would like to thank Ben Bergen and Marcus Daniels for the use of Darwin, the LANL CCS GPU cluster.
The authors are also grateful to the LANL CCS/X-Division Exascale working group, led by Tim Kelley, and to Scott Runnels for organizing the exascale group at the LANL X-Division Summer Workshop, where much of the foundational work for this article was performed. Both groups encouraged the application work across the different computational domains surveyed here.
This work was supported by Los Alamos National Laboratory. Los Alamos National Laboratory is operated by Los Alamos National Security, LLC, for the National Nuclear Security Administration of the US Department of Energy under contract DE-AC52-06NA25396.
Cite this article
Davis, N.E., Robey, R.W., Ferenbaugh, C.R. et al. Paradigmatic shifts for exascale supercomputing. J Supercomput 62, 1023–1044 (2012). https://doi.org/10.1007/s11227-012-0789-3