Abstract
Quicksilver represents key elements of the Mercury Monte Carlo particle transport simulation software developed at Lawrence Livermore National Laboratory (LLNL). Mercury is one of the applications used by the Department of Energy (DOE) for nuclear security and nuclear reactor simulations. As a Mercury proxy, Quicksilver therefore influences DOE’s hardware procurement and co-design activities. Quicksilver has a complicated implementation and performance profile: its performance is dominated by latency-bound table look-ups and by control flow divergence, both of which limit SIMD/SIMT parallelization opportunities. Obtaining high performance for Quicksilver is therefore challenging.
This paper shows how to improve Quicksilver’s performance on Intel Xeon CPUs by \(1.8\times \) over the original version by selectively replicating conflict-prone data structures. It also shows how to port Quicksilver efficiently to the new Intel Programmable Integrated Unified Memory Architecture (PIUMA). Preliminary analysis shows that a PIUMA die (8 cores) is about \(2\times \) faster than an Intel Xeon 8280 socket (28 cores) and provides better strong scaling efficiency.
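The general idea of replicating a conflict-prone data structure can be illustrated with a minimal sketch. The abstract does not say which structures the authors replicate or how, so everything below is an assumption made for illustration: the tally array, the function name `tally_with_replicas`, and the static partition of events are hypothetical and are not taken from Quicksilver. Each thread updates its own private copy, and the copies are summed once at the end, so the hot loop needs no atomics or locks.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical stand-in for a conflict-prone structure: a per-cell tally
// that many threads would otherwise increment concurrently.
using Tally = std::vector<std::uint64_t>;

// Assumes every entry of cell_of_event is smaller than num_cells.
Tally tally_with_replicas(const std::vector<std::size_t>& cell_of_event,
                          std::size_t num_cells,
                          std::size_t num_threads) {
    assert(num_threads > 0);
    // One private replica per thread; each is a separate heap allocation.
    std::vector<Tally> replicas(num_threads, Tally(num_cells, 0));
    std::vector<std::thread> workers;
    const std::size_t n = cell_of_event.size();

    for (std::size_t t = 0; t < num_threads; ++t) {
        workers.emplace_back([&, t] {
            Tally& mine = replicas[t];
            // Static block partition of the events over the threads.
            const std::size_t begin = n * t / num_threads;
            const std::size_t end = n * (t + 1) / num_threads;
            for (std::size_t i = begin; i < end; ++i) {
                mine[cell_of_event[i]] += 1;  // no atomic needed: private copy
            }
        });
    }
    for (auto& w : workers) w.join();

    // Merge the replicas into a single result after all threads finish.
    Tally total(num_cells, 0);
    for (const Tally& r : replicas) {
        for (std::size_t c = 0; c < num_cells; ++c) total[c] += r[c];
    }
    return total;
}
```

The trade-off in a scheme like this is extra memory and a final merge pass in exchange for conflict-free updates, which is presumably why replication would be applied selectively rather than to every shared structure.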
Acknowledgments
This research was, in part, funded by the U.S. Government. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the U.S. Government. Prepared by LLNL under Contract DE-AC52-07NA27344. LLNL-CONF-817842. Thanks to Marcin Lisowski and Joanna Gagatko from Intel for their initial help with Quicksilver on PIUMA. We would also like to thank Sebastian Szkoda, Vincent Cave and Wim Heirman from Intel for their help with the PIUMA runtime.
Copyright information
© 2021 Lawrence Livermore National Security, LLC
About this paper
Cite this paper
Tithi, J.J., Petrini, F., Richards, D.F. (2021). Lessons Learned from Accelerating Quicksilver on Programmable Integrated Unified Memory Architecture (PIUMA) and How That’s Different from CPU. In: Chamberlain, B.L., Varbanescu, A.L., Ltaief, H., Luszczek, P. (eds) High Performance Computing. ISC High Performance 2021. Lecture Notes in Computer Science, vol 12728. Springer, Cham. https://doi.org/10.1007/978-3-030-78713-4_3
DOI: https://doi.org/10.1007/978-3-030-78713-4_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-78712-7
Online ISBN: 978-3-030-78713-4
eBook Packages: Computer Science (R0)