Lessons Learned from Accelerating Quicksilver on Programmable Integrated Unified Memory Architecture (PIUMA) and How That’s Different from CPU

Tithi, Jesmin Jahan; Petrini, Fabrizio; Richards, David F.

doi:10.1007/978-3-030-78713-4_3

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12728)

Included in the following conference series:

International Conference on High Performance Computing

Abstract

Quicksilver represents key elements of the Mercury Monte Carlo Particle Transport simulation software developed at Lawrence Livermore National Laboratory (LLNL). Mercury is one of the applications used in the Department of Energy (DOE) for nuclear security and nuclear reactor simulations. Thus Quicksilver, as a Mercury proxy, influences DOE’s hardware procurement and co-design activities. Quicksilver has a complicated implementation and performance profile: its performance is dominated by latency-bound table look-ups and control flow divergence that limit SIMD/SIMT parallelization opportunities. Therefore, obtaining high performance for Quicksilver is quite challenging.

This paper shows how to improve Quicksilver’s performance on Intel Xeon CPUs by \(1.8\times\) over its original version by selectively replicating conflict-prone data structures. It also shows how to port Quicksilver efficiently to the new Intel Programmable Integrated Unified Memory Architecture (PIUMA). Preliminary analysis shows that a PIUMA die (8 cores) is about \(2\times\) faster than an Intel Xeon 8280 socket (28 cores) and provides better strong scaling efficiency.
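The abstract does not say which data structures are replicated or how, so the following is only a minimal C++ sketch of the general idea, assuming the conflict-prone structure is a small array of counters updated by many threads. The names `bin_for`, `tally_shared`, and `tally_replicated` are hypothetical and do not come from Quicksilver. The shared version makes every thread update the same atomic counters; the replicated version gives each thread a private copy and sums the copies once at the end.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical mapping from an event to a counter bin (illustration only).
inline std::size_t bin_for(std::size_t thread_id, std::size_t event,
                           std::size_t num_bins) {
    return (thread_id * 7 + event * 13) % num_bins;
}

// Baseline: all threads update one shared array with atomic increments,
// so threads that hit the same bin contend with each other.
std::vector<std::uint64_t> tally_shared(std::size_t num_bins,
                                        std::size_t num_threads,
                                        std::size_t events_per_thread) {
    assert(num_bins > 0 && num_threads > 0);
    std::vector<std::atomic<std::uint64_t>> bins(num_bins);
    for (auto& b : bins) b.store(0, std::memory_order_relaxed);

    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < num_threads; ++t) {
        workers.emplace_back([&bins, t, num_bins, events_per_thread] {
            for (std::size_t i = 0; i < events_per_thread; ++i)
                bins[bin_for(t, i, num_bins)].fetch_add(
                    1, std::memory_order_relaxed);
        });
    }
    for (auto& w : workers) w.join();

    std::vector<std::uint64_t> result(num_bins);
    for (std::size_t b = 0; b < num_bins; ++b) result[b] = bins[b].load();
    return result;
}

// Replicated: each thread owns a private copy of the array, updates it
// without atomics, and the copies are summed after all threads finish.
std::vector<std::uint64_t> tally_replicated(std::size_t num_bins,
                                            std::size_t num_threads,
                                            std::size_t events_per_thread) {
    assert(num_bins > 0 && num_threads > 0);
    std::vector<std::vector<std::uint64_t>> replicas(
        num_threads, std::vector<std::uint64_t>(num_bins, 0));

    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < num_threads; ++t) {
        workers.emplace_back([&replicas, t, num_bins, events_per_thread] {
            std::vector<std::uint64_t>& mine = replicas[t];
            for (std::size_t i = 0; i < events_per_thread; ++i)
                ++mine[bin_for(t, i, num_bins)];
        });
    }
    for (auto& w : workers) w.join();

    // Final reduction over the per-thread replicas.
    std::vector<std::uint64_t> result(num_bins, 0);
    for (const auto& replica : replicas)
        for (std::size_t b = 0; b < num_bins; ++b) result[b] += replica[b];
    return result;
}
```

Both functions produce the same counts; the replicated one trades extra memory (one copy per thread) for the removal of contended atomic updates. Replicating only the structures that actually see conflicts, as the abstract describes, would keep that memory cost bounded, though the paper's actual selection criteria are not given here.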

Acknowledgments

This research was, in part, funded by the U.S. Government. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the U.S. Government. Prepared by LLNL under Contract DE-AC52-07NA27344. LLNL-CONF-817842. Thanks to Marcin Lisowski and Joanna Gagatko from Intel for their initial help with Quicksilver on PIUMA. We would also like to thank Sebastian Szkoda, Vincent Cave and Wim Heirman from Intel for their help with the PIUMA runtime.

Author information

Authors and Affiliations

Parallel Computing Labs, Intel Corporation, 3600 Juliette Ln, Santa Clara, CA, 95054, USA
Jesmin Jahan Tithi & Fabrizio Petrini
Lawrence Livermore National Laboratory, Livermore, CA, 94550, USA
David F. Richards

Corresponding author

Correspondence to Jesmin Jahan Tithi.

Editor information

Editors and Affiliations

Hewlett Packard Enterprise, Seattle, WA, USA
Bradford L. Chamberlain
University of Amsterdam, Amsterdam, The Netherlands
Ana-Lucia Varbanescu
Extreme Computing Research Center, Thuwal Jeddah, Saudi Arabia
Hatem Ltaief
The University of Tennessee, Knoxville, Knoxville, TN, USA
Piotr Luszczek

About this paper

Cite this paper

Tithi, J.J., Petrini, F., Richards, D.F. (2021). Lessons Learned from Accelerating Quicksilver on Programmable Integrated Unified Memory Architecture (PIUMA) and How That’s Different from CPU. In: Chamberlain, B.L., Varbanescu, AL., Ltaief, H., Luszczek, P. (eds) High Performance Computing. ISC High Performance 2021. Lecture Notes in Computer Science, vol 12728. Springer, Cham. https://doi.org/10.1007/978-3-030-78713-4_3

DOI: https://doi.org/10.1007/978-3-030-78713-4_3
Published: 17 June 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-78712-7
Online ISBN: 978-3-030-78713-4
eBook Packages: Computer Science, Computer Science (R0)
