ABSTRACT
Improving the precision of particle physics predictions obtained from lattice simulations of quantum chromodynamics (QCD) requires extending the set of interactions considered thus far, which leads to additional computational demands. The most commonly used publicly available program packages for efficient simulations with the Wilson discretization of the Dirac operator are highly scalable on CPU hardware. To run efficiently on existing and upcoming hybrid architectures, however, one needs to rethink the current strategy for the data types used at different stages of the simulation, most notably in the frequent solves of the Dirac equation. We take the first steps towards porting to GPUs the three types of solvers used in simulations of clover-improved Wilson fermions: the Conjugate Gradient (CG) solver, the Schwarz-preconditioned GCR solver, and a variant of the deflated solver. Our analysis of the impact of reduced-precision data types on the convergence of each solver points to several possibilities for improving overall performance.
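The central question above is how reduced-precision data types affect solver convergence. The following is a minimal NumPy sketch of that kind of experiment, not the authors' code and not the openQxD implementation: it runs plain CG entirely in a chosen floating-point type on a hypothetical toy operator (a periodic 1D lattice Laplacian plus a mass term, standing in for the symmetric positive-definite normal operator that CG is usually applied to) and then recomputes the true residual in double precision. The lattice size, mass, tolerance and all function names are illustrative assumptions.

```python
import numpy as np


def lattice_laplacian_1d(n, mass):
    """Hypothetical toy operator: -Laplacian + mass^2 on a periodic 1D lattice.
    Symmetric positive definite, standing in for a normal operator D^dagger D."""
    A = np.zeros((n, n))
    for i in range(n):
        A[i, i] = mass**2 + 2.0
        A[i, (i + 1) % n] -= 1.0
        A[i, (i - 1) % n] -= 1.0
    return A


def conjugate_gradient(A, b, dtype, tol=1e-10, max_iter=1000):
    """Plain CG with the matrix and all vectors stored in `dtype`.
    Stops on the recursively updated residual; returns (x, iterations, converged)."""
    A = A.astype(dtype)
    b = b.astype(dtype)
    x = np.zeros_like(b)
    r = b.copy()  # r = b - A x for the initial guess x = 0
    p = r.copy()
    rr = r @ r
    threshold = tol * np.linalg.norm(b)
    for k in range(1, max_iter + 1):
        Ap = A @ p
        alpha = rr / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rr_new = r @ r
        if np.sqrt(rr_new) <= threshold:
            return x, k, True
        p = r + (rr_new / rr) * p
        rr = rr_new
    return x, max_iter, False


def true_relative_residual(A, b, x):
    """||b - A x|| / ||b||, recomputed in float64 regardless of the solver dtype."""
    x64 = x.astype(np.float64)
    return np.linalg.norm(b - A @ x64) / np.linalg.norm(b)


# Assumed toy setup: 256 sites, mass 0.1, Gaussian random source.
rng = np.random.default_rng(0)
A = lattice_laplacian_1d(n=256, mass=0.1)
b = rng.standard_normal(256)

for dtype in (np.float64, np.float32):
    x, iters, converged = conjugate_gradient(A, b, dtype)
    res = true_relative_residual(A, b, x)
    print(f"{np.dtype(dtype).name}: iterations={iters}, "
          f"converged={converged}, true residual={res:.2e}")
```

With these assumed parameters one would expect the single-precision run to stall at a true residual several orders of magnitude above the double-precision one, even when its recursively updated residual reports convergence. Mixed-precision schemes usually address this by computing the residual and accumulating the solution in higher precision, while doing the bulk of the iterations in lower precision.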
Index Terms
- Towards Lattice QCD+QED Simulations on GPUs