High Performance Optimizations for Nuclear Physics Code MFDn on KNL

Cook, Brandon; Maris, Pieter; Shao, Meiyue; Wichmann, Nathan; Wagner, Marcus; O’Neill, John; Phung, Thanh; Bansal, Gaurav

doi:10.1007/978-3-319-46079-6_26

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 9945)

Included in the following conference series:

International Conference on High Performance Computing

Abstract

We present initial optimization strategies and results for MFDn, a large-scale nuclear physics application code, running on a single KNL node. The code constructs a very large sparse real symmetric matrix and computes a few of its lowest eigenvalues and eigenvectors with iterative methods. The challenges addressed include effectively utilizing MCDRAM with input data representative of production runs on 5,000 KNL nodes, which require over 80 GB of memory per node; using OpenMP 4 to parallelize functions in the construction phase of the sparse matrices; and vectorizing those functions in spite of while-loops, conditionals, and lookup tables with indirect indexing. Moreover, hybrid MPI/OpenMP is employed not only to maximize the total problem size that can be solved per node, but also to minimize parallel scaling overhead by choosing the best-scaling combination of MPI ranks per node and OpenMP threads per rank. We describe a vectorized version of the popcount operation that avoids the serialization imposed by the intrinsic popcnt instruction, which operates only on scalar registers. Additionally, we leverage SSE 4.2 string comparison instructions to determine nonzero matrix elements. By utilizing MCDRAM, we achieve excellent sparse matrix–matrix multiplication (SpMM) performance; in particular, using blocks of 8 vectors leads to a speedup of 6.4× on KNL and 2.9× on Haswell compared to the performance of repeated sparse matrix–vector multiplications (SpMVs). This optimization was essential in achieving a 1.6× improvement on KNL over Haswell.
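The idea of a vectorizable popcount can be illustrated with the standard bit-parallel (SWAR) population count, which uses only shifts, masks, additions, and one multiply, so a compiler can in principle apply it to several 64-bit words at once in SIMD lanes instead of issuing one scalar popcnt per word. The C sketch below is not the authors' implementation: the names `popcount_swar` and `diff_counts` are hypothetical, and the use case (counting the bits in which bit-string encodings differ, via XOR) is an assumed illustration of how such a count might feed a test for nonzero matrix elements. Whether the loop actually vectorizes, and how efficiently, depends on the compiler and target ISA.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Branch-free SWAR population count of one 64-bit word.
   Only shifts, masks, adds and one multiply are used, so the
   same operations can be applied lane-wise in SIMD registers. */
static inline uint32_t popcount_swar(uint64_t x)
{
    x = x - ((x >> 1) & 0x5555555555555555ULL);                           /* 2-bit sums */
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL); /* 4-bit sums */
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;                           /* 8-bit sums */
    return (uint32_t)((x * 0x0101010101010101ULL) >> 56);                 /* add bytes */
}

/* Hypothetical kernel: number of bits in which each a[i] differs from ref.
   The loop body has no branches and no scalar popcnt, so it is a candidate
   for auto-vectorization. The "omp simd" pragma is ignored (possibly with a
   warning) when OpenMP SIMD support is not enabled at compile time. */
void diff_counts(const uint64_t *a, uint64_t ref, uint32_t *out, size_t n)
{
    assert(n == 0 || (a != NULL && out != NULL));
#pragma omp simd
    for (size_t i = 0; i < n; ++i)
        out[i] = popcount_swar(a[i] ^ ref);
}
```

For example, `popcount_swar(0xF0)` is 4, and with `ref = 1` the words `0x0`, `0x3`, and `0xFF` differ from it in 1, 1, and 7 bits, respectively. Keeping the per-word work free of branches and scalar-only instructions is what makes the loop amenable to SIMD execution.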

Notes

1.
The optimal choice of this number is certainly architecture dependent.

Acknowledgments

This work is supported in part by U.S. DOE Grant Number DESC0008485 (SciDAC/NUCLEI). This research used resources of the National Energy Research Scientific Computing Center (NERSC), a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.

Author information

Authors and Affiliations

Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
Brandon Cook & Meiyue Shao
Department of Physics and Astronomy, Iowa State University, Ames, IA, 50011, USA
Pieter Maris
Cray Inc., Seattle, USA
Nathan Wichmann & Marcus Wagner
Software and Services Group, Intel Corporation, Santa Clara, USA
John O’Neill, Thanh Phung & Gaurav Bansal

Corresponding author

Correspondence to Brandon Cook.

Editor information

Editors and Affiliations

University of Delaware, Newark, Delaware, USA
Michela Taufer
Forschungszentrum Jülich, Jülich, Germany
Bernd Mohr
DKRZ, Hamburg, Germany
Julian M. Kunkel

About this paper

Cite this paper

Cook, B. et al. (2016). High Performance Optimizations for Nuclear Physics Code MFDn on KNL. In: Taufer, M., Mohr, B., Kunkel, J. (eds) High Performance Computing. ISC High Performance 2016. Lecture Notes in Computer Science(), vol 9945. Springer, Cham. https://doi.org/10.1007/978-3-319-46079-6_26

DOI: https://doi.org/10.1007/978-3-319-46079-6_26
Published: 06 October 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-46078-9
Online ISBN: 978-3-319-46079-6
eBook Packages: Computer Science, Computer Science (R0)
