Abstract
We present initial optimization strategies and results for MFDn, a large-scale nuclear physics application code, running on a single KNL node. The code constructs a very large sparse real symmetric matrix and computes a few of its lowest eigenvalues and the corresponding eigenvectors using iterative methods. Challenges addressed include effectively utilizing MCDRAM with input data representative of production runs on 5,000 KNL nodes that require over 80 GB of memory per node, using OpenMP 4 to parallelize functions in the construction phase of the sparse matrices, and vectorizing those functions in spite of while-loops, conditionals, and lookup tables with indirect indexing. Moreover, hybrid MPI/OpenMP is employed not only to maximize the total problem size that can be solved per node, but also to eventually minimize parallel scaling overhead through the best-scaling combination of MPI ranks per node and OpenMP threads. We describe a vectorized version of the popcount operation that avoids serialization on the popcnt intrinsic, which operates only on scalar registers. Additionally, we leverage SSE 4.2 string comparison instructions to determine nonzero matrix elements. By utilizing MCDRAM, we achieve excellent Sparse Matrix–Matrix multiplication (SpMM) performance; in particular, using blocks of 8 vectors leads to a speedup of 6.4\(\times\) on KNL and 2.9\(\times\) on Haswell compared to the performance of repeated SpMVs. This optimization was essential in achieving a 1.6\(\times\) improvement on KNL over Haswell.
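The gain from blocking vectors can be illustrated with a minimal sketch. It assumes a generic compressed sparse row (CSR) layout and is not MFDn's actual data structure or kernel. The function names `spmv` and `spmm`, the arrays `INDPTR`, `INDICES`, `DATA`, and the small test matrix are all hypothetical. The point is only that `spmm` reads each stored matrix element once and reuses it for every vector in the block, whereas repeated calls to `spmv` stream the whole matrix once per vector.

```python
from typing import List, Sequence

# Hypothetical 3x3 symmetric test matrix in CSR form (not MFDn data):
# [[2, 0, 1],
#  [0, 3, 0],
#  [1, 0, 4]]
INDPTR = [0, 2, 3, 5]      # row i occupies DATA[INDPTR[i]:INDPTR[i + 1]]
INDICES = [0, 2, 1, 0, 2]  # column index of each stored element
DATA = [2.0, 1.0, 3.0, 1.0, 4.0]


def spmv(indptr: Sequence[int], indices: Sequence[int],
         data: Sequence[float], x: Sequence[float]) -> List[float]:
    """y = A x. Every call traverses all stored matrix elements once."""
    n_rows = len(indptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        for k in range(indptr[row], indptr[row + 1]):
            acc += data[k] * x[indices[k]]
        y[row] = acc
    return y


def spmm(indptr: Sequence[int], indices: Sequence[int],
         data: Sequence[float], X: Sequence[Sequence[float]]) -> List[List[float]]:
    """Y = A X, where X[j] holds entry j of every vector in the block.

    Each matrix element is loaded once and applied to all vectors,
    so the matrix is traversed once per block instead of once per vector.
    """
    n_rows = len(indptr) - 1
    n_vec = len(X[0])
    Y = [[0.0] * n_vec for _ in range(n_rows)]
    for row in range(n_rows):
        out = Y[row]
        for k in range(indptr[row], indptr[row + 1]):
            a = data[k]            # one matrix load ...
            x_row = X[indices[k]]
            for v in range(n_vec):  # ... reused for every vector in the block
                out[v] += a * x_row[v]
    return Y


# Block of two vectors, stored row-wise: vector 0 is (1, 2, 3), vector 1 is (0, 1, -1).
X_BLOCK = [[1.0, 0.0], [2.0, 1.0], [3.0, -1.0]]
Y_BLOCK = spmm(INDPTR, INDICES, DATA, X_BLOCK)
```

In a compiled implementation the innermost loop over the block runs over contiguous memory, which is the access pattern that a block width such as 8 is meant to expose to the vector units; the pure-Python version here only shows the loop structure and says nothing about performance.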
Notes
1. The optimal choice of this number is certainly architecture dependent.
Acknowledgments
This work is supported in part by U.S. DOE Grant Number DE-SC0008485 (SciDAC/NUCLEI). This research used resources of the National Energy Research Scientific Computing Center (NERSC), a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Cook, B. et al. (2016). High Performance Optimizations for Nuclear Physics Code MFDn on KNL. In: Taufer, M., Mohr, B., Kunkel, J. (eds) High Performance Computing. ISC High Performance 2016. Lecture Notes in Computer Science, vol 9945. Springer, Cham. https://doi.org/10.1007/978-3-319-46079-6_26
DOI: https://doi.org/10.1007/978-3-319-46079-6_26
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-46078-9
Online ISBN: 978-3-319-46079-6
eBook Packages: Computer Science (R0)