Abstract
We profile and optimize calculations performed with the BerkeleyGW [2, 3] code on the Xeon-Phi architecture. BerkeleyGW depends both on hand-tuned critical kernels as well as on BLAS and FFT libraries. We describe the optimization process and performance improvements achieved. We discuss a layered parallelization strategy to take advantage of vector, thread and node-level parallelism. We discuss locality changes (including the consequence of the lack of L3 cache) and effective use of the on-package high-bandwidth memory. We show preliminary results on Knights-Landing including a roofline study of code performance before and after a number of optimizations. We find that the GW method is particularly well-suited for many-core architectures due to the ability to exploit a large amount of parallelism over plane-wave components, band-pairs, and frequencies.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Cs roofline toolkit. https://bitbucket.org/berkeleylab/cs-roofline-toolkit
Deslippe, J., Samsonidze, G., Strubbe, D.A., Jain, M., Cohen, M.L., Louie, S.G.: BerkeleyGW: a massively parallel computer package for the calculation of the quasiparticle and optical properties of materials and nanostructures. Comput. Phys. Commun. 183(6), 1269–1289 (2012)
Frigo, M., Steven, G.J.: FFTW: an adaptive software architecture for the FFT. In: Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, vol. 3, pp. 1381–1384. IEEE (1998)
Giannozzi, P., Baroni, S., Bonini, N., Calandra, M., Car, R., Cavazzoni, C., Ceresoli, D., Chiarotti, G.L., Cococcioni, M., Dabo, I., Dal Corso, A., Fabris, S., Fratesi, G., de Gironcoli, S., Gebauer, R., Gerstmann, U., Gougoussis, C., Kokalj, A., Lazzeri, M., Martin-Samos, L., Marzari, N., Mauri, F., Mazzarello, R., Paolini, S., Pasquarello, A., Paulatto, L., Sbraccia, C., Scandolo, S., Sclauzero, G., Seitsonen, A.P., Smogunov, A., Umari, P., Wentzcovitch, R.M.: J. Phys.: Condens. Matter 21, 395502 (2009). http://dx.doi.org/10.1088/0953-8984/21/39/395502
Hybertsen, M.S., Louie, S.G.: Electron correlation in semiconductors and insulators: band gaps and quasiparticle energies. Phys. Rev. B 34(8), 5390 (1986)
Hybertsen, M.S., Louie, S.G.: First-principles theory of quasiparticles: calculation of band gaps in semiconductors and insulators. Phys. Rev. Lett. 55(13), 1418 (1985)
Intel vtune. https://software.intel.com/en-us/intel-vtune-amplifier-xe
Kronik, L., Makmal, A., Tiago, M.L., Alemany, M.M.G., Jain, M., Huang, X., Saad, Y., Chelikowsky, J.R.: PARSEC the pseudopotential algorithm for realspace electronic structure calculations: recent advances and novel applications to nanostructures. Phys. Status Solidi (b) 243(5), 1063–1079 (2006)
NERSC. http://www.nersc.gov
NERSC: Cori. http://www.nersc.gov/systems/cori/
NERSC: Measuring arithmetic intensity. http://www.nersc.gov/users/application-performance/measuring-arithmetic-intensity
Pfrommer, B., Raczkowski, D., Canning, A., Louie. S.G.: PARATEC (PARAllel Total Energy Code), Lawrence Berkeley National Laboratory (with contributions from Mauri, F., Cote, M., Yoon, Y., Pickard, C., Heynes, P.). For more information see www.nersc.gov/projects/paratec. There is no corresponding record for this reference
Raman, K.: Calculating “flop” using intel software development emulator (intel sde), March 2015. https://software.intel.com/en-us/articles/calculating-flop-using-intel-software-development-emulator-intel-sde
Soler, J.M., Artacho, E., Gale, J.D., Garca, A., Junquera, J., Ordejn, P., Snchez-Portal, D.: The SIESTA method for ab initio order-N materials simulation. J. Phys. Condens. Matter 14(11), 2745 (2002)
Tal, A.: Intel software development emulator. https://software.intel.com/en-us/articles/intel-software-development-emulator
Williams, S.: Auto-tuning Performance on Multicore Computers. Ph.D. thesis, EECS Department, University of California, Berkeley, December 2008
Williams, S., Watterman, A., Patterson, D.: Roofline: An insightful visual performance model for floating-point programs and multicore architectures. Commun. ACM 52(4), April 2009
Williams, S.: Roofline performance model. http://crd.lbl.gov/departments/computer-science/PAR/research/roofline/
Acknowledgments
Supported by the SciDAC Program on Excited State Phenomena in Energy Materials funded by the U.S. Department of Energy, Office of Basic Energy Sciences and of Advanced Scientific Computing Research, under Contract No. DE-AC02-05CH11231 at Lawrence Berkeley National Laboratory. Derek Vigil-Fowler is support by NREL’s LDRD Director’s Postdoctoral Fellowship. This research used resources of the National Energy Research Scientific Computing Center, a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.
We acknowledge helpful conversations with Mike Greenfield, Paul Kent, David Prendergast and Pierre Carrier.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Deslippe, J. et al. (2016). Optimizing Excited-State Electronic-Structure Codes for Intel Knights Landing: A Case Study on the BerkeleyGW Software. In: Taufer, M., Mohr, B., Kunkel, J. (eds) High Performance Computing. ISC High Performance 2016. Lecture Notes in Computer Science(), vol 9945. Springer, Cham. https://doi.org/10.1007/978-3-319-46079-6_29
Download citation
DOI: https://doi.org/10.1007/978-3-319-46079-6_29
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-46078-9
Online ISBN: 978-3-319-46079-6
eBook Packages: Computer ScienceComputer Science (R0)