Optimizing Excited-State Electronic-Structure Codes for Intel Knights Landing: A Case Study on the BerkeleyGW Software

  • Conference paper
High Performance Computing (ISC High Performance 2016)

Abstract

We profile and optimize calculations performed with the BerkeleyGW code [2, 3] on the Intel Xeon Phi architecture. BerkeleyGW depends on both hand-tuned critical kernels and BLAS and FFT libraries. We describe the optimization process and the performance improvements achieved. We discuss a layered parallelization strategy that takes advantage of vector-, thread- and node-level parallelism. We discuss locality changes (including the consequences of the lack of an L3 cache) and effective use of the on-package high-bandwidth memory. We show preliminary results on Knights Landing, including a roofline study of code performance before and after a number of optimizations. We find that the GW method is particularly well suited to many-core architectures because it exposes a large amount of parallelism over plane-wave components, band pairs, and frequencies.
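The roofline study mentioned in the abstract rests on a simple bound: a kernel's attainable performance is the smaller of the machine's peak floating-point rate and the product of the kernel's arithmetic intensity (flops per byte moved) with the memory bandwidth. The following minimal Python sketch illustrates that bound only. The function names and all machine numbers are hypothetical placeholders chosen for illustration; they are not measurements from the paper or specifications of Knights Landing.

```python
def attainable_gflops(arithmetic_intensity, peak_gflops, bandwidth_gbs):
    """Roofline bound: min(compute peak, arithmetic intensity * bandwidth).

    arithmetic_intensity is in flops per byte, peak_gflops in GFLOP/s,
    bandwidth_gbs in GB/s.
    """
    if arithmetic_intensity <= 0:
        raise ValueError("arithmetic intensity must be positive")
    return min(peak_gflops, arithmetic_intensity * bandwidth_gbs)


def ridge_point(peak_gflops, bandwidth_gbs):
    """Arithmetic intensity above which a kernel becomes compute bound."""
    return peak_gflops / bandwidth_gbs


# Hypothetical machine parameters (assumptions, not values from the paper).
PEAK_GFLOPS = 2000.0   # peak floating-point rate
FAST_MEM_GBS = 400.0   # a high-bandwidth memory tier
SLOW_MEM_GBS = 100.0   # a conventional memory tier

if __name__ == "__main__":
    for ai in (0.5, 2.0, 8.0, 32.0):
        fast = attainable_gflops(ai, PEAK_GFLOPS, FAST_MEM_GBS)
        slow = attainable_gflops(ai, PEAK_GFLOPS, SLOW_MEM_GBS)
        print(f"AI={ai:5.1f} flops/byte  fast tier: {fast:7.1f}  slow tier: {slow:7.1f} GFLOP/s")
```

Under this model, a kernel whose arithmetic intensity lies below the ridge point is bandwidth bound, so it benefits from running out of the faster memory tier or from locality changes that raise its arithmetic intensity. A kernel above the ridge point is limited by vectorization and threading efficiency instead.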

References

  1. CS Roofline Toolkit. https://bitbucket.org/berkeleylab/cs-roofline-toolkit

  2. Deslippe, J., Samsonidze, G., Strubbe, D.A., Jain, M., Cohen, M.L., Louie, S.G.: BerkeleyGW: a massively parallel computer package for the calculation of the quasiparticle and optical properties of materials and nanostructures. Comput. Phys. Commun. 183(6), 1269–1289 (2012)

  3. http://www.berkeleygw.org

  4. Frigo, M., Johnson, S.G.: FFTW: an adaptive software architecture for the FFT. In: Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, vol. 3, pp. 1381–1384. IEEE (1998)

  5. Giannozzi, P., Baroni, S., Bonini, N., Calandra, M., Car, R., Cavazzoni, C., Ceresoli, D., Chiarotti, G.L., Cococcioni, M., Dabo, I., Dal Corso, A., Fabris, S., Fratesi, G., de Gironcoli, S., Gebauer, R., Gerstmann, U., Gougoussis, C., Kokalj, A., Lazzeri, M., Martin-Samos, L., Marzari, N., Mauri, F., Mazzarello, R., Paolini, S., Pasquarello, A., Paulatto, L., Sbraccia, C., Scandolo, S., Sclauzero, G., Seitsonen, A.P., Smogunov, A., Umari, P., Wentzcovitch, R.M.: QUANTUM ESPRESSO: a modular and open-source software project for quantum simulations of materials. J. Phys.: Condens. Matter 21(39), 395502 (2009). http://dx.doi.org/10.1088/0953-8984/21/39/395502

  6. Hybertsen, M.S., Louie, S.G.: Electron correlation in semiconductors and insulators: band gaps and quasiparticle energies. Phys. Rev. B 34(8), 5390 (1986)

  7. Hybertsen, M.S., Louie, S.G.: First-principles theory of quasiparticles: calculation of band gaps in semiconductors and insulators. Phys. Rev. Lett. 55(13), 1418 (1985)

  8. Intel VTune. https://software.intel.com/en-us/intel-vtune-amplifier-xe

  9. Kronik, L., Makmal, A., Tiago, M.L., Alemany, M.M.G., Jain, M., Huang, X., Saad, Y., Chelikowsky, J.R.: PARSEC: the pseudopotential algorithm for real-space electronic structure calculations: recent advances and novel applications to nanostructures. Phys. Status Solidi (b) 243(5), 1063–1079 (2006)

  10. NERSC. http://www.nersc.gov

  11. NERSC: Cori. http://www.nersc.gov/systems/cori/

  12. NERSC: Measuring arithmetic intensity. http://www.nersc.gov/users/application-performance/measuring-arithmetic-intensity

  13. Pfrommer, B., Raczkowski, D., Canning, A., Louie, S.G.: PARATEC (PARAllel Total Energy Code), Lawrence Berkeley National Laboratory (with contributions from Mauri, F., Cote, M., Yoon, Y., Pickard, C., Heynes, P.). For more information see www.nersc.gov/projects/paratec

  14. Raman, K.: Calculating “FLOP” using Intel Software Development Emulator (Intel SDE), March 2015. https://software.intel.com/en-us/articles/calculating-flop-using-intel-software-development-emulator-intel-sde

  15. Soler, J.M., Artacho, E., Gale, J.D., García, A., Junquera, J., Ordejón, P., Sánchez-Portal, D.: The SIESTA method for ab initio order-N materials simulation. J. Phys.: Condens. Matter 14(11), 2745 (2002)

  16. Tal, A.: Intel Software Development Emulator. https://software.intel.com/en-us/articles/intel-software-development-emulator

  17. Williams, S.: Auto-tuning Performance on Multicore Computers. Ph.D. thesis, EECS Department, University of California, Berkeley, December 2008

  18. Williams, S., Waterman, A., Patterson, D.: Roofline: an insightful visual performance model for floating-point programs and multicore architectures. Commun. ACM 52(4), 65–76 (2009)

  19. Williams, S.: Roofline performance model. http://crd.lbl.gov/departments/computer-science/PAR/research/roofline/

Acknowledgments

Supported by the SciDAC Program on Excited State Phenomena in Energy Materials funded by the U.S. Department of Energy, Office of Basic Energy Sciences and of Advanced Scientific Computing Research, under Contract No. DE-AC02-05CH11231 at Lawrence Berkeley National Laboratory. Derek Vigil-Fowler is supported by NREL’s LDRD Director’s Postdoctoral Fellowship. This research used resources of the National Energy Research Scientific Computing Center, a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.

We acknowledge helpful conversations with Mike Greenfield, Paul Kent, David Prendergast and Pierre Carrier.

Author information

Corresponding author

Correspondence to Jack Deslippe.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Deslippe, J. et al. (2016). Optimizing Excited-State Electronic-Structure Codes for Intel Knights Landing: A Case Study on the BerkeleyGW Software. In: Taufer, M., Mohr, B., Kunkel, J. (eds) High Performance Computing. ISC High Performance 2016. Lecture Notes in Computer Science, vol 9945. Springer, Cham. https://doi.org/10.1007/978-3-319-46079-6_29

  • DOI: https://doi.org/10.1007/978-3-319-46079-6_29

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-46078-9

  • Online ISBN: 978-3-319-46079-6

  • eBook Packages: Computer Science, Computer Science (R0)
