ABSTRACT
In this work, we present our implementation of density functional theory (DFT) plane wave pseudopotential (PWP) calculations on GPU clusters. The GPU version is based on a CPU DFT-PWP code, PEtot, which can handle systems of up to a thousand atoms on thousands of processors. Our tests indicate that the GPU version can achieve a ~10 times speed-up over the CPU version. A detailed analysis of the speed-up and of the scaling with the number of CPU/GPU computing units (up to 256) is presented. The speed-up relies on the adoption of a hybrid reciprocal-space and band-index parallelization scheme. As far as we know, this is the first GPU DFT-PWP code scalable to a large number of CPU/GPU computing units. We also outline future work and what is needed to further increase the computational speed by another factor of 10.
Index Terms
- Large scale plane wave pseudopotential density functional theory calculations on GPU clusters