A massively-parallel electronic-structure calculations based on real-space density functional theory

doi:10.1016/j.jcp.2009.11.038

Journal of Computational Physics

Volume 229, Issue 6, 20 March 2010, Pages 2339-2363

https://doi.org/10.1016/j.jcp.2009.11.038

Abstract

Based on the real-space finite-difference method, we have developed a first-principles density functional program that efficiently performs large-scale calculations on massively-parallel computers. In addition to efficient parallel implementation, we also implemented several computational improvements, substantially reducing the computational costs of $O (N^{3})$ operations such as the Gram–Schmidt procedure and subspace diagonalization. Using the program on a massively-parallel computer cluster with a theoretical peak performance of several TFLOPS, we perform electronic-structure calculations for a system consisting of over 10,000 Si atoms, and obtain a self-consistent electronic-structure in a few hundred hours. We analyze in detail the costs of the program in terms of computation and of inter-node communications to clarify the efficiency, the applicability, and the possibility for further improvements.

Introduction

First-principles calculations based on density functional theory (DFT) [1], [2] have been performed on a variety of materials and have provided important microscopic information on physical properties based on quantum theory [3], [4], [5]. Thus, among various theoretical methodologies, DFT is at present a top choice for clarifying and predicting phenomena in condensed matter. The popularity of DFT is due to its relatively low computational cost and its reasonable accuracy, which often favor it over elaborate and highly accurate quantum chemistry approaches such as the configuration-interaction [6] or diffusion Monte Carlo approaches [7]. Although there are a few reports of exceptionally large-scale DFT calculations [8], [9], the systems to which DFT can be applied easily are still limited to medium-sized systems consisting of hundreds to about one thousand atoms. Recent research interests in condensed matter and the materials sciences require DFT calculations for much larger systems, such as nanoscale systems. For instance, in semiconductor science, the typical size of metal–oxide–semiconductor field-effect transistors is on the nanometer scale. For such systems, quantum mechanical simulations are important for understanding device characteristics [10], [11]. Furthermore, in the life sciences, correlations between atomic structures and the bio-functions of in vivo proteins are hard to clarify without the help of simulations based on quantum theory [12], [13]. Since nanoscale systems consist of 10,000–100,000 atoms or more, addressing these important subjects requires DFT calculations on a drastically larger scale, and such calculations cannot be carried out with traditional computational programs. Thus, several computational approaches have been investigated intensively.

One promising approach is the order-N method [14], in which mathematical problems are re-formulated so as to utilize the possible localized nature of unitary-transformed wave functions or density matrices. However, another approach exists in which the conventional order-$N^{3}$ exact formulation is adopted and drastic improvements are achieved in the performance of the computer codes through restructuring and tuning for state-of-the-art computer architectures.

For the latter approach, the real-space finite-difference pseudopotential method proves to be a key ingredient because it is well suited to parallel computation. The real-space method for first-principles electronic-structure calculations was first proposed by Chelikowsky et al. in 1994 [15], and various developments and applications have since been carried out by many researchers [8], [16], [17], [18], [19], [20], [21]. Recently, the real-space method has been applied to Si nanocrystals of unprecedented size by Zhou et al. [8]. In the real-space method, singular ionic potentials are replaced by smoother pseudopotentials [22], Schrödinger-type quantum mechanical equations are discretized on three-dimensional spatial grids, and the solutions are obtained by treating them as finite-difference (FD) equations. In principle, the matrix of the real-space formulation is sparse, and fast Fourier transformation (FFT) is unnecessary for Hamiltonian matrix operations. This FFT-free character contrasts with the conventional plane-wave methods [23], [24], and provides a great advantage by easing the communication burden in parallel computations.
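To make the FFT-free, sparse character of the FD formulation concrete, the following is a minimal sketch of a Laplacian applied on a periodic cubic grid. It assumes a second-order central-difference stencil for brevity (the method cited above uses higher-order stencils), and the function name, grid size, and box length are illustrative choices, not taken from the paper.

```python
import numpy as np

def fd_laplacian_periodic(phi, h):
    """Second-order central-difference Laplacian on a periodic cubic grid.

    Each output point depends only on its six nearest neighbours, so the
    operator is sparse and needs no FFT (illustrative stencil order).
    """
    lap = -6.0 * phi
    for axis in range(3):
        lap = lap + np.roll(phi, 1, axis=axis) + np.roll(phi, -1, axis=axis)
    return lap / h**2

# Check against cos(x), whose exact Laplacian is -cos(x).
n = 32                      # grid points per direction (illustrative)
box = 2.0 * np.pi           # box length (illustrative)
h = box / n                 # grid spacing
x = np.arange(n) * h
phi = np.cos(x)[:, None, None] * np.ones((n, n, n))
lap = fd_laplacian_periodic(phi, h)
max_err = np.max(np.abs(lap - (-phi)))   # discretization error, O(h^2)
```

In a domain-decomposed parallel run, a stencil of this kind would only require exchanging boundary layers between neighbouring subdomains, which is the communication advantage the paragraph above refers to.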

To date, the real-space method has achieved calculations for systems of thousands of atoms in the order-$N^{3}$ exact formulation [8], [9]. The real-space method is also promising for much larger systems containing 10,000–100,000 atoms. Therefore, it is important to understand the computational details of the state-of-the-art real-space method, including its algorithms, implementation, and performance, for further development of first-principles calculations on next-generation supercomputers. In this paper, we present a detailed description of our real-space DFT (RSDFT) code, developed recently to overcome the size limitation of our computing system by utilizing the power of massively-parallel computing. We also investigate in detail the computational costs of the RSDFT code to clarify how to study large systems using next-generation supercomputers.

The algorithm employed in our code is rather conventional, so the computational costs scale as $O(N^{3})$. However, there are several benefits to developing the code based on the conventional algorithm. First, we already know its applicability, accuracy, and suitable choices of computational parameters such as cut-off energies and sampling k points. Second, we are able to concentrate on particular problems in large-scale calculations, such as numerical precision, convergence behavior, and the reliability of the total-energy calculation itself. The present study focuses on the computational aspects of large-scale real-space calculations. Thus, the programming techniques developed for RSDFT are also applicable to other methods, e.g. order-N methods.

In Section 2, we briefly review the basics and fundamental equations of DFT. In Section 3, we present a formulation similar to that of Section 2, but in a discrete space for real-space FD calculations. Although Sections 2 and 3 are somewhat repetitious for specialists in DFT calculations, we include them for the following reason: future development of first-principles calculations will need the help of specialists in computer and computational sciences to bring out the best performance of supercomputers, and we aim to lower the barrier to entry into DFT calculations for non-DFT specialists. In Section 4, we summarize the computational parameters and several initial configurations for the parallel computation. In Section 5, we describe the overall algorithm of our RSDFT code and the details of several main subroutines. We introduce a new algorithm to accelerate $O(N^{3})$ operations in Gram–Schmidt orthogonalization and in subspace diagonalization. In Section 6, we present the performance tests of our code and analyze the costs of computation and communication. In Section 7, we show several practical applications dealing with Si crystal, nanometer-scale Si quantum dots, and Si nanowires. Finally, we present a summary and conclusion in Section 8. Acronyms used in this paper are summarized in Table 1.

Section snippets

Density functional theory

The ultimate purpose of DFT calculations is to minimize an energy functional $E[ρ]$ with respect to the electron density $ρ$. Following the standard Kohn–Sham DFT formalism [2], we introduce the orbitals $ϕ_{n}$, and assume that the electron density is expressed as a sum of the absolute squares of the orbitals, $ρ(r) = \sum_{n = 1}^{M_{B}} f_{n} |ϕ_{n}(r)|^{2} \qquad (1)$ and the minimization is performed with respect to the orbitals. In Eq. (1), $M_{B}$ is the total number of orbitals and $f_{n}$ is the occupation number for the nth orbital. The explicit

DFT on three-dimensional grid space

To numerically minimize the energy functional, we introduce a three-dimensional spatial grid and consider the minimization problem within the discrete space of $M_{L}$ grid points. Then the orbitals, density, and potentials are expressed as column vectors whose elements are the values at each grid point. For example, $\vec{ϕ} = \begin{pmatrix} ϕ^{1} \\ \vdots \\ ϕ^{i} \\ \vdots \\ ϕ^{M_{L}} \end{pmatrix},$ where the ith element is the value at the grid point $r_{i}$: $ϕ^{i} = ϕ(r_{i}).$
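As an illustration of this grid representation, the sketch below stores $M_{B}$ orbitals as the columns of an $M_{L} \times M_{B}$ array and evaluates the density of Eq. (1) pointwise. The grid size, the volume element `dV`, the random orbitals, and the double occupation are all assumptions made for the example, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
M_L, M_B = 64, 4            # grid points and orbitals (illustrative sizes)
dV = 0.1                    # volume per grid point (assumed)

# Orbitals as columns of an M_L x M_B array, orthonormal under
# the discrete inner product sum_i dV * conj(a^i) * b^i.
phi, _ = np.linalg.qr(rng.standard_normal((M_L, M_B)))
phi /= np.sqrt(dV)

f = np.full(M_B, 2.0)       # occupation numbers (assumed doubly occupied)

# rho^i = sum_n f_n |phi_n^i|^2, evaluated at every grid point at once
rho = (np.abs(phi) ** 2) @ f

# Integrating the density over the grid recovers the electron count
n_electrons = rho.sum() * dV
```

With normalized orbitals, the integrated density equals the sum of the occupation numbers, which gives a quick consistency check on the discretization.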

In this discrete scheme, the energy functional can be written as $E [ϕ^{1}, ϕ^{2}, \dots, ϕ^{M_{L}}] = - \frac{1}{2} \sum_{n = 1}^{M_{B}} f_{n} \sum_{i = 1}^{M_{L}} \sum_{j = 1}^{M_{L}} ϕ_{n}^{i *} L_{ij} ϕ$

Computational set up and parallelization

The systems for which we chose to perform DFT calculations may be classified into two categories: (1) a system with periodic boundary conditions for orbitals and (2) a system with decaying boundary conditions. Crystalline solids or supercell geometries are typical of the former, and molecules or clusters belong to the latter. For both systems, we first must set the number N, the position, and the species of each atom in the unit cell. Since we are usually interested in the behavior of

Details of algorithms

Minimization of the energy functional is two-fold. One minimization is with respect to the electron density or the orbitals, and the other minimization is with respect to the atomic coordinates. In our computational code, the minimization with respect to atomic coordinates contains the minimization with respect to the orbitals for fixed atomic coordinates as an internal procedure. This corresponds to the Born–Oppenheimer approximation or the adiabatic approximation. The most time-consuming

Performance tests

The predominant portion of the total computational time in RSDFT calculations results from the SCF iterations. Therefore, in this section we investigate in detail the costs of computations and communications for SCF iterations. To this end, we present several test calculations performed by the RSDFT code for periodic systems. The test systems are cubic Si crystals in the diamond structure, with sizes of 512, 1000, 1728, 2744, and 4096 atoms. Only the gamma point is sampled for the Brillouin

Practical applications

We begin this section by showing an example of the application of our RSDFT code to a crystalline Si system with 4096 atoms. The crystal has the diamond structure, and the lattice constant of the supercell is 43.4 Å. The grid spacing is chosen to be 0.45 Å, which corresponds to the lattice constant divided into 96 segments. Thus, the total number of grid points is $96^{3} = 884{,}736$. For the ionic potentials, we employ the norm-conserving pseudopotential [22] in the separable approximate form [27], and we
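The grid figures quoted in this snippet can be checked with a few lines of arithmetic. The sketch below uses only the numbers given in the text plus the fact that a conventional cubic diamond cell holds 8 atoms; reading the 4096-atom system as a cubic supercell of conventional cells is an inference, and the variable names are illustrative.

```python
lattice_constant = 43.4      # supercell edge in angstrom, from the text
segments = 96                # grid segments per direction, from the text
n_atoms = 4096               # from the text

spacing = lattice_constant / segments    # about 0.452 angstrom
n_grid = segments ** 3                   # total number of grid points

# Inference: a cubic supercell built from conventional diamond cells
atoms_per_cubic_cell = 8
cells_per_edge = round((n_atoms / atoms_per_cubic_cell) ** (1 / 3))
conventional_a = lattice_constant / cells_per_edge   # angstrom per cell edge
grid_points_per_atom = n_grid / n_atoms
```

Under that reading, the supercell is 8 conventional cells along each edge, with 216 grid points per atom.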

Summary

We have developed RSDFT code suitable for massively-parallel computers, and have demonstrated that the code, which is based on the usual $O (N^{3})$ formulation, can treat large systems. The code was developed to perform the computations using matrix products to the extent possible, which substantially improves the performance of $O (N^{3})$ parts of the computations. We apply the code to 10,000 Si atom systems, and demonstrate that the self-consistent electronic-structure can be obtained within a few
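The idea of casting an $O(N^{3})$ step as matrix products can be sketched with a generic block Gram–Schmidt scheme. This is not the algorithm of the paper, whose details are not given in this excerpt. It is a minimal illustration, assuming real-valued orbitals, a uniform volume element `dV`, and a hypothetical function name and block size. Each block of orbitals is projected against all earlier orbitals with two matrix-matrix products, and only the small block itself is orthonormalized directly.

```python
import numpy as np

def block_gram_schmidt(phi, block=4, dV=1.0):
    """Orthonormalize the columns of phi under <a|b> = dV * a^T b.

    Generic block scheme (an assumption, not the paper's algorithm):
    the projection against earlier columns uses matrix-matrix products.
    """
    q = np.array(phi, dtype=float)
    m = q.shape[1]
    for start in range(0, m, block):
        stop = min(start + block, m)
        if start > 0:
            # Overlaps of all earlier orbitals with the whole block
            overlap = dV * (q[:, :start].T @ q[:, start:stop])
            # Remove those components from the block in one product
            q[:, start:stop] -= q[:, :start] @ overlap
        # Orthonormalize within the small block (QR used for brevity)
        qb, _ = np.linalg.qr(q[:, start:stop])
        q[:, start:stop] = qb / np.sqrt(dV)
    return q

rng = np.random.default_rng(1)
phi = rng.standard_normal((200, 12))     # 200 grid points, 12 orbitals
q = block_gram_schmidt(phi, block=4, dV=0.5)
gram = 0.5 * (q.T @ q)                   # should be close to the identity
```

Larger blocks shift more of the work into matrix-matrix products, which typically run closer to peak floating-point performance than vector-by-vector updates. A single projection pass can lose orthogonality for ill-conditioned inputs, so a production code would need to address that.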

Acknowledgments

We acknowledge the members of the COMAS-DFT meeting at University of Tsukuba, especially Prof. M. Sato, Prof. T. Sakurai, and Prof. A. Ukawa, for fruitful discussions and comments on the development of the RSDFT code. Numerical calculations for the present work have been carried out under the Interdisciplinary Computational Science Program at the Center for Computational Sciences, University of Tsukuba.

References (50)

U.V. Waghmare et al., HARES: an efficient method for first-principles electronic structure calculations of complex systems, Comput. Phys. Commun. (2001)

P. Pulay, Convergence acceleration of iterative sequences. The case of SCF iteration, Chem. Phys. Lett. (1980)

C. Vömel et al., State-of-the-art eigensolvers for electronic structure calculations of large scale nano-systems, J. Comput. Phys. (2008)

P. Hohenberg et al., Inhomogeneous electron gas, Phys. Rev. (1964)

W. Kohn et al., Self-consistent equations including exchange and correlation effects, Phys. Rev. (1965)

R.O. Jones et al., The density functional formalism, its applications and prospects, Rev. Mod. Phys. (1989)

M.C. Payne et al., Iterative minimization techniques for ab initio total-energy calculations: molecular dynamics and conjugate gradients, Rev. Mod. Phys. (1992)

W. Kohn, Nobel lecture: electronic structure of matter – wave functions and density functionals, Rev. Mod. Phys. (1999)

J.M. Foster et al., Canonical configurational interaction procedure, Rev. Mod. Phys. (1960)

W.M.C. Foulkes et al., Quantum Monte Carlo simulations of solids, Rev. Mod. Phys. (2001)

Y. Zhou et al., Parallel self-consistent calculations via Chebyshev-filtered subspace acceleration, Phys. Rev. E (2006)

T.-L. Chan et al., Size limits on doping phosphorus into silicon nanocrystals, Nano Lett. (2008)

M.L. Lee et al., J. Appl. Phys. (2005)

N. Umezawa et al., Chemical controllability of charge states of nitrogen-related defects in HfOxNy: first-principles calculations, Phys. Rev. B (2008)

S. Raugei et al., DFT modeling of biological systems, Phys. Status Solidi b (2006)

K. Kamiya et al., Possible mechanism of proton transfer through peptide groups in the H-pathway of the bovine cytochrome c oxidase, J. Am. Chem. Soc. (2007)

S. Goedecker, Linear scaling electronic structure methods, Rev. Mod. Phys. (1999)

J.R. Chelikowsky et al., Higher-order finite-difference pseudopotential method – an application to diatomic-molecules, Phys. Rev. B (1994)

K. Yabana et al., Time-dependent local-density approximation in real time, Phys. Rev. B (1996)

E.L. Briggs et al., Real-space multigrid-based approach to large-scale electronic structure calculations, Phys. Rev. B (1996)

T.L. Beck, Real-space mesh techniques in density functional theory, Rev. Mod. Phys. (2000)

K. Hirose et al., First-Principles Calculation in Real-Space Formalism, Electronic Configurations and Transport Properties of Nanostructures (2005)

J.-I. Iwata et al., Large-scale density functional calculations on silicon divacancies, Phys. Rev. B (2008)

N. Troullier et al., Efficient pseudopotentials for plane-wave calculations, Phys. Rev. B (1991)

J. Yamauchi et al., First-principles study on energetics of c-BN(0 0 1) reconstructed surfaces, Phys. Rev. B (1996)

