A massively-parallel electronic-structure calculations based on real-space density functional theory

https://doi.org/10.1016/j.jcp.2009.11.038

Abstract

Based on the real-space finite-difference method, we have developed a first-principles density functional program that efficiently performs large-scale calculations on massively-parallel computers. In addition to efficient parallel implementation, we also implemented several computational improvements, substantially reducing the computational costs of $O(N^3)$ operations such as the Gram–Schmidt procedure and subspace diagonalization. Using the program on a massively-parallel computer cluster with a theoretical peak performance of several TFLOPS, we perform electronic-structure calculations for a system consisting of over 10,000 Si atoms, and obtain a self-consistent electronic structure in a few hundred hours. We analyze in detail the costs of the program in terms of computation and of inter-node communications to clarify the efficiency, the applicability, and the possibility for further improvements.

Introduction

First-principles calculations based on density functional theory (DFT) [1], [2] have been performed on a variety of materials and have provided important microscopic information on physical properties based on quantum theory [3], [4], [5]. Thus, among various theoretical methodologies, DFT is at present a top choice for the clarification and prediction of phenomena in condensed matter. The popularity of DFT is due to its relatively low computational cost and its reasonable accuracy, which often favor it over elaborate and highly accurate quantum chemistry approaches such as the configuration-interaction [6] or diffusion Monte Carlo approaches [7]. Although there are a few reports of exceptionally large-scale DFT calculations [8], [9], the systems to which DFT can be applied easily are still limited to medium-sized systems consisting of hundreds to about one thousand atoms. Recent research interests in condensed matter and the material sciences require DFT calculations for much larger systems such as nanoscale systems. For instance, in semiconductor science, the typical size of metal–oxide–semiconductor field-effect transistors is on the nanometer scale. For such systems, quantum mechanical simulations are important for understanding device characteristics [10], [11]. Furthermore, in the life sciences, correlations between atomic structures and the bio-functions of in vivo proteins are hard to clarify without the help of simulations based on quantum theory [12], [13]. Since nanoscale systems consist of 10,000–100,000 atoms or more, drastically larger-scale DFT calculations are imperative for addressing these important subjects, yet such calculations are impossible with traditional computational programs. Thus, several computational approaches have been investigated intensively.

One promising approach is the order-$N$ method [14], in which mathematical problems are re-formulated so as to utilize the possible localized nature of unitary-transformed wave functions or density matrices. However, another approach exists in which the conventional order-$N^3$ exact formulation is adopted and drastic improvements are achieved in the performance of the computer codes through restructuring and tuning for state-of-the-art computer architectures.

For the latter approach, the real-space finite-difference pseudopotential method proves to be a key ingredient because it is well suited to parallel computation. The real-space method for first-principles electronic-structure calculations was first proposed by Chelikowsky et al. in 1994 [15], and various developments and applications have since been reported by many researchers [8], [16], [17], [18], [19], [20], [21]. Recently, the real-space method has been applied to Si nanocrystals of unprecedented size by Zhou et al. [8]. In real-space methods, singular ionic potentials are replaced by smoother pseudopotentials [22], Schrödinger-type quantum mechanical equations are discretized on three-dimensional spatial grids, and the solutions are obtained by treating them as finite-difference (FD) equations. In principle, the matrix of the real-space formulation is sparse, and fast Fourier transformation (FFT) is unnecessary for Hamiltonian matrix operations. This FFT-free character contrasts with conventional plane-wave methods [23], [24] and provides a great advantage by easing the communication burden in parallel computations.
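The FFT-free, sparse character of the finite-difference formulation can be illustrated with a minimal sketch. It assumes a one-dimensional periodic grid and a second-order central-difference stencil, chosen here only for brevity; the function name `laplacian_1d_periodic` and the sine test function are hypothetical and are not taken from the RSDFT code. Each output value depends only on its nearest grid neighbours, which is why no global transform is needed.

```python
import math

def laplacian_1d_periodic(values, spacing):
    """Second-order central-difference Laplacian on a periodic 1D grid.

    Each output point uses only its two nearest neighbours, so the
    corresponding matrix is sparse (three non-zero entries per row).
    """
    n = len(values)
    inv_h2 = 1.0 / (spacing * spacing)
    return [
        (values[(i - 1) % n] - 2.0 * values[i] + values[(i + 1) % n]) * inv_h2
        for i in range(n)
    ]

# Hypothetical test function: one full sine period sampled on 64 points.
num_points = 64
length = 2.0 * math.pi
spacing = length / num_points
grid = [i * spacing for i in range(num_points)]
phi = [math.sin(x) for x in grid]

lap_phi = laplacian_1d_periodic(phi, spacing)

# The exact second derivative of sin(x) is -sin(x); the stencil error
# is proportional to spacing**2 for this second-order formula.
max_error = max(abs(l + p) for l, p in zip(lap_phi, phi))
```

In a domain-decomposed parallel run, a stencil of this kind would require only the exchange of boundary values between neighbouring subdomains, in contrast to the all-to-all data movement of a distributed FFT.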

To date, the real-space method has been applied to systems of thousands of atoms in the order-$N^3$ exact formulation [8], [9]. The real-space method is also promising for much larger systems containing 10,000–100,000 atoms. Therefore, it is important to understand the computational details of the state-of-the-art real-space method, including its algorithms, implementation, and performance, for the further development of first-principles calculations on next-generation supercomputers. In this paper, we present a detailed description of our real-space DFT (RSDFT) code, developed recently to overcome the size limitation of our computing system by utilizing the power of massively-parallel computing. We also investigate in detail the computational costs of the RSDFT code to clarify how to study large systems using next-generation supercomputers.

The algorithm employed in our code is rather conventional, so the computational costs scale as $O(N^3)$. However, there are several benefits to developing the code on the basis of the conventional algorithm. First, we already know its applicability, accuracy, and suitable choices of computational parameters such as cut-off energies and sampling k points. Second, we are able to concentrate on particular problems in large-scale calculations, such as numerical precision, convergence behavior, and the reliability of the total-energy calculation itself. The present study focuses on the computational aspects of large-scale real-space calculations. Thus, the programming techniques developed for RSDFT are also applicable to other methods, e.g. order-$N$ methods.

In Section 2, we briefly review the basics and fundamental equations of DFT. In Section 3, we present a formulation similar to that of Section 2, but in a discrete space for real-space FD calculations. Although Sections 2 and 3 are somewhat repetitious for specialists in DFT calculations, we include them for the following reason: the future development of first-principles calculations will need the help of specialists in computer and computational sciences to bring out the best performance of supercomputers, and we aim to lower the barrier to entry to DFT calculations for non-DFT specialists. In Section 4, we summarize the computational parameters and several initial configurations for the parallel computation. In Section 5, we describe the overall algorithm of our RSDFT code and the details of several main subroutines. We introduce a new algorithm to accelerate $O(N^3)$ operations in Gram–Schmidt orthogonalization and in subspace diagonalization. In Section 6, we present the performance tests of our code and analyze the costs of computation and communication. In Section 7, we show several practical applications dealing with Si crystal, nanometer-scale Si quantum dots, and Si nanowires. Finally, we present a summary and conclusion in Section 8. Acronyms used in this paper are summarized in Table 1.

Section snippets

Density functional theory

The ultimate purpose of DFT calculations is to minimize an energy functional $E[\rho]$ with respect to the electron density $\rho$. Following the standard Kohn–Sham DFT formalism [2], we introduce the orbitals $\phi_n$ and assume that the electron density is expressed as a sum of the absolute squares of the orbitals,

$$\rho(\mathbf{r}) = \sum_{n=1}^{M_B} f_n \, |\phi_n(\mathbf{r})|^2, \qquad (1)$$

and the minimization is performed with respect to the orbitals. In Eq. (1), $M_B$ is the total number of orbitals and $f_n$ is the occupation number for the $n$th orbital. The explicit

DFT on three-dimensional grid space

To numerically minimize the energy functional, we introduce a three-dimensional spatial grid and consider the minimization problem within the discrete space of $M_L$ grid points. Then the orbitals, density, and potentials are expressed as column vectors whose elements are the values at each grid point. For example,

$$\phi = \begin{pmatrix} \phi_1 \\ \vdots \\ \phi_i \\ \vdots \\ \phi_{M_L} \end{pmatrix},$$

where the $i$th element is the value at the grid point $\mathbf{r}_i$: $\phi_i = \phi(\mathbf{r}_i)$.
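As a small illustration of this grid representation, the sketch below evaluates the density of Eq. (1) point by point from orbitals stored as lists of grid values. It assumes a hypothetical one-dimensional periodic box with two analytically normalized orbitals, each doubly occupied; the function name `density_on_grid` and all numerical values are illustrative choices, not parameters from the paper.

```python
import math

def density_on_grid(orbitals, occupations):
    """Return rho_i = sum_n f_n * |phi_n(r_i)|**2 for grid-sampled orbitals."""
    num_points = len(orbitals[0])
    rho = [0.0] * num_points
    for phi_n, f_n in zip(orbitals, occupations):
        for i, value in enumerate(phi_n):
            rho[i] += f_n * abs(value) ** 2
    return rho

# Hypothetical 1D periodic box of unit length sampled on 32 grid points.
num_points = 32
length = 1.0
spacing = length / num_points
grid = [i * spacing for i in range(num_points)]

# Two orthonormal orbitals (sine and cosine of one full period).
norm = math.sqrt(2.0 / length)
orbitals = [
    [norm * math.sin(2.0 * math.pi * x / length) for x in grid],
    [norm * math.cos(2.0 * math.pi * x / length) for x in grid],
]
occupations = [2.0, 2.0]  # assumed double occupation of each orbital

rho = density_on_grid(orbitals, occupations)

# Integrating the density over the box (sum times grid spacing)
# should recover the total occupation, here 4 electrons.
num_electrons = sum(rho) * spacing
```

Because each density element depends only on orbital values at the same grid point, this step needs no communication when the grid is divided among processes.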

In this discrete scheme, the energy functional can be written as

$$E[\phi_1, \phi_2, \ldots, \phi_{M_L}] = -\frac{1}{2} \sum_{n=1}^{M_B} f_n \sum_{i=1}^{M_L} \sum_{j=1}^{M_L} \phi_{ni} L_{ij} \phi

Computational set up and parallelization

The systems for which we chose to perform DFT calculations may be classified into two categories: (1) a system with periodic boundary conditions for orbitals and (2) a system with decaying boundary conditions. Crystalline solids or supercell geometries are typical of the former, and molecules or clusters belong to the latter. For both systems, we first must set the number N, the position, and the species of each atom in the unit cell. Since we are usually interested in the behavior of

Details of algorithms

Minimization of the energy functional is two-fold. One minimization is with respect to the electron density or the orbitals, and the other minimization is with respect to the atomic coordinates. In our computational code, the minimization with respect to atomic coordinates contains the minimization with respect to the orbitals for fixed atomic coordinates as an internal procedure. This corresponds to the Born–Oppenheimer approximation or the adiabatic approximation. The most time-consuming
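The nesting of the two minimizations can be sketched with a toy model. Everything here is hypothetical: `energy` is an invented two-variable function in which `x` stands in for an atomic coordinate and `c` for an orbital degree of freedom, and plain gradient descent replaces the actual electronic and structural optimizers, which this excerpt does not specify. The point is only the control flow: the inner variable is fully relaxed before each outer step.

```python
def energy(x, c):
    """Hypothetical toy energy: x mimics an atomic coordinate, c an orbital coefficient."""
    return (c - 0.5 * x) ** 2 + (x - 2.0) ** 2

def minimize_inner(x, c, step=0.25, tol=1e-12, max_iter=1000):
    """Inner loop: relax c at fixed x (stand-in for the electronic minimization)."""
    for _ in range(max_iter):
        grad_c = 2.0 * (c - 0.5 * x)
        if abs(grad_c) < tol:
            break
        c -= step * grad_c
    return c

def minimize_outer(x, c, step=0.25, tol=1e-10, max_iter=1000):
    """Outer loop: move x, re-relaxing c before every step."""
    for _ in range(max_iter):
        c = minimize_inner(x, c)
        # Partial derivative of the energy with respect to x at the relaxed c.
        grad_x = -(c - 0.5 * x) + 2.0 * (x - 2.0)
        if abs(grad_x) < tol:
            break
        x -= step * grad_x
    return x, c

x_opt, c_opt = minimize_outer(x=0.0, c=0.0)
final_energy = energy(x_opt, c_opt)
```

For this toy function the minimum lies at `x = 2`, `c = 1`, so the loop can be checked by hand.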

Performance tests

The predominant portion of the total computational time in RSDFT calculations results from the SCF iterations. Therefore, in this section we investigate in detail the costs of computations and communications for SCF iterations. To this end, we present several test calculations performed by the RSDFT code for periodic systems. The test systems are cubic Si crystals in the diamond structure, with sizes of 512, 1000, 1728, 2744, and 4096 atoms. Only the gamma point is sampled for the Brillouin

Practical applications

We begin this section by showing an example of the application of our RSDFT code to a crystalline Si system with 4096 atoms. The crystal lattice has the diamond structure with a lattice constant of 43.4 Å. The grid spacing is chosen to be 0.45 Å, which corresponds to the lattice constant divided into 96 segments. Thus, the total number of grid points is $96^3 = 884{,}736$. For the ionic potentials, we employ the norm-conserving pseudopotential [22] in the separable approximate form [27], and we
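The grid arithmetic quoted here can be checked in a few lines. The atom count, cell edge, and number of segments come from the text; the figure of 8 atoms per conventional cubic diamond cell is a standard crystallographic fact added as an assumption to show how 4096 atoms fill the cell, and the variable names are illustrative.

```python
atoms_per_cubic_cell = 8     # conventional cubic diamond cell (assumption, not from the text)
num_atoms = 4096             # from the text
lattice_constant = 43.4      # cell edge in angstrom, from the text
segments_per_edge = 96       # from the text

# Number of conventional cubic cells along each edge of the simulation cell.
cells_per_edge = round((num_atoms / atoms_per_cubic_cell) ** (1.0 / 3.0))

# Grid spacing in angstrom and total number of grid points.
grid_spacing = lattice_constant / segments_per_edge
total_grid_points = segments_per_edge ** 3

# Average number of grid points available per atom.
points_per_atom = total_grid_points / num_atoms
```

The spacing evaluates to about 0.452 Å, consistent with the quoted 0.45 Å, and the cell corresponds to 8 conventional cubic cells along each edge.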

Summary

We have developed an RSDFT code suitable for massively-parallel computers, and have demonstrated that the code, which is based on the usual $O(N^3)$ formulation, can treat large systems. The code was developed to perform the computations using matrix products to the extent possible, which substantially improves the performance of the $O(N^3)$ parts of the computations. We apply the code to systems of 10,000 Si atoms, and demonstrate that the self-consistent electronic structure can be obtained within a few
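How an orthogonalization step can be recast in terms of matrix products is sketched below with a generic block Gram–Schmidt scheme. This is not the authors' algorithm, whose details are not given in this excerpt; it is a textbook-style illustration under stated assumptions. The names `block_gram_schmidt` and `orthonormalize_block`, the block size, and the test vectors are hypothetical, and pure-Python loops stand in for the optimized matrix-matrix routines a production code would call.

```python
import math

def dot(u, v):
    """Inner product of two real vectors stored as lists."""
    return sum(a * b for a, b in zip(u, v))

def orthonormalize_block(block):
    """Plain Gram-Schmidt inside one small block of row vectors."""
    result = []
    for v in block:
        w = list(v)
        for q in result:
            c = dot(q, w)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(dot(w, w))
        result.append([wi / norm for wi in w])
    return result

def block_gram_schmidt(vectors, block_size):
    """Orthonormalize row vectors block by block.

    The projection of a whole block against all finished vectors is
    written as two matrix-matrix products, which is the part that a
    tuned matrix-multiply routine could take over.
    """
    done = []
    for start in range(0, len(vectors), block_size):
        block = [list(v) for v in vectors[start:start + block_size]]
        if done:
            # First product: overlaps between the block and finished vectors.
            overlap = [[dot(v, q) for q in done] for v in block]
            # Second product: subtract the projected components.
            block = [
                [vi - sum(c * q[i] for c, q in zip(row, done))
                 for i, vi in enumerate(v)]
                for v, row in zip(block, overlap)
            ]
        done.extend(orthonormalize_block(block))
    return done

# Hypothetical input: four linearly independent vectors on six grid points.
vectors = [
    [1, 1, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 0, 1, 1, 1, 1],
]
orthonormal = block_gram_schmidt(vectors, block_size=2)

# Largest deviation of the overlap matrix from the identity.
max_deviation = max(
    abs(dot(orthonormal[m], orthonormal[n]) - (1.0 if m == n else 0.0))
    for m in range(4) for n in range(4)
)
```

The total operation count is unchanged; the gain in such a restructuring would come from moving most of the work from vector-vector operations into matrix-matrix products, which make better use of cache on typical processors.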

Acknowledgments

We acknowledge the members of the COMAS-DFT meeting at University of Tsukuba, especially Prof. M. Sato, Prof. T. Sakurai, and Prof. A. Ukawa, for fruitful discussions and comments on the development of the RSDFT code. Numerical calculations for the present work have been carried out under the Interdisciplinary Computational Science Program at the Center for Computational Sciences, University of Tsukuba.

References (50)

  • Y. Zhou et al., Parallel self-consistent calculations via Chebyshev-filtered subspace acceleration, Phys. Rev. E (2006)
  • T.-L. Chan et al., Size limits on doping phosphorus into silicon nanocrystals, Nano Lett. (2008)
  • M.L. Lee et al., J. Appl. Phys. (2005)
  • N. Umezawa et al., Chemical controllability of charge states of nitrogen-related defects in HfOxNy: first-principles calculations, Phys. Rev. B (2008)
  • S. Raugei et al., DFT modeling of biological systems, Phys. Status Solidi b (2006)
  • K. Kamiya et al., Possible mechanism of proton transfer through peptide groups in the H-pathway of the bovine cytochrome c oxidase, J. Am. Chem. Soc. (2007)
  • S. Goedecker, Linear scaling electronic structure methods, Rev. Mod. Phys. (1999)
  • J.R. Chelikowsky et al., Higher-order finite-difference pseudopotential method – an application to diatomic-molecules, Phys. Rev. B (1994)
  • K. Yabana et al., Time-dependent local-density approximation in real time, Phys. Rev. B (1996)
  • E.L. Briggs et al., Real-space multigrid-based approach to large-scale electronic structure calculations, Phys. Rev. B (1996)
  • T.L. Beck, Real-space mesh techniques in density functional theory, Rev. Mod. Phys. (2000)
  • K. Hirose et al., First-Principles Calculation in Real-Space Formalism, Electronic Configurations and Transport Properties of Nanostructures (2005)
  • J.-I. Iwata et al., Large-scale density functional calculations on silicon divacancies, Phys. Rev. B (2008)
  • N. Troullier et al., Efficient pseudopotentials for plane-wave calculations, Phys. Rev. B (1991)
  • J. Yamauchi et al., First-principles study on energetics of c-BN(0 0 1) reconstructed surfaces, Phys. Rev. B (1996)