Rapid iterative method for electronic-structure eigenproblems using localised basis functions

https://doi.org/10.1016/j.cpc.2007.08.007

Abstract

Eigenproblems resulting from the use of localised basis functions (typically Gaussian- or Slater-type orbitals) in density functional electronic-structure calculations are often solved using direct linear algebra. A full implementation is presented, built around an iterative method known as 'residual minimisation–direct inversion in the iterative subspace' (RM-DIIS), intended to solve many similar eigenproblems within a self-consistency cycle. The method is more efficient than direct methods and exhibits superior scaling on parallel supercomputers.

Introduction

Density functional theory, using the Kohn–Sham formalism [1], [2], has been very successful in investigating the properties of condensed matter systems. In state-of-the-art localised-orbital electronic-structure codes, diagonalisation (which scales cubically) typically becomes the time-dominant part of the calculation beyond around 100 atoms, because the Hamiltonian evaluation is asymptotically linear scaling with respect to system size. The following algorithm is primarily aimed at conventional (cubic-scaling) electronic-structure codes that use such localised orbitals, whether they be Gaussian, Slater or, more recently, numerical [3]. Often the number of basis functions per atom lies in the range 10–20 in a typical pseudopotential calculation, with the result that as many as 20% of the eigenpairs are required to construct the density matrix. Although much effort has gone into linear scaling methods over the last few decades, conventional algorithms are still the most widely used. Even when the use of linear scaling algorithms becomes widespread, the cross-over point in system size relative to conventional localised-orbital codes remains rather unclear. For dense three-dimensional (cubic) systems, even wide-gap insulators, the cross-over point will almost certainly lie above 500 atoms, and higher still for metals. Therefore, a huge amount of science remains to be done below these system sizes, demanding continued improvements in the efficiency of the conventional algorithm. Furthermore, the linear scaling divide-and-conquer algorithm [4] relies on many separate conventional calculations, so improvements to these on the scale of a hundred to a few thousand atoms may also benefit linear scaling approaches. Clearly, there is still a great deal to be gained by improving the kernel of conventional algorithms.

In this work, we present an iterative algorithm and compare it directly to state-of-the-art linear algebra routines. That is, for a given potential the eigenproblem is solved completely, with no intermediate orthogonalisation or change to the potential. As it stands, therefore, the following implementation can simply be used to replace a call to a direct diagonalisation routine. Presenting the algorithm in this way allows a transparent comparison of the time-to-solution for an eigenproblem, as it can be easily compared against well-known and widely used linear algebra software. However, we realise it may well be the case that fully solving each eigenproblem is not the most efficient overall strategy. If a potential update is negligible compared to a diagonalisation, as is typically the case for systems of more than around 100 atoms when using a Gaussian basis, then a trivial alteration to the algorithm (limiting the number of internal DIIS iterations, reorthogonalising and updating the many-body potential) is, of course, possible. Many such iterative schemes for the solution of the non-linear self-consistent eigenproblem have been presented, usually within the framework of large systematic basis sets that generate sparse matrices [5], [6], [7]. For a more general discussion of iterative methods for large eigenvalue problems see Ref. [8]. Iterative methods for the solution of linear equations involving non-sparse matrices have also been presented [9].

Diagonalisation very rapidly becomes time-dominant when using a Gaussian basis. Table 1 gives a stark demonstration of this using the AIMPRO code [10]: it compares the time needed to evaluate the Hamiltonian matrix elements (the Hamiltonian build) with the time needed for diagonalisation. These timings are least favourable to the Hamiltonian build for two main reasons. Firstly, a contracted basis set is used with only 13 functions per atom, and the Hamiltonian build scales linearly with the number of basis functions whereas diagonalisation scales cubically. Secondly, cubic cells are used, so the ratio of non-zero matrix elements to the number of atoms is high. Also, the parallel scaling of the Hamiltonian build is, in principle, always better than that of diagonalisation.

In the context of plane-wave implementations [5], efficient parallelism of the DIIS algorithm has been achieved by splitting bands over processors and evaluating the $H|\psi\rangle$ products either locally (on one CPU) or on a local grid of processors. This approach is possible if explicit storage of the Hamiltonian is not required (as in plane-wave and many real-space approaches [11], [12]). When using a Gaussian basis one is required to store the Hamiltonian explicitly; our parallelisation strategy is therefore the same as that of ScaLAPACK, and we formulate the problem to rely heavily on matrix–matrix products (parallel BLAS) to ensure efficient parallel performance and portability. It is also worth noting that the scaling with respect to the number of basis functions (n) is $O(n^2)$, as opposed to $O(n^3)$ for direct diagonalisation, which for high-quality, large-basis calculations is a significant improvement. To improve upon direct diagonalisation using a small number of basis functions requires very good preconditioning of the iterative subspace (see Section 4). The preconditioning and pre-rotation steps (Sections 4 and 5, respectively) are essential for both stable and rapid convergence. In the implementation presented here, typically only 3–6 iterations are required to converge eigenvalues to within an absolute tolerance of $10^{-12}$ atomic units.
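To make the blocked, level-3 BLAS formulation concrete, the following sketch (ours, not taken from AIMPRO; all matrices are random stand-ins) applies $H$ and $S$ to all m trial vectors at once as single dense matrix–matrix products and forms every residual by broadcasting. In a distributed implementation each `@` product would map onto a parallel BLAS (PBLAS) GEMM call.

```python
# Sketch: blocked application of H and S to m trial vectors at once.
# Random stand-ins throughout; illustrative only.
import numpy as np

n, m = 2000, 200                     # basis size, number of bands
rng = np.random.default_rng(1)
A = rng.standard_normal((n, n))
H = 0.5 * (A + A.T)                  # stand-in Hermitian Hamiltonian
S = np.eye(n)                        # trivial overlap, for illustration
Psi = rng.standard_normal((n, m))    # block of m trial vectors

HPsi = H @ Psi                       # one n x n by n x m GEMM (BLAS level 3)
SPsi = S @ Psi
# Rayleigh quotients for every band at once.
lam = np.einsum('ij,ij->j', Psi, HPsi) / np.einsum('ij,ij->j', Psi, SPsi)
R = HPsi - SPsi * lam                # all m residuals via broadcasting
```

Each such block product costs $O(n^2 m)$, which for a fixed number of bands gives the $O(n^2)$ scaling in the basis size quoted above.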

Although the DIIS algorithm has been used for a wide array of problems in electronic-structure algorithms, such as geometry optimisation [13] and the non-linear self-consistent iteration [5], [14], there has been comparatively little discussion of its use for the core diagonalisation step in a localised-orbital code. The algorithm presented here has been used in thousands of varied calculations over the past four years in the AIMPRO code [10] and has led to significant speed-ups across a wide array of computational platforms, owing to its reliance on highly efficient and portable level-3 parallel BLAS routines.

Section snippets

Background

In the Kohn–Sham (KS) scheme [2], m one-particle equations of the form
$$\left[-\tfrac{1}{2}\nabla^{2} + V_{\mathrm{eff}}(\mathbf{r})\right]\psi_i^{\mathrm{KS}}(\mathbf{r}) = \lambda_i\,\psi_i^{\mathrm{KS}}(\mathbf{r})$$
require solution. Expanding the set $\{\psi_i^{\mathrm{KS}}(\mathbf{r})\}_{i=1,\dots,m}$ in a general basis $\{\phi_j(\mathbf{r})\}_{j=1,\dots,n}$ with coefficients $\{\psi_{ij}\}$ such that
$$\psi_i^{\mathrm{KS}}(\mathbf{r}) = \sum_{j}^{n} \psi_{ij}\,\phi_j(\mathbf{r})$$
results in the generalised Hermitian matrix eigenproblem
$$H\,|\psi_i\rangle = \lambda_i\,S\,|\psi_i\rangle,$$
of which the lowest m solutions are required. As $V_{\mathrm{eff}}(\mathbf{r})$ depends on the density, which in turn depends on the Kohn–Sham orbitals, it must be iterated to self-consistency, thereby …
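For reference, a minimal sketch of the direct solution of this generalised eigenproblem, the $O(n^3)$ step the iterative method aims to replace, is given below using SciPy; the random $H$ and $S$ are stand-ins for the real Hamiltonian and overlap matrices.

```python
# Sketch: direct solution of H |psi_i> = lambda_i S |psi_i> for the
# lowest m eigenpairs. Stand-in matrices; illustrative only.
import numpy as np
from scipy.linalg import eigh

n, m = 400, 80                      # basis size, eigenpairs required
rng = np.random.default_rng(0)
A = rng.standard_normal((n, n))
H = 0.5 * (A + A.T)                 # stand-in Hermitian Hamiltonian
B = rng.standard_normal((n, n))
S = B @ B.T + n * np.eye(n)         # symmetric positive-definite overlap

lam, Psi = eigh(H, S, subset_by_index=(0, m - 1))  # lowest m solutions
```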

RM-DIIS method

This method is central to our overall implementation and is based on the work of Wood and Zunger [17], who presented a plane-wave version. Subspace iterative methods aim to expand eigenvectors of interest in a subset of n-dimensional vectors (the 'iterative subspace'). In the DIIS scheme, for iteration number p and eigenvector i, this set takes the form
$$\{|\xi_i\rangle\}_{p=1,\dots,n_x} = \left[\,|\delta\psi_i^{(0)}\rangle,\;|\delta\psi_i^{(1)}\rangle,\;|\delta\psi_i^{(2)}\rangle,\;\dots,\;|\delta\psi_i^{(n_x)}\rangle\,\right],$$
where the number of vectors in the expansion set $n_x$ depends on the number of iterations …
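Although the snippet is truncated, the DIIS step it introduces is standard: choose the combination of subspace vectors whose linearised residual has minimum norm, subject to the coefficients summing to one. The sketch below implements that generic constrained minimisation for a single band; it follows the textbook RM-DIIS recipe and should not be read as the paper's exact implementation.

```python
# Sketch: generic DIIS combination for one band. psis and resids are
# lists of 1-D arrays holding the subspace vectors and their residuals.
import numpy as np

def diis_combine(psis, resids):
    """Minimise || sum_p c_p R_p || subject to sum_p c_p = 1."""
    nx = len(psis)
    B = np.zeros((nx + 1, nx + 1))
    for a in range(nx):
        for b in range(nx):
            B[a, b] = resids[a] @ resids[b]    # overlap <R_a|R_b>
    B[nx, :nx] = B[:nx, nx] = 1.0              # Lagrange-multiplier border
    rhs = np.zeros(nx + 1)
    rhs[nx] = 1.0
    c = np.linalg.solve(B, rhs)[:nx]           # DIIS coefficients
    psi = sum(ci * p for ci, p in zip(c, psis))
    R = sum(ci * r for ci, r in zip(c, resids))
    return psi, R
```

The combined residual R is then preconditioned (next section) and appended to the subspace as the new $|\delta\psi\rangle$ vector.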

Preconditioning

At the start of the kth iteration we have the approximate eigenpair $\{|\psi_j^{(k)}\rangle, \lambda_j^{(k)}\}$ and corresponding residuals $\{|R_j^{(k)}\rangle\}$. We wish to precondition this set of residuals to generate a good set of vectors $\{|\delta\psi_j^{(k+1)}\rangle\}$ to add to our iterative subspace. We may begin with the perfect vector increment to the jth eigenpair as follows:
$$\big(H - \lambda_j^{(k)} S\big)\,\big|\psi_j^{(k)} + \delta\psi_j^{(k+1)}\big\rangle = |R_j^{(k)}\rangle + \big(H - \lambda_j^{(k)} S\big)\,|\delta\psi_j^{(k+1)}\rangle = 0;$$
rearranging gives
$$|\delta\psi_j^{(k+1)}\rangle = -\big(H - \lambda_j^{(k)} S\big)^{-1}\,|R_j^{(k)}\rangle.$$
Clearly, exact solution of this leads to the trivial solution …
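Since the exact inverse would merely reproduce the trivial solution, the inverse is approximated in practice. As an illustration only, a generic Jacobi-style choice (not necessarily the paper's preconditioner) inverts the diagonal alone:

```python
# Sketch: diagonal (Jacobi-style) approximation to -(H - lam*S)^{-1}|R>.
# A generic illustrative choice, not the paper's preconditioner.
import numpy as np

def precondition(R, H, S, lam, eps=1e-2):
    d = np.diag(H) - lam * np.diag(S)
    d = np.where(np.abs(d) < eps, eps, d)   # guard near-singular entries
    return -R / d                           # approximate |delta psi>
```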

Preconditioning improvement

The method described thus far is an ideal, dense linear algebra implementation. Virtually all computational effort is in the form of matrix–matrix multiplications. In practice, however, it is found that an extra step is required to improve the preconditioning and obtain stable and rapid convergence without the need for orthogonalisation. Good preconditioning is crucial to the stability of the DIIS method. Poor preconditioning leads to a plethora of problems such as band duplication, band omission, …
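The snippet cuts off before the extra step itself is described. One standard way to realise such a step, sketched here purely as an assumption about its general shape, is a Rayleigh–Ritz pre-rotation: diagonalising $H$ within the span of the current trial block before the next DIIS sweep, which helps prevent bands duplicating or being omitted. The paper's pre-rotation may differ in detail.

```python
# Sketch: Rayleigh-Ritz rotation of an n x m trial block (m << n).
# Illustrative assumption, not the paper's exact pre-rotation step.
import numpy as np
from scipy.linalg import eigh

def prerotate(Psi, H, S):
    h = Psi.T @ H @ Psi          # projected Hamiltonian (m x m)
    s = Psi.T @ S @ Psi          # projected overlap (m x m)
    theta, U = eigh(h, s)        # small, cheap dense eigenproblem
    return Psi @ U, theta        # rotated block is S-orthonormal
```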

Numerical examples

To demonstrate the efficacy of this approach, we have produced detailed timings illustrating the speed and parallel scaling behaviour of the DIIS algorithm. The matrices concerned were produced when AIMPRO was used to model 217 atoms of silicon, arranged as a 216-atom simple cubic unit cell with a single self-interstitial atom added. The basis sets used included 40, 22 and 16 basis functions per atom, giving matrices of sizes 8680, 4774 and 3472, respectively. Results are presented in Table 2 …

Conclusion

An efficient iterative diagonalisation algorithm for use with dense matrices arising from electronic-structure calculations has been presented. It has been extensively used in the AIMPRO code [10] over the past four years on a large number and variety of systems and has proven to be very stable. Its main advantages over direct diagonalisation are improved serial speed, superior parallel scaling and a reduction of the scaling of the algorithm with respect to the number of basis functions to $O(n^2)$ …

Acknowledgements

The authors would like to thank The Engineering and Physical Sciences Research Council (EPSRC) for financial support.

References (17)

  • C. Duneczky et al., Comput. Phys. Comm. (1989)
  • Y.K. Zhou et al., J. Comput. Phys. (2006)
  • P. Hohenberg et al., Phys. Rev. (1964)
  • W. Kohn et al., Phys. Rev. (1965)
  • J.M. Soler et al., J. Phys.: Condens. Matter (2002)
  • W. Yang, Phys. Rev. Lett. (1991)
  • G. Kresse et al., Phys. Rev. B (1996)
  • Y.K. Zhou et al., Phys. Rev. E (2006)

There are more references available in the full text version of this article.
