Rapid iterative method for electronic-structure eigenproblems using localised basis functions
Introduction
Density functional theory, in the Kohn–Sham formalism [1], [2], has been very successful in investigating the properties of condensed matter systems. In state-of-the-art localised-orbital electronic-structure codes, diagonalisation (which scales cubically) typically becomes the dominant cost for systems of more than around 100 atoms, because the Hamiltonian evaluation scales asymptotically linearly with system size. The following algorithm is primarily aimed at conventional (cubically scaling) electronic-structure codes that use such localised orbitals, whether they be Gaussian, Slater or, more recently, numerical [3]. The number of basis functions per atom typically lies in the range 10–20 in a pseudopotential calculation, with the result that as many as 20% of the eigenpairs are required to construct the density matrix. Although much effort has gone into linear scaling methods over the last few decades, conventional algorithms remain the most widely used. Even when the use of linear scaling algorithms becomes widespread, the cross-over point in system size relative to conventional localised-orbital codes remains rather unclear. For dense three-dimensional (cubic) systems, even wide-gap insulators, the cross-over point will almost certainly be above 500 atoms, and higher for metals. There therefore remains a huge amount of science to be done below these system sizes, demanding continued improvements in the efficiency of the conventional algorithm. Furthermore, the linear scaling divide-and-conquer algorithm [4] relies on many separate conventional calculations, so improvements to these on the scale of a hundred to a few thousand atoms may also benefit linear scaling approaches. Clearly, there is still a great deal to be gained by improving the kernel of conventional algorithms.
In this work, we present an iterative algorithm and compare it directly to state-of-the-art linear algebra routines. For a given potential, the eigenproblem is solved completely with no intermediate orthogonalisation or change to the potential. As it stands, therefore, the following implementation can simply be used to replace a call to a direct diagonalisation routine. Presenting the algorithm in this way allows a transparent comparison of the time-to-solution for an eigenproblem, as it can easily be compared to well-known and widely used linear algebra software. However, we realise it may well be the case that fully solving each eigenproblem is not the most efficient overall implementation. If a potential update is negligible compared to a diagonalisation, as is typically the case for systems of more than around 100 atoms when using a Gaussian basis, then a trivial alteration to the algorithm (limiting the number of internal DIIS iterations, reorthogonalising and updating the many-body potential) is, of course, possible. Many such iterative schemes for the solution of the non-linear self-consistent eigenproblem have been presented, usually within the framework of large systematic basis sets that generate sparse matrices [5], [6], [7]. For a more general discussion of iterative methods for large eigenvalue problems see Ref. [8]. Iterative methods for the solution of linear equations involving non-sparse matrices have also been presented [9].
Diagonalisation very rapidly becomes time dominant when using a Gaussian basis. Table 1 gives a stark demonstration of this using the AIMPRO code [10]: it compares the time needed to evaluate the Hamiltonian matrix elements (the Hamiltonian build) with that needed for diagonalisation. These timings are least favourable to the Hamiltonian build for two main reasons. Firstly, a contracted basis set is used with only 13 functions per atom, and the Hamiltonian build scales linearly with the number of basis functions whereas diagonalisation scales cubically. Secondly, cubic cells are used, so the ratio of non-zero matrix elements to the number of atoms is high. Also, the parallel scaling of the Hamiltonian build is, in principle, always better than that of diagonalisation.
In the context of plane-wave implementations [5], efficient parallelism of the DIIS algorithm has been achieved by splitting bands over processors and evaluating the products either locally (on one CPU) or on a local grid of processors. This approach is possible if the explicit storage of the Hamiltonian is not required (as in plane-wave and many real-space approaches [11], [12]). When using a Gaussian basis one is required to store the Hamiltonian explicitly, so our parallelisation strategy is the same as ScaLAPACK's, and we formulate the problem to rely heavily on matrix–matrix products (parallel BLAS) to ensure efficient parallel performance and portability. It is also worth noting that the scaling with respect to the number of basis functions ($n$) is $n^2$ for a fixed number of required eigenpairs, as opposed to $n^3$ for direct diagonalisation, which for high-quality, large-basis calculations is a significant improvement. To improve upon direct diagonalisation using a small number of basis functions requires very good preconditioning of the iterative subspace (see Section 4). The preconditioning and pre-rotation steps (Sections 4 and 5, respectively) are essential for both stable and rapid convergence. In the implementation presented here, typically only 3–6 iterations are required to converge eigenvalues to within an absolute tolerance of $10^{-12}$ atomic units.
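A minimal serial sketch (illustrative only, not the AIMPRO implementation; all variable names are assumptions) of why the formulation favours matrix–matrix products: applying the Hamiltonian to a block of trial vectors as one level-3 BLAS call gives the same result as many separate matrix–vector products, but lets the library block for cache and parallelise efficiently.

```python
import numpy as np

n, m = 128, 16
rng = np.random.default_rng(1)
M = rng.standard_normal((n, n))
H = (M + M.T) / 2                 # dense symmetric "Hamiltonian"
X = rng.standard_normal((n, m))   # block of m trial vectors

# m separate level-2 BLAS calls (gemv), one per trial vector.
Y_loop = np.column_stack([H @ X[:, i] for i in range(m)])

# A single level-3 BLAS call (gemm): identical result, but with
# far better cache reuse and parallel efficiency on large problems.
Y_gemm = H @ X
```

In a distributed setting the same gemm maps directly onto a parallel BLAS (PBLAS) call over a block-cyclic matrix layout, which is what makes the ScaLAPACK-style data distribution natural here.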
Although the DIIS algorithm has been used for a wide array of problems in electronic structure algorithms, such as geometry optimisation [13] and the non-linear self-consistent iteration [5], [14], there has been comparatively little discussion of the possibility of its use to perform the core diagonalisation step in a localised orbital code. The algorithm presented has been used on thousands of varied calculations in the past four years in the AIMPRO code [10] and has led to significant speed-ups across a wide array of computational platforms due to its reliance on highly efficient and portable level 3 parallel BLAS routines.
Background
In the Kohn–Sham (KS) scheme [2], $m$ one-particle equations of the form $\hat{H}\psi_i = \varepsilon_i\psi_i$ require solution. Expanding the set $\{\psi_i\}$ in a general basis $\{\phi_\alpha\}$ with coefficients $c_{\alpha i}$, such that $\psi_i = \sum_\alpha c_{\alpha i}\phi_\alpha$, results in the generalised Hermitian matrix eigenproblem $H c_i = \varepsilon_i S c_i$, of which the lowest $m$ solutions are required. As $\hat{H}$ depends on the density, which in turn depends on the Kohn–Sham orbitals, it must be iterated to self-consistency, thereby
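As a sketch of the algebra involved (standard dense linear algebra, not the iterative method of this paper; variable names are illustrative), the generalised problem $Hc = \varepsilon Sc$ can be reduced to standard Hermitian form via a Cholesky factorisation of the overlap matrix, which is effectively what a direct dense solver does:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 8, 3
A = rng.standard_normal((n, n))
H = (A + A.T) / 2                      # Hermitian "Hamiltonian"
B = rng.standard_normal((n, n))
S = B @ B.T + n * np.eye(n)            # positive-definite overlap

# Reduce H c = eps S c to a standard problem via S = L L^T.
L = np.linalg.cholesky(S)
Linv = np.linalg.inv(L)
eps, Y = np.linalg.eigh(Linv @ H @ Linv.T)   # ascending eigenvalues
C = Linv.T @ Y                               # back-transform coefficients

eps_m, C_m = eps[:m], C[:, :m]               # lowest m eigenpairs
```

The iterative method of the paper avoids this full $n^3$ reduction-and-diagonalisation for every self-consistency step, which is where the savings come from.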
RM-DIIS method
This method is central to our overall implementation and is based on the work of Wood and Zunger [17], who presented a plane-wave version. Subspace iterative methods aim to expand eigenvectors of interest in a subset of $n$-dimensional vectors (the 'iterative subspace'). In the DIIS scheme, for iteration number $p$ and eigenvector $i$, this set takes the form $\{x_i^{(1)}, x_i^{(2)}, \ldots, x_i^{(p)}\}$, where the number of vectors in the expansion set depends on the number of iterations
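A hedged numerical sketch of the residual-minimisation idea for a single eigenvector (taking $S = I$ for brevity; the construction of the trial set and all names here are illustrative, not the paper's exact RM-DIIS procedure): the combination of stored trial vectors that minimises the residual norm follows from a small generalised eigenproblem over the subspace.

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(2)
n = 8
M = rng.standard_normal((n, n))
H = (M + M.T) / 2
w, V = np.linalg.eigh(H)          # exact spectrum, for comparison

# A small iterative subspace {x^(1), x^(2), x^(3)} of trial vectors
# for the lowest eigenvector, each contaminated by higher states.
xs = [V[:, 0] + 0.1 * V[:, 1],
      V[:, 0] - 0.1 * V[:, 1],
      V[:, 0] + 0.1 * V[:, 2]]
xs = [x / np.linalg.norm(x) for x in xs]
Rs = [H @ x - (x @ H @ x) * x for x in xs]   # residuals R^(j)

# DIIS: minimise ||sum_j a_j R^(j)|| via the small problem
# B a = theta P a, with B_jk = <R_j|R_k> and P_jk = <x_j|x_k>;
# the lowest-theta eigenvector gives the mixing coefficients.
B = np.array([[ri @ rj for rj in Rs] for ri in Rs])
P = np.array([[xi @ xj for xj in xs] for xi in xs])
theta, Acoef = eigh(B, P)
x = sum(a * xj for a, xj in zip(Acoef[:, 0], xs))
x /= np.linalg.norm(x)

res_diis = np.linalg.norm(H @ x - (x @ H @ x) * x)
res_best = min(np.linalg.norm(r) for r in Rs)  # best single vector
```

The DIIS combination beats every individual trial vector because the two contaminations along $V[:,1]$ can be cancelled against each other, something no single member of the subspace can achieve.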
Preconditioning
At the start of the $k$th iteration we have the approximate eigenpairs $(\lambda_j^{(k)}, x_j^{(k)})$ and corresponding residuals $R_j^{(k)} = (H - \lambda_j^{(k)} S)\, x_j^{(k)}$. We wish to precondition this set of residuals to generate a good set of vectors to add to our iterative subspace. We may begin with the perfect vector increment $\delta x_j$ to the $j$th eigenpair as follows: $H(x_j + \delta x_j) = \lambda_j S(x_j + \delta x_j)$; rearranging gives $(H - \lambda_j S)\,\delta x_j = -R_j$. Clearly, exact solution of this leads to the trivial solution
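The point of the derivation can be checked numerically in a small sketch (taking $S = I$; the diagonal approximation shown is a Davidson-style stand-in, and all names are illustrative): solving the increment equation exactly only recovers the trivial correction $\delta x = -x$, whereas an approximate solve yields a useful preconditioned step.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 6
M = rng.standard_normal((n, n))
H = (M + M.T) / 2
x = rng.standard_normal(n)
x /= np.linalg.norm(x)
lam = x @ H @ x                        # Rayleigh quotient (S = I)
R = H @ x - lam * x                    # residual

# Exact solution of (H - lam I) delta = -R gives delta = -x,
# since (H - lam I) x = R by definition: the trivial solution.
delta_exact = np.linalg.solve(H - lam * np.eye(n), -R)

# A diagonal (Davidson-style) approximation to (H - lam I)
# instead yields a non-trivial preconditioned correction.
d = np.diag(H) - lam
d = np.where(np.abs(d) < 1e-3, 1e-3, d)   # guard tiny denominators
delta_prec = -R / d
```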
Preconditioning improvement
The method described thus far is an idealised, dense linear algebra implementation. Virtually all computational effort is in the form of matrix–matrix multiplications. In practice, however, it is found that an extra step is required to improve the preconditioning in order to obtain stable and rapid convergence without the need for orthogonalisation. Good preconditioning is crucial to the stability of the DIIS method. Poor preconditioning leads to a plethora of problems such as band duplication, band omission,
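One standard remedy in this spirit is a Rayleigh–Ritz rotation of the trial block before preconditioning, so that each column again targets a single, distinct band. The sketch below is illustrative rather than the paper's exact pre-rotation procedure, and all names are assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 10, 3
M = rng.standard_normal((n, n))
H = (M + M.T) / 2

# A trial block spanning the lowest eigenspace well, but with the
# individual columns mixed among the target bands (a situation that
# invites band duplication or omission if preconditioned as-is).
w, V = np.linalg.eigh(H)
Q = np.linalg.qr(rng.standard_normal((m, m)))[0]  # random mixing
X = V[:, :m] @ Q

# Rayleigh-Ritz rotation: diagonalise H within span(X) so each
# rotated column approximates one distinct eigenvector.
Hs = X.T @ H @ X
ws, U = np.linalg.eigh(Hs)
X_rot = X @ U
```

Because the rotation is a small $m \times m$ diagonalisation plus a gemm, it is cheap relative to the $n$-dimensional operations yet disentangles the bands before the residuals are formed.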
Numerical examples
To demonstrate the efficacy of this approach, we have produced detailed timings illustrating the speed and parallel scaling behaviour of the DIIS algorithm. The matrices concerned were produced when AIMPRO was used to model 217 atoms of silicon, arranged as a 216-atom simple cubic unit cell with a single self-interstitial atom added. The basis sets used included 40, 22 and 16 basis functions per atom, giving matrices of sizes 8680, 4774 and 3472, respectively. Results are presented in Table 2
Conclusion
An efficient iterative diagonalisation algorithm for use with dense matrices arising from electronic-structure calculations has been presented. It has been extensively used in the AIMPRO code [10] over the past four years on a large number and variety of systems and has proven to be very stable. Its main advantages over direct diagonalisation are improved serial speed, superior parallel scaling and a reduction of the scaling of the algorithm with respect to the number of basis functions to $n^2$, from the $n^3$ of direct diagonalisation.
Acknowledgements
The authors would like to thank The Engineering and Physical Sciences Research Council (EPSRC) for financial support.
References (17)
- et al., Comput. Phys. Comm. (1989)
- et al., J. Comput. Phys. (2006)
- et al., Phys. Rev. B (1964)
- et al., Phys. Rev. A (1965)
- et al., J. Phys.: Condens. Matter (2002)
- Phys. Rev. Lett. (1991)
- et al., Phys. Rev. B (1996)
- et al., Phys. Rev. E (2006)