Rapid iterative method for electronic-structure eigenproblems using localised basis functions
Introduction
Density functional theory, in the Kohn–Sham formalism [1], [2], has been very successful in investigating the properties of condensed matter systems. In state-of-the-art localised-orbital electronic-structure codes, diagonalisation (which scales cubically) typically becomes the dominant cost for systems of more than around 100 atoms, because the Hamiltonian evaluation scales asymptotically linearly with system size. The following algorithm is primarily aimed at conventional (cubically scaling) electronic-structure codes that use such localised orbitals, whether they be Gaussian, Slater or, more recently, numerical [3]. The number of basis functions per atom typically lies in the range 10–20 in a pseudopotential calculation, with the result that as many as 20% of the eigenpairs are required to construct the density matrix. Although much effort has gone into linear scaling methods over the last few decades, conventional algorithms remain the most widely used. Even when the use of linear scaling algorithms becomes widespread, the cross-over point in system size relative to conventional localised-orbital codes remains rather unclear. For dense three-dimensional (cubic) systems, even wide-gap insulators, the cross-over point will almost certainly be above 500 atoms, and higher for metals. There therefore remains a huge amount of science to be done below these system sizes, demanding continued improvements in the efficiency of the conventional algorithm. Furthermore, the linear scaling divide-and-conquer algorithm [4] relies on many separate conventional calculations, so improvements to these on the scale of a hundred to a few thousand atoms may also benefit linear scaling approaches. Clearly, there is still a great deal to be gained by improving the kernel of conventional algorithms.
In this work, we present an iterative algorithm and compare it directly to state-of-the-art linear algebra routines. For a given potential, the eigenproblem is solved completely with no intermediate orthogonalisation or change to the potential. As it stands, therefore, the following implementation can simply be used to replace a call to a direct diagonalisation routine. Presenting the algorithm in this way allows a transparent comparison of the time-to-solution for an eigenproblem, as it can easily be compared to well-known and widely used linear algebra software. However, we realise it may well be the case that fully solving each eigenproblem is not the most efficient overall implementation. If a potential update is negligible compared to a diagonalisation, as is typically the case for systems of more than around 100 atoms when using a Gaussian basis, then a trivial alteration to the algorithm (limiting the number of internal DIIS iterations, reorthogonalising and updating the many-body potential) is, of course, possible. Many such iterative schemes for the solution of the non-linear self-consistent eigenproblem have been presented, usually within the framework of large systematic basis sets that generate sparse matrices [5], [6], [7]. For a more general discussion of iterative methods for large eigenvalue problems see Ref. [8]. Iterative methods for the solution of linear equations involving non-sparse matrices have also been presented [9].
Diagonalisation very rapidly becomes time dominant when using a Gaussian basis. Table 1 gives a stark demonstration of this using the AIMPRO code [10]: it compares the time needed to evaluate the Hamiltonian matrix elements (the Hamiltonian build) with that needed for diagonalisation. These timings are least favourable to the Hamiltonian build for two main reasons. Firstly, a contracted basis set is used with only 13 functions per atom, and the Hamiltonian build scales linearly with the number of basis functions whereas diagonalisation scales cubically. Secondly, cubic cells are used, so the ratio of non-zero matrix elements to the number of atoms is high. Also, the parallel scaling of the Hamiltonian build is, in principle, always better than that of diagonalisation.
In the context of plane-wave implementations [5], efficient parallelism of the DIIS algorithm has been achieved by splitting bands over processors and evaluating the products either locally (on one CPU) or on a local grid of processors. This approach is possible if the explicit storage of the Hamiltonian is not required (as in plane-wave and many real-space approaches [11], [12]). When using a Gaussian basis one is required to store the Hamiltonian explicitly, so our parallelisation strategy is the same as ScaLAPACK's, and we formulate the problem to rely heavily on matrix–matrix products (parallel BLAS) to ensure efficient parallel performance and portability. It is also worth noting that the scaling with respect to the number of basis functions ($n$) is $n^2$ for a fixed number of required eigenpairs, as opposed to $n^3$ for direct diagonalisation, which for high-quality, large-basis calculations is a significant improvement. To improve upon direct diagonalisation using a small number of basis functions requires very good preconditioning of the iterative subspace (see Section 4). The preconditioning and pre-rotation steps (Sections 4 and 5, respectively) are essential for both stable and rapid convergence. In the implementation presented here, typically only 3–6 iterations are required to converge eigenvalues to within an absolute tolerance of $10^{-12}$ atomic units.
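A minimal serial sketch (illustrative only, not the AIMPRO implementation; all variable names are assumptions) of why the formulation favours matrix–matrix products: applying the Hamiltonian to a block of trial vectors as one level-3 BLAS call gives the same result as many separate matrix–vector products, but lets the library block for cache and parallelise efficiently.

```python
import numpy as np

n, m = 128, 16
rng = np.random.default_rng(1)
M = rng.standard_normal((n, n))
H = (M + M.T) / 2                 # dense symmetric "Hamiltonian"
X = rng.standard_normal((n, m))   # block of m trial vectors

# m separate level-2 BLAS calls (gemv), one per trial vector.
Y_loop = np.column_stack([H @ X[:, i] for i in range(m)])

# A single level-3 BLAS call (gemm): identical result, but with
# far better cache reuse and parallel efficiency on large problems.
Y_gemm = H @ X
```

In a distributed setting the same gemm maps directly onto a parallel BLAS (PBLAS) call over a block-cyclic matrix layout, which is what makes the ScaLAPACK-style data distribution natural here.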
Although the DIIS algorithm has been used for a wide array of problems in electronic structure algorithms, such as geometry optimisation [13] and the non-linear self-consistent iteration [5], [14], there has been comparatively little discussion of the possibility of its use to perform the core diagonalisation step in a localised orbital code. The algorithm presented has been used on thousands of varied calculations in the past four years in the AIMPRO code [10] and has led to significant speed-ups across a wide array of computational platforms due to its reliance on highly efficient and portable level 3 parallel BLAS routines.
Background
In the Kohn–Sham (KS) scheme [2], $m$ one-particle equations of the form $\hat{H}\psi_i = \varepsilon_i\psi_i$ require solution. Expanding the set $\{\psi_i\}$ in a general basis $\{\phi_\alpha\}$ with coefficients $c_{\alpha i}$, such that $\psi_i = \sum_\alpha c_{\alpha i}\phi_\alpha$, results in the generalised Hermitian matrix eigenproblem $H c_i = \varepsilon_i S c_i$, of which the lowest $m$ solutions are required. As $\hat{H}$ depends on the density, which in turn depends on the Kohn–Sham orbitals, it must be iterated to self-consistency, thereby
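As a sketch of the algebra involved (standard dense linear algebra, not the iterative method of this paper; variable names are illustrative), the generalised problem $Hc = \varepsilon Sc$ can be reduced to standard Hermitian form via a Cholesky factorisation of the overlap matrix, which is effectively what a direct dense solver does:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 8, 3
A = rng.standard_normal((n, n))
H = (A + A.T) / 2                      # Hermitian "Hamiltonian"
B = rng.standard_normal((n, n))
S = B @ B.T + n * np.eye(n)            # positive-definite overlap

# Reduce H c = eps S c to a standard problem via S = L L^T.
L = np.linalg.cholesky(S)
Linv = np.linalg.inv(L)
eps, Y = np.linalg.eigh(Linv @ H @ Linv.T)   # ascending eigenvalues
C = Linv.T @ Y                               # back-transform coefficients

eps_m, C_m = eps[:m], C[:, :m]               # lowest m eigenpairs
```

The iterative method of the paper avoids this full $n^3$ reduction-and-diagonalisation for every self-consistency step, which is where the savings come from.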
RM-DIIS method
This method is central to our overall implementation and is based on the work of Wood and Zunger [17], who presented a plane-wave version. Subspace iterative methods aim to expand eigenvectors of interest in a subset of $n$-dimensional vectors (the 'iterative subspace'). In the DIIS scheme, for iteration number $p$ and eigenvector $i$, this set takes the form $\{x_i^{(1)}, x_i^{(2)}, \ldots, x_i^{(p)}\}$, where the number of vectors in the expansion set depends on the number of iterations
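A hedged numerical sketch of the residual-minimisation idea for a single eigenvector (taking $S = I$ for brevity; the construction of the trial set and all names here are illustrative, not the paper's exact RM-DIIS procedure): the combination of stored trial vectors that minimises the residual norm follows from a small generalised eigenproblem over the subspace.

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(2)
n = 8
M = rng.standard_normal((n, n))
H = (M + M.T) / 2
w, V = np.linalg.eigh(H)          # exact spectrum, for comparison

# A small iterative subspace {x^(1), x^(2), x^(3)} of trial vectors
# for the lowest eigenvector, each contaminated by higher states.
xs = [V[:, 0] + 0.1 * V[:, 1],
      V[:, 0] - 0.1 * V[:, 1],
      V[:, 0] + 0.1 * V[:, 2]]
xs = [x / np.linalg.norm(x) for x in xs]
Rs = [H @ x - (x @ H @ x) * x for x in xs]   # residuals R^(j)

# DIIS: minimise ||sum_j a_j R^(j)|| via the small problem
# B a = theta P a, with B_jk = <R_j|R_k> and P_jk = <x_j|x_k>;
# the lowest-theta eigenvector gives the mixing coefficients.
B = np.array([[ri @ rj for rj in Rs] for ri in Rs])
P = np.array([[xi @ xj for xj in xs] for xi in xs])
theta, Acoef = eigh(B, P)
x = sum(a * xj for a, xj in zip(Acoef[:, 0], xs))
x /= np.linalg.norm(x)

res_diis = np.linalg.norm(H @ x - (x @ H @ x) * x)
res_best = min(np.linalg.norm(r) for r in Rs)  # best single vector
```

The DIIS combination beats every individual trial vector because the two contaminations along $V[:,1]$ can be cancelled against each other, something no single member of the subspace can achieve.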
Preconditioning
At the start of the $k$th iteration we have the approximate eigenpairs $(\lambda_j^{(k)}, x_j^{(k)})$ and corresponding residuals $R_j^{(k)} = (H - \lambda_j^{(k)} S)\, x_j^{(k)}$. We wish to precondition this set of residuals to generate a good set of vectors to add to our iterative subspace. We may begin with the perfect vector increment $\delta x_j$ to the $j$th eigenpair as follows: $H(x_j + \delta x_j) = \lambda_j S(x_j + \delta x_j)$; rearranging gives $(H - \lambda_j S)\,\delta x_j = -R_j$. Clearly, exact solution of this leads to the trivial solution
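The point of the derivation can be checked numerically in a small sketch (taking $S = I$; the diagonal approximation shown is a Davidson-style stand-in, and all names are illustrative): solving the increment equation exactly only recovers the trivial correction $\delta x = -x$, whereas an approximate solve yields a useful preconditioned step.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 6
M = rng.standard_normal((n, n))
H = (M + M.T) / 2
x = rng.standard_normal(n)
x /= np.linalg.norm(x)
lam = x @ H @ x                        # Rayleigh quotient (S = I)
R = H @ x - lam * x                    # residual

# Exact solution of (H - lam I) delta = -R gives delta = -x,
# since (H - lam I) x = R by definition: the trivial solution.
delta_exact = np.linalg.solve(H - lam * np.eye(n), -R)

# A diagonal (Davidson-style) approximation to (H - lam I)
# instead yields a non-trivial preconditioned correction.
d = np.diag(H) - lam
d = np.where(np.abs(d) < 1e-3, 1e-3, d)   # guard tiny denominators
delta_prec = -R / d
```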
Preconditioning improvement
The method described thus far is an idealised, dense linear algebra implementation. Virtually all computational effort is in the form of matrix–matrix multiplications. In practice, however, it is found that an extra step is required to improve the preconditioning in order to obtain stable and rapid convergence without the need for orthogonalisation. Good preconditioning is crucial to the stability of the DIIS method. Poor preconditioning leads to a plethora of problems such as band duplication, band omission,
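One standard remedy in this spirit is a Rayleigh–Ritz rotation of the trial block before preconditioning, so that each column again targets a single, distinct band. The sketch below is illustrative rather than the paper's exact pre-rotation procedure, and all names are assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 10, 3
M = rng.standard_normal((n, n))
H = (M + M.T) / 2

# A trial block spanning the lowest eigenspace well, but with the
# individual columns mixed among the target bands (a situation that
# invites band duplication or omission if preconditioned as-is).
w, V = np.linalg.eigh(H)
Q = np.linalg.qr(rng.standard_normal((m, m)))[0]  # random mixing
X = V[:, :m] @ Q

# Rayleigh-Ritz rotation: diagonalise H within span(X) so each
# rotated column approximates one distinct eigenvector.
Hs = X.T @ H @ X
ws, U = np.linalg.eigh(Hs)
X_rot = X @ U
```

Because the rotation is a small $m \times m$ diagonalisation plus a gemm, it is cheap relative to the $n$-dimensional operations yet disentangles the bands before the residuals are formed.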
Numerical examples
To demonstrate the efficacy of this approach, we have produced detailed timings illustrating the speed and parallel scaling behaviour of the DIIS algorithm. The matrices concerned were produced when AIMPRO was used to model 217 atoms of silicon, arranged as a 216-atom simple cubic unit cell with a single self-interstitial atom added. The basis sets used included 40, 22 and 16 basis functions per atom, giving matrices of sizes 8680, 4774 and 3472, respectively. Results are presented in Table 2
Conclusion
An efficient iterative diagonalisation algorithm for use with dense matrices arising from electronic-structure calculations has been presented. It has been extensively used in the AIMPRO code [10] over the past four years on a large number and variety of systems and has proven to be very stable. Its main advantages over direct diagonalisation are improved serial speed, superior parallel scaling and a reduction of the scaling of the algorithm with respect to the number of basis functions to $n^2$, from the $n^3$ of direct diagonalisation.
Acknowledgements
The authors would like to thank The Engineering and Physical Sciences Research Council (EPSRC) for financial support.
References (17)
- et al., Comput. Phys. Comm. (1989)
- et al., J. Comput. Phys. (2006)
- et al., Phys. Rev. B (1964)
- et al., Phys. Rev. A (1965)
- et al., J. Phys.: Condens. Matter (2002)
- Phys. Rev. Lett. (1991)
- et al., Phys. Rev. B (1996)
- et al., Phys. Rev. E (2006)