DOI: 10.1145/2807591.2807654
research-article
Free access

Efficient implementation of quantum materials simulations on distributed CPU-GPU systems

Published: 15 November 2015

Abstract

We present a scalable implementation of the Linearized Augmented Plane Wave (LAPW) method for distributed-memory systems. It relies on an efficient distributed, block-cyclic setup of the Hamiltonian and overlap matrices and allows us to turn around highly accurate all-electron quantum materials simulations of more than 1000 atoms on clusters with a few hundred nodes. The implementation runs efficiently on standard multi-core CPU nodes as well as on hybrid CPU-GPU nodes. The key to the latter is a novel algorithm for solving the generalized eigenvalue problem for dense, complex Hermitian matrices on distributed hybrid CPU-GPU systems. Performance tests for Li-intercalated CoO2 supercells containing 1501 atoms demonstrate that high-accuracy, transferable quantum simulations can now be used in throughput materials search problems. While our application can benefit from, and achieve scalable performance with, CPU-only libraries such as ScaLAPACK or ELPA2, our new hybrid solver enables efficient use of GPUs. It shows that a hybrid CPU-GPU architecture reaches a desired performance level using substantially fewer cluster nodes and, notably, is considerably more energy efficient than traditional multi-core CPU-only systems for such complex applications.
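The central computational step named above is the generalized eigenvalue problem H x = λ S x, where the Hamiltonian H and the overlap matrix S are dense and complex Hermitian, and S is assumed positive definite. The paper's novel distributed hybrid CPU-GPU algorithm is not reproduced here. As a point of reference only, the following is a minimal single-node NumPy sketch of the textbook Cholesky-based reduction to a standard eigenproblem. This is the same sequence of steps used by LAPACK's `zhegv` driver: Cholesky factorization, reduction to standard form, standard Hermitian eigensolve, and back-transformation. The function name `generalized_hermitian_eig` and the random test matrices are illustrative assumptions, not part of the original work.

```python
import numpy as np


def generalized_hermitian_eig(H, S):
    """Solve H x = lam * S x for Hermitian H and Hermitian positive-definite S.

    Hypothetical helper: a single-node sketch of the textbook Cholesky
    reduction, not the distributed hybrid solver described in the paper.
    """
    # Step 1: Cholesky factorization of the overlap matrix, S = L L^H (L lower triangular).
    L = np.linalg.cholesky(S)
    # Step 2: reduce to standard form A = L^{-1} H L^{-H}.
    # General solves are used for brevity; production code would use triangular solves.
    Y = np.linalg.solve(L, H)                # Y = L^{-1} H
    A = np.linalg.solve(L, Y.conj().T)       # L^{-1} (L^{-1} H)^H = L^{-1} H L^{-H}
    A = 0.5 * (A + A.conj().T)               # remove round-off asymmetry
    # Step 3: standard Hermitian eigenproblem A y = lam y (eigenvalues in ascending order).
    w, V = np.linalg.eigh(A)
    # Step 4: back-transform the eigenvectors, x = L^{-H} y.
    X = np.linalg.solve(L.conj().T, V)
    return w, X


# Usage with small random matrices standing in for H and S (illustrative data only).
rng = np.random.default_rng(0)
n = 6
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
H = 0.5 * (B + B.conj().T)                   # random Hermitian "Hamiltonian"
C = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
S = C @ C.conj().T + n * np.eye(n)           # random Hermitian positive-definite "overlap"

w, X = generalized_hermitian_eig(H, S)
# Residual checks: H X should equal S X diag(w), and X^H S X should be the identity.
print(np.allclose(H @ X, (S @ X) * w), np.allclose(X.conj().T @ S @ X, np.eye(n)))
```

Reducing through the Cholesky factor rather than forming S^{-1} H keeps the transformed matrix Hermitian, so a standard Hermitian eigensolver can be used, and the back-transformation yields S-orthonormal eigenvectors. In a distributed-memory setting, each step would instead operate on block-cyclically distributed matrices, for example through ScaLAPACK's `pzhegvx` driver.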

Published In

SC '15: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
November 2015
985 pages
ISBN: 9781450337236
DOI: 10.1145/2807591
  • General Chair: Jackie Kern
  • Program Chair: Jeffrey S. Vetter
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

SC15

Acceptance Rates

SC '15 paper acceptance rate: 79 of 358 submissions (22%)
Overall acceptance rate: 1,516 of 6,373 submissions (24%)

Article Metrics

  • Downloads (last 12 months): 83
  • Downloads (last 6 weeks): 15
Reflects downloads up to 17 Feb 2025

Cited By

  • (2024) DLA-Future: A Task-Based Linear Algebra Library Which Provides a GPU-Enabled Distributed Eigensolver. Asynchronous Many-Task Systems and Applications, pp. 135-141. DOI: 10.1007/978-3-031-61763-8_13. Online publication date: 14-Feb-2024.
  • (2023) Efficient variable cell shape geometry optimization. Journal of Computational Physics: X, article 100131. DOI: 10.1016/j.jcpx.2023.100131. Online publication date: Jul-2023.
  • (2022) Performance Enhancement of APW+lo Calculations by Simplest Separation of Concerns. Computation, 10(3):43. DOI: 10.3390/computation10030043. Online publication date: 16-Mar-2022.
  • (2020) DRAM-Less: Hardware Acceleration of Data Processing with New Memory. 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA), pp. 287-302. DOI: 10.1109/HPCA47549.2020.00032. Online publication date: Feb-2020.
  • (2019) Red-blue pebbling revisited. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-22. DOI: 10.1145/3295500.3356181. Online publication date: 17-Nov-2019.
  • (2018) Analysis and Design Techniques towards High-Performance and Energy-Efficient Dense Linear Solvers on GPUs. IEEE Transactions on Parallel and Distributed Systems, 29(12):2700-2712. DOI: 10.1109/TPDS.2018.2842785. Online publication date: 1-Dec-2018.
  • (2018) Hierarchical Domain Representation in the AlgoWiki Encyclopedia: From Problems to Implementations. Parallel Computational Technologies, pp. 3-15. DOI: 10.1007/978-3-319-99673-8_1. Online publication date: 26-Aug-2018.
  • (2018) Hybrid Parallelization and Performance Optimization of the FLEUR Code: New Possibilities for All-Electron Density Functional Theory. Euro-Par 2018: Parallel Processing, pp. 735-748. DOI: 10.1007/978-3-319-96983-1_52. Online publication date: 1-Aug-2018.
  • (2018) Accelerating the computation of FLAPW methods on heterogeneous architectures. Concurrency and Computation: Practice and Experience, 30(24). DOI: 10.1002/cpe.4905. Online publication date: 31-Aug-2018.
  • (2017) Exploiting In-Memory Processing Capabilities for Density Functional Theory Applications. Euro-Par 2016: Parallel Processing Workshops, pp. 750-762. DOI: 10.1007/978-3-319-58943-5_60. Online publication date: 28-May-2017.
