research-article

ChASE: a distributed hybrid CPU-GPU eigensolver for large-scale hermitian eigenvalue problems

Authors: Davor Davidović, Sebastian Achilles, Edoardo Di Napoli

PASC '22: Proceedings of the Platform for Advanced Scientific Computing Conference

Article No.: 9, Pages 1 - 12

https://doi.org/10.1145/3539781.3539792

Published: 12 July 2022

Abstract

As modern massively parallel clusters grow larger and their compute nodes more powerful, traditional parallel eigensolvers, such as direct solvers, struggle to keep pace with the hardware evolution and to scale efficiently because of additional layers of communication and synchronization. This difficulty is especially pronounced when porting traditional libraries to heterogeneous computing architectures equipped with accelerators, such as Graphics Processing Units (GPUs). Recently, there have been significant scientific contributions to the development of filter-based subspace eigensolvers that compute a partial eigenspectrum. The simpler structure of this type of algorithm makes it easier to avoid the communication and synchronization bottlenecks typical of direct solvers. The Chebyshev Accelerated Subspace Eigensolver (ChASE) is a modern subspace eigensolver that computes partial extremal eigenpairs of large-scale Hermitian eigenproblems, accelerated by a filter based on Chebyshev polynomials. In this work, we extend our previous work on ChASE by adding support for distributed hybrid CPU-multi-GPU computing architectures. Our tests show that ChASE achieves very good scaling performance up to 144 nodes with 526 NVIDIA A100 GPUs in total on dense eigenproblems of size up to 360k.
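The abstract names the core technique, a Chebyshev polynomial filter inside a subspace iteration, without showing how it works. The NumPy sketch below is a generic, single-node illustration of Chebyshev-filtered subspace iteration for the smallest eigenpairs of a Hermitian matrix. It is not ChASE's implementation or API. The function names `chebyshev_filter` and `filtered_subspace_iteration`, the bound choices (the maximum absolute row sum as the spectral upper bound b, the largest current Ritz value as the lower end a of the damped interval), and the default parameters are assumptions made for illustration. It omits refinements a production solver would add, such as sharper spectral bounds, locking of converged vectors, and adaptive filter degrees.

```python
import numpy as np


def chebyshev_filter(H, V, degree, a, b):
    """Apply the Chebyshev polynomial T_degree, mapped onto [a, b], in H to V.

    Eigencomponents of H inside the unwanted interval [a, b] stay bounded by 1
    in magnitude, while components below a are strongly amplified.
    Uses the three-term recurrence T_{k+1}(x) = 2 x T_k(x) - T_{k-1}(x).
    (Illustrative sketch; not ChASE's filter.)
    """
    if degree < 1:
        raise ValueError("degree must be >= 1")
    c = (a + b) / 2.0  # center of the damped interval
    e = (b - a) / 2.0  # half-width of the damped interval
    Y_prev = V
    Y = (H @ V - c * V) / e
    for _ in range(2, degree + 1):
        Y_next = 2.0 * (H @ Y - c * Y) / e - Y_prev
        Y_prev, Y = Y, Y_next
    return Y


def filtered_subspace_iteration(H, nev, nex=10, degree=20, tol=1e-8,
                                maxiter=100, seed=0):
    """Approximate the nev smallest eigenpairs of a Hermitian matrix H.

    Hypothetical helper: nex extra vectors enlarge the search space.
    """
    n = H.shape[0]
    rng = np.random.default_rng(seed)
    # Assumed upper bound on the spectrum: |lambda| <= ||H||_inf.
    b = np.abs(H).sum(axis=1).max()
    V, _ = np.linalg.qr(rng.standard_normal((n, nev + nex)))

    def rayleigh_ritz(V):
        # Project H onto span(V) and rotate V onto the Ritz vectors.
        ritz, W = np.linalg.eigh(V.conj().T @ H @ V)
        return ritz, V @ W

    ritz, V = rayleigh_ritz(V)
    for _ in range(maxiter):
        a = ritz[-1]  # largest Ritz value: lower end of the damped interval
        V, _ = np.linalg.qr(chebyshev_filter(H, V, degree, a, b))
        ritz, V = rayleigh_ritz(V)
        # Residual norms ||H v_i - theta_i v_i|| of the wanted Ritz pairs.
        res = np.linalg.norm(H @ V[:, :nev] - V[:, :nev] * ritz[:nev], axis=0)
        if np.all(res < tol * b):
            break
    return ritz[:nev], V[:, :nev]
```

The filter acts on the whole block of nev + nex vectors at once, so the dominant cost is a sequence of dense matrix-matrix products (BLAS-3), the kind of operation that maps efficiently onto GPUs. For a symmetric test matrix with eigenvalues 1 through 200, calling `filtered_subspace_iteration(H, nev=5)` should return Ritz values close to 1 through 5.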

Cited By

Lam J, Nakano A, Katritch V (2024). Scalable computation of anisotropic vibrations for large macromolecular assemblies. Nature Communications 15:1. Online publication date: 24 Apr 2024. https://doi.org/10.1038/s41467-024-47685-8

Index Terms

ChASE: a distributed hybrid CPU-GPU eigensolver for large-scale hermitian eigenvalue problems
1. Computing methodologies
  1. Parallel computing methodologies
    1. Parallel algorithms
2. Mathematics of computing
  1. Mathematical software
    1. Mathematical software performance

Information

Published In

PASC '22: Proceedings of the Platform for Advanced Scientific Computing Conference

June 2022

181 pages

ISBN:9781450394109

DOI:10.1145/3539781

Conference Chair:
Timothy Robinson
ETH Zurich/CSCS

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGHPC: ACM Special Interest Group on High Performance Computing

In-Cooperation

CSCS: Swiss National Supercomputing Centre

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

Deutscher Akademischer Austauschdienst (DAAD)
Croatian Science Foundation
PRACE-6IP WP8

Conference

PASC '22

Sponsor:

SIGHPC

PASC '22: Platform for Advanced Scientific Computing Conference

June 27 - 29, 2022

Basel, Switzerland

Acceptance Rates

PASC '22 Paper Acceptance Rate 17 of 22 submissions, 77%;

Overall Acceptance Rate 109 of 221 submissions, 49%

Article Metrics

Total Citations: 1
Total Downloads: 103
Downloads (Last 12 months): 34
Downloads (Last 6 weeks): 4

Reflects downloads up to 12 Jan 2025
