DOI: 10.1145/3539781.3539792
research-article

ChASE: a distributed hybrid CPU-GPU eigensolver for large-scale Hermitian eigenvalue problems

Published: 12 July 2022

Abstract

As modern massively parallel clusters grow larger and their compute nodes more powerful, traditional parallel eigensolvers, such as direct solvers, struggle to keep pace with hardware evolution and to scale efficiently because of additional layers of communication and synchronization. This difficulty is especially pronounced when porting traditional libraries to heterogeneous computing architectures equipped with accelerators, such as Graphics Processing Units (GPUs). Recently, there have been significant scientific contributions to the development of filter-based subspace eigensolvers that compute a partial eigenspectrum. The simpler structure of this type of algorithm makes it easier to avoid the communication and synchronization bottlenecks typical of direct solvers. The Chebyshev Accelerated Subspace Eigensolver (ChASE) is a modern subspace eigensolver that computes partial extremal eigenpairs of large-scale Hermitian eigenproblems, accelerated by a filter based on Chebyshev polynomials. In this work, we extend our previous work on ChASE by adding support for distributed hybrid CPU-multi-GPU computing architectures. Our tests show that ChASE achieves very good scaling performance up to 144 nodes with 526 NVIDIA A100 GPUs in total on dense eigenproblems of size up to 360k.
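To make the idea of a Chebyshev-filtered subspace eigensolver concrete, the following is a minimal single-node NumPy sketch of the general technique the abstract describes. It is not ChASE's implementation or API: the function names, the default parameters, and the crude 1-norm upper bound on the spectrum are all assumptions chosen for illustration, and the sketch omits the filter scaling, per-vector degree optimization, and locking that a production solver would use.

```python
import numpy as np


def chebyshev_filter(A, V, degree, a, b):
    """Apply a degree-`degree` Chebyshev polynomial in A to the block V.

    Eigencomponents with eigenvalues inside [a, b] stay bounded by 1,
    while components below `a` are amplified. (Hypothetical helper; no
    scaling is applied, so very high degrees may overflow.)
    """
    c = (a + b) / 2.0          # center of the damped interval
    e = (b - a) / 2.0          # half-width of the damped interval
    Y_prev = V
    Y = (A @ V - c * V) / e    # T_1 of the shifted and scaled matrix
    for _ in range(2, degree + 1):
        # Three-term recurrence: T_{k+1}(x) = 2 x T_k(x) - T_{k-1}(x)
        Y_next = 2.0 * (A @ Y - c * Y) / e - Y_prev
        Y_prev, Y = Y, Y_next
    return Y


def filtered_subspace_iteration(A, nev, nex=4, degree=20,
                                tol=1e-8, max_iter=100, seed=0):
    """Approximate the `nev` lowest eigenpairs of a Hermitian matrix A.

    `nex` extra vectors enlarge the search space to speed up convergence.
    All names and defaults are illustrative assumptions.
    """
    n = A.shape[0]
    rng = np.random.default_rng(seed)
    V, _ = np.linalg.qr(rng.standard_normal((n, nev + nex)))
    # Crude upper bound on the spectrum: the 1-norm bounds the spectral
    # radius. Tighter estimates (e.g. a few Lanczos steps) filter better.
    b = np.linalg.norm(A, 1)
    for _ in range(max_iter):
        # Rayleigh-Ritz: solve the small projected eigenproblem.
        H = V.conj().T @ (A @ V)
        theta, W = np.linalg.eigh(H)
        V = V @ W
        # Residuals of the wanted Ritz pairs.
        R = A @ V[:, :nev] - V[:, :nev] * theta[:nev]
        if np.max(np.linalg.norm(R, axis=0)) < tol * b:
            break
        # Damp everything between the largest Ritz value and the bound.
        V = chebyshev_filter(A, V, degree, theta[-1], b)
        V, _ = np.linalg.qr(V)   # re-orthonormalize the filtered block
    return theta[:nev], V[:, :nev]
```

Each iteration consists mostly of matrix-block products followed by a QR factorization and a small dense eigenproblem. This structure illustrates why such solvers can avoid much of the communication and synchronization of direct methods, since the dominant cost maps onto large matrix-matrix multiplications.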


Cited By

  • (2024) Scalable computation of anisotropic vibrations for large macromolecular assemblies. Nature Communications 15:1. DOI: 10.1038/s41467-024-47685-8. Online publication date: 24-Apr-2024

      Published In

      PASC '22: Proceedings of the Platform for Advanced Scientific Computing Conference
      June 2022
      181 pages
      ISBN:9781450394109
      DOI:10.1145/3539781

      Sponsors

      In-Cooperation

      • CSCS: Swiss National Supercomputing Centre

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. chebyshev polynomial
      2. dense hermitian matrix
      3. distributed hybrid CPU-GPU
      4. heterogeneous GPU supercomputers
      5. subspace iteration eigensolver

      Qualifiers

      • Research-article

      Funding Sources

      • Deutsche Akademische Austauschdienst (DAAD)
      • Croatian Science Foundation
      • PRACE-6IP WP8

      Acceptance Rates

      PASC '22 Paper Acceptance Rate 17 of 22 submissions, 77%;
      Overall Acceptance Rate 109 of 221 submissions, 49%
