skip to main content
10.1145/3581784.3607109acmconferencesArticle/Chapter ViewAbstractPublication PagesscConference Proceedingsconference-collections
research-article
Open Access

High-Performance SVD Partial Spectrum Computation

Published:11 November 2023Publication History

ABSTRACT

We introduce a new singular value decomposition (SVD) solver based on the QR-based Dynamically Weighted Halley (QDWH) algorithm for computing the partial spectrum SVD (QDWHpartial-SVD) problems. By optimizing the rational function underlying the algorithms in the desired part of the spectrum only, the QDWHpartial-SVD algorithm efficiently computes a fraction (say 1--20%) of the leading singular values/vectors. We develop a high-performance implementation of QDWHpartial-SVD 1 on distributed-memory manycore systems and demonstrate its numerical robustness. We perform a benchmarking campaign against counterparts from the state-of-the-art numerical libraries across various matrix sizes using up to 36K MPI processes. Experimental results show performance speedups for QDWHpartial-SVD up to 6X and 2X against vendor-optimized PDGESVD from ScaLAPACK and KSVD on a Cray XC40 system using 1152 nodes based on two-socket 16-core Intel Haswell CPU, respectively. We also port our QDWHpartial-SVD software library to a system composed of 256 nodes with two-socket 64-Core AMD EPYC Milan CPU and achieve performance speedup up to 4X compared to vendor-optimized PDGESVD from ScaLAPACK. We also compare energy consumption for the two algorithms and demonstrate how QDWHpartial-SVD can further outperform PDGESVD in that regard by performing fewer memory-bound operations.

References

  1. Sameh Abdulah, Hatem Ltaief, Ying Sun, Marc G Genton, and David E Keyes. 2018. ExaGeoStat: A High Performance Unified Software for Geostatistics on Manycore Systems. IEEE Transactions on Parallel and Distributed Systems 29, 12 (2018), 2771--2784.Google ScholarGoogle ScholarCross RefCross Ref
  2. Kadir Akbudak, Hatem Ltaief, Aleksandr Mikhalev, Ali Charara, Aniello Esposito, and David Keyes. 2018. Exploiting Data Sparsity for Large-Scale Matrix Computations. In Euro-Par 2018: Parallel Processing, Marco Aldinucci, Luca Padovani, and Massimo Torquati (Eds.), Vol. 11014. Springer International Publishing, Cham, 721--734.Google ScholarGoogle Scholar
  3. Patrick Amestoy, Cleve Ashcraft, Olivier Boiteau, Alfredo Buttari, Jean-Yves L'Excellent, and Clément Weisbecker. 2015. Improving Multifrontal Methods by Means of Block Low-Rank Representations. SIAM Journal on Scientific Computing 37, 3 (2015), A1451--A1474. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Edward Anderson, Zhaojun Bai, Christian Heinrich Bischof, Laura Susan Blackford, James Weldon Demmel, Jack J Dongarra, Jeremy J Du Croz, Anne Greenbaum, Sven Hammarling, A McKenney, and Danny C Sorensen. 1999. LAPACK User's Guide (3rd ed.). SIAM, Philadelphia.Google ScholarGoogle Scholar
  5. I.Y. Bar-Itzhack. 1975. Iterative Optimal Orthogonalization of the Strapdown Matrix. IEEE Trans. Aerospace Electron. Systems AES-11, 1 (Jan 1975), 30--37. Google ScholarGoogle ScholarCross RefCross Ref
  6. Christian H. Bischof, Bruno Lang, and Xiaobai Sun. 2000. Algorithm 807: The SBR Toolbox---Software for Successive Band Reduction. ACM Trans. Math. Software 26, 4 (2000), 602--616. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. L. Suzan Blackford, J. Choi, Andy Cleary, Eduardo F. D'Azevedo, James W. Demmel, Inderjit S. Dhillon, Jack J. Dongarra, Sven Hammarling, Greg Henry, Antoine Petitet, Ken Stanley, David W. Walker, and R. Clint Whaley. 1997. ScaLAPACK Users' Guide. Society for Industrial and Applied Mathematics, Philadelphia.Google ScholarGoogle Scholar
  8. Qinglei Cao, Sameh Abdulah, Rabab Alomairy, Yu Pei, Pratik Nag, George Bosilca, Jack Dongarra, Marc G. Genton, David E. Keyes, Hatem Ltaief, and Ying Sun. 2022. Reshaping Geostatistical Modeling and Prediction for Extreme-Scale Environmental Applications. In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC'22). IEEE Press, Dallas, Texas, Article 2, 12 pages.Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Lars Eldén. 2007. Matrix Methods in Data Mining and Pattern Recognition. Society for Industrial and Applied Mathematics. x + 224 pages.Google ScholarGoogle Scholar
  10. Aniello Esposito, David E. Keyes, Hatem Ltaief, and Dalal Sukkari. 2018. Performance Impact of Rank-Reordering on Advanced Polar Decomposition Algorithms. In Cray Users' Group Conference. http://hdl.handle.net/10754/628026Google ScholarGoogle Scholar
  11. Massimiliano Fasi and Nicholas J. Higham. 2021. Generating Extreme-Scale Matrices With Specified Singular Values or Condition Number. SIAM Journal on Scientific Computing 43, 1 (2021), A663--A684. arXiv:https://doi.org/10.1137/20M1327938 Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Jerome A. Goldstein and Mel Levy. 1991. Linear Algebra and Quantum Chemistry. Amer. Math. Monthly 98, 10 (Oct. 1991), 710--718. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Gene H. Golub and C. Reinsch. 1970. Singular Value Decomposition and Least Squares Solutions. Numerische Mathematik 14 (1970), 403--420.Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Gene H. Golub and Charles F. Van Loan. 2012. Matrix Computations (4th ed.). The Johns Hopkins University Press.Google ScholarGoogle Scholar
  15. Ming Gu and Stanley C. Eisenstat. 1996. Efficient algorithms for computing a strong rank-revealing QR factorization. SIAM Journal on Scientific Computing 17, 4 (1996), 848--869.Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Wolfgang Hackbusch. 2015. Hierarchical Matrices: Algorithms and Analysis. Vol. 49. Springer. Springer Series in Computational Mathematics.Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Azzam Haidar, Hatem Ltaief, and Jack Dongarra. 2011. Parallel Reduction to Condensed Forms for Symmetric Eigenvalue Problems Using Aggregated Finegrained And Memory-aware Kernels. In Proceedings of SC'11 Conference on High Performance Computing Networking, Storage and Analysis. ACM SIGARCH/IEEE Computer Society, Seattle, WA, USA, 8.Google ScholarGoogle Scholar
  18. Azzam Haidar, Hatem Ltaief, and Jack Dongarra. 2012. Toward a High Performance Tile Divide and Conquer Algorithm for the Dense Symmetric Eigenvalue Problem. SIAM Journal on Scientific Computing 34, 6 (2012), 249--274.Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Azzam Haidar, Stanimire Tomov, Jack Dongarra, Raffaele Solcá, and Thomas Schulthess. 2014. A Novel Hybrid CPU-GPU Generalized Eigensolver for Electronic Structure Calculations Based on Fine-Grained Memory Aware Tasks. The International Journal of High Performance Computing Applications 28, 2 (2014), 196--209. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Nathan Halko, Per-Gunnar Martinsson, and Joel A Tropp. 2011. Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM Rev. 53, 2 (2011), 217--288.Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Nicholas J. Higham. 2008. Functions of Matrices: Theory and Computation. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA. xx+425 pages.Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Bruno Lang. 1999. Efficient Eigenvalue and Singular Value Computations On Shared Memory Machines. Parallel Comput. 25, 7 (1999), 845--860.Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Hatem Ltaief, Piotr Luszczek, and Jack Dongarra. 2012. Enhancing Parallelism of Tile Bidiagonal Transformation on Multicore Architectures Using Tree Reduction. In Parallel Processing and Applied Mathematics, Roman Wyrzykowski, Jack Dongarra, Konrad Karczewski, and Jerzy Waśniewski (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 661--670.Google ScholarGoogle Scholar
  24. Hatem Ltaief, Piotr Luszczek, and Jack Dongarra. 2012. High Performance Bidiagonal Reduction using Tile Algorithms on Homogeneous Multicore Architectures. ACM Trans. Math. Software 39, 3 (2012), 1--22.Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Hatem Ltaief, Piotr Luszczek, Azzam Haidar, and Jack Dongarra. 2011. Solving the Generalized Symmetric Eigenvalue Problem using Tile Algorithms on Multicore Architectures. In PARCO (Advances in Parallel Computing, Vol. 22), Koen De Bosschere, Erik H. D'Hollander, Gerhard R. Joubert, David A. Padua, Frans J. Peters, and Mark Sawyer (Eds.). IOS Press, 397--404. Google ScholarGoogle ScholarCross RefCross Ref
  26. Hatem Ltaief, Dalal Sukkari, Aniello Esposito, Yuji Nakatsukasa, and David Keyes. 2019. Massively Parallel Polar Decomposition on Distributed-Memory Systems. ACM Transactions on Parallel Computing 6, 1, Article 4 (June 2019), 15 pages. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Hatem Ltaief, Dalal Sukkari, Oliver Guyon, and David Keyes. 2018. Extreme Computing for Extreme Adaptive Optics: The Key to Finding Life Outside Our Solar System. In Proceedings of the Platform for Advanced Scientific Computing Conference (Basel, Switzerland) (PASC'18). ACM, New York, NY, USA, Article 1, 10 pages. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Piotr Luszczek, Hatem Ltaief, and Jack Dongarra. 2011. Two-Stage Tridiagonal Reduction for Dense Symmetric Matrices using Tile Algorithms on Multicore Architectures. In 2011 IEEE International Parallel & Distributed Processing Symposium. ACM, Anchorage, AK USA, 944--955.Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Osni Marques, James Demmel, and Paulo B. Vasconcelos. 2020. Bidiagonal SVD Computation via an Associated Tridiagonal Eigenproblem. ACM Trans. Math. Software 46, 2, Article 14 (May 2020), 25 pages. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Yuji Nakatsukasa. 2020. Fast and stable randomized low-rank matrix approximation. arXiv:2009.11392 (2020).Google ScholarGoogle Scholar
  31. Yuji Nakatsukasa, Zhaojun Bai, and François Gygi. 2010. Optimizing Halley's Iteration for Computing the Matrix Polar Decomposition. SIAM J. Matrix Anal. Appl. 31, 5 (2010), 2700--2720. arXiv:https://doi.org/10.1137/090774999 Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Yuji Nakatsukasa and Roland W. Freund. 2016. Computing Fundamental Matrix Decompositions Accurately via the Matrix Sign Function in Two Iterations: The Power of Zolotarev's Functions. SIAM Rev. 58, 3 (2016), 461--493. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Yuji Nakatsukasa and Nicholas J. Higham. 2013. Stable and Efficient Spectral Divide and Conquer Algorithms for the Symmetric Eigenvalue Decomposition and the SVD. SIAM Journal on Scientific Computing 35, 3 (2013), A1325--A1349. Google ScholarGoogle ScholarCross RefCross Ref
  34. Ivan V. Oseledets and Eugene E. Tyrtyshnikov. 2009. Breaking the Curse of Dimensionality, Or How to Use SVD in Many Dimensions. SIAM Journal on Scientific Computing 31, 5 (Oct. 2009), 3744--3759. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. Robert Schreiber and Beresford Parlett. 1988. Block Reflectors: Theory and Computation. SIAM J. Numer. Anal. 25, 1 (1988), 189--205. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. Rémi Soummer, Laurent Pueyo, and James Larkin. 2012. Detection and characterization of exoplanets and disks using projections on Karhunen-Loève eigenimages. The Astrophysical Journal Letters 755, 2 (2012), L28.Google ScholarGoogle ScholarCross RefCross Ref
  37. Dalal Sukkari, Hatem Ltaief, Aniello Esposito, and David Keyes. 2019. A QDWH-Based SVD Software Framework on Distributed-Memory Manycore Systems. ACM Trans. Math. Software 45, 2, Article 18 (April 2019), 21 pages. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. Dalal Sukkari, Hatem Ltaief, Mathieu Faverge, and David Keyes. 2017. Asynchronous Task-Based Polar Decomposition on Single Node Manycore Architectures. IEEE Transactions on Parallel and Distributed Systems PP, 99 (2017), 1--1. Google ScholarGoogle ScholarCross RefCross Ref
  39. Dalal Sukkari, Hatem Ltaief, and David E. Keyes. 2016. A High Performance QDWH-SVD Solver Using Hardware Accelerators. ACM Trans. Math. Software 43, 1 (2016), 6:1--6:25. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. Dalal Sukkari, Hatem Ltaief, and David E. Keyes. 2016. High Performance Polar Decomposition on Distributed Memory Systems. In Euro-Par 2016: Parallel Processing - 22nd International Conference on Parallel and Distributed Computing, Grenoble, France, August 24--26, 2016, Proceedings (Lecture Notes in Computer Science, Vol. 9833), Pierre-François Dutot and Denis Trystram (Eds.). Springer, 605--616. Google ScholarGoogle ScholarCross RefCross Ref
  41. Lloyd N. Trefethen and David Bau. 1997. Numerical Linear Algebra. SIAM, Philadelphia, PA. http://www.siam.org/books/OT50/Index.htmGoogle ScholarGoogle Scholar

Index Terms

  1. High-Performance SVD Partial Spectrum Computation

          Recommendations

          Comments

          Login options

          Check if you have access through your login credentials or your institution to get full access on this article.

          Sign in
          • Published in

            cover image ACM Conferences
            SC '23: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
            November 2023
            1428 pages
            ISBN:9798400701092
            DOI:10.1145/3581784

            Copyright © 2023 Owner/Author(s)

            This work is licensed under a Creative Commons Attribution International 4.0 License.

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 11 November 2023

            Check for updates

            Qualifiers

            • research-article

            Acceptance Rates

            Overall Acceptance Rate1,516of6,373submissions,24%
          • Article Metrics

            • Downloads (Last 12 months)167
            • Downloads (Last 6 weeks)43

            Other Metrics

          PDF Format

          View or Download as a PDF file.

          PDF

          eReader

          View online with eReader.

          eReader