Distributed-memory lattice H-matrix factorization

Yamazaki, Ichitaro; Ida, Akihiro; Yokota, Rio; Dongarra, Jack

doi:10.1177/1094342019861139

Journal Article · August 1, 2019 · International Journal of High Performance Computing Applications

DOI: https://doi.org/10.1177/1094342019861139 · OSTI ID: 1559494

Yamazaki, Ichitaro ^[1]; Ida, Akihiro ^[2]; Yokota, Rio ^[3]; Dongarra, Jack ^[4]

[1] Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
[2] The Univ. of Tokyo, Tokyo (Japan)
[3] Tokyo Inst. of Technology, Tokyo (Japan)
[4] The Univ. of Tennessee, Knoxville, TN (United States)

We parallelize the LU factorization of a hierarchical low-rank matrix (H-matrix) on a distributed-memory computer. This is much more difficult than parallelizing the H-matrix-vector multiplication because of the dataflow of the factorization, and much harder than parallelizing a dense matrix factorization because of the irregular hierarchical block structure of the matrix. The block low-rank (BLR) format removes the hierarchy and simplifies the parallelization, often increasing concurrency. However, this comes at the price of losing the near-linear complexity of the H-matrix factorization. In this work, we propose to factorize the matrix using a “lattice H-matrix” format that generalizes the BLR format by storing each of the blocks (both diagonal and off-diagonal) in the H-matrix format. These blocks stored in the H-matrix format are referred to as lattices. The lattice format thus aims to combine the parallel scalability of BLR factorization with the near-linear complexity of H-matrix factorization. We first compare factorization performance using the H-matrix, BLR, and lattice H-matrix formats under various conditions on a shared-memory computer. Our results show that the lattice format has storage and computational complexities similar to those of the H-matrix format, and hence a much lower factorization cost than BLR. We then compare the BLR and lattice H-matrix factorizations on distributed-memory computers. Our results demonstrate that, compared with BLR, the lattice format with its lower factorization cost may lead to faster factorization on a distributed-memory computer.
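
The storage trade-off described in the abstract can be illustrated with a rough counting model. The sketch below is not taken from the paper: it assumes a simplified H-matrix with weak admissibility (every off-diagonal block at every level is a single rank-k block), a fixed rank k, and hypothetical function names and parameter values chosen only for illustration. Under these assumptions an off-diagonal lattice block degenerates to one low-rank block, so the model only hints at why a coarse grid of hierarchically stored lattices can stay close to H-matrix storage while BLR grows faster.

```python
# Toy storage model (number of stored matrix entries). All names and
# parameter values here are illustrative assumptions, not the paper's code.

def low_rank_storage(rows, cols, k):
    """Entries needed for a rank-k block stored as U (rows x k), V (cols x k)."""
    return k * (rows + cols)

def hmatrix_storage(n, k, leaf):
    """Simplified H-matrix (weak admissibility): recurse on the two diagonal
    halves, store both off-diagonal halves as rank-k blocks."""
    if n <= leaf:
        return n * n  # small diagonal blocks are kept dense
    half = n // 2
    rest = n - half
    return (hmatrix_storage(half, k, leaf)
            + hmatrix_storage(rest, k, leaf)
            + 2 * low_rank_storage(half, rest, k))

def blr_storage(n, k, b):
    """BLR: flat grid of b x b blocks (b must divide n), dense diagonal
    blocks and rank-k off-diagonal blocks."""
    p = n // b
    return p * b * b + p * (p - 1) * low_rank_storage(b, b, k)

def lattice_storage(n, k, p, leaf):
    """Lattice format in this toy model: p x p grid of lattices (p must
    divide n); diagonal lattices are H-matrices, off-diagonal lattices
    reduce to one rank-k block because of the weak-admissibility assumption."""
    b = n // p
    diagonal = p * hmatrix_storage(b, k, leaf)
    off_diagonal = p * (p - 1) * low_rank_storage(b, b, k)
    return diagonal + off_diagonal

if __name__ == "__main__":
    n, k, leaf = 2 ** 14, 8, 64
    print("dense   ", n * n)
    print("BLR     ", blr_storage(n, k, b=128))      # block size about sqrt(n)
    print("lattice ", lattice_storage(n, k, p=8, leaf=leaf))
    print("H-matrix", hmatrix_storage(n, k, leaf))
```

In this model the BLR count grows roughly like n^1.5 when the block size is about sqrt(n), while the H-matrix and lattice counts grow roughly like k n log n for a fixed lattice grid. Real H-matrices with strong admissibility and varying ranks behave differently in detail.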

Research Organization: Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

Sponsoring Organization: USDOE National Nuclear Security Administration (NNSA)

Grant/Contract Number: AC04-94AL85000

OSTI ID: 1559494

Report Number(s): SAND-2019-8102J; 677691

Journal Information: International Journal of High Performance Computing Applications, Vol. 33, Issue 5; ISSN 1094-3420

Publisher: SAGE

Country of Publication: United States

Language: English

Citation Metrics:

Cited by: 11 works

Citation information provided by Web of Science
