OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Distributed-memory lattice H-matrix factorization

Journal Article · International Journal of High Performance Computing Applications
[1]; [2]; [3]; [4]
  1. Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
  2. The Univ. of Tokyo, Tokyo (Japan)
  3. Tokyo Inst. of Technology, Tokyo (Japan)
  4. The Univ. of Tennessee, Knoxville, TN (United States)

We parallelize the LU factorization of a hierarchical low-rank matrix (H-matrix) on a distributed-memory computer. This is much more difficult than the H-matrix-vector multiplication due to the dataflow of the factorization, and it is much harder than the parallelization of a dense matrix factorization due to the irregular hierarchical block structure of the matrix. The block low-rank (BLR) format gets rid of the hierarchy and simplifies the parallelization, often increasing concurrency. However, this comes at the price of losing the near-linear complexity of the H-matrix factorization. In this work, we propose to factorize the matrix using a “lattice H-matrix” format that generalizes the BLR format by storing each of the blocks (both diagonal and off-diagonal) in the H-matrix format. These blocks stored in the H-matrix format are referred to as lattices. Thus, this lattice format aims to combine the parallel scalability of BLR factorization with the near-linear complexity of H-matrix factorization. We first compare factorization performance using the H-matrix, BLR, and lattice H-matrix formats under various conditions on a shared-memory computer. Our performance results show that the lattice format has storage and computational complexities similar to those of the H-matrix format, and hence a much lower cost of factorization than BLR. We then compare the BLR and lattice H-matrix factorization on distributed-memory computers. Our performance results demonstrate that compared with BLR, the lattice format with the lower cost of factorization may lead to faster factorization on the distributed-memory computer.
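
The storage trade-off described in the abstract can be illustrated with a rough entry-counting sketch. This is a toy model and not the authors' format or results: the fixed rank, the leaf size, the HODLR-style (weak admissibility) recursion, and the choice to treat each off-diagonal lattice block as a single low-rank block are all assumptions made for illustration, and the function names are hypothetical.

```python
def hmatrix_storage(m, rank, leaf):
    """Entries stored for an m x m block in a toy HODLR-style H-matrix.

    Assumption: blocks of size <= leaf are dense; otherwise the two diagonal
    sub-blocks recurse and the two off-diagonal sub-blocks are stored as
    rank-`rank` factors U and V, each of shape (m/2) x rank.
    """
    if m <= leaf:
        return m * m
    half = m // 2
    low_rank = 2 * rank * half  # U and V for one off-diagonal sub-block
    return 2 * hmatrix_storage(half, rank, leaf) + 2 * low_rank


def blr_storage(n, p, rank):
    """Entries for a p x p BLR partition: dense diagonal blocks,
    rank-`rank` off-diagonal blocks (assumed uniform rank)."""
    b = n // p
    return p * b * b + p * (p - 1) * (2 * rank * b)


def lattice_storage(n, p, rank, leaf):
    """Entries for a p x p lattice whose diagonal blocks are toy H-matrices.

    Assumption: every off-diagonal lattice block degenerates to a single
    low-rank block; in general it could itself be hierarchical.
    """
    b = n // p
    return p * hmatrix_storage(b, rank, leaf) + p * (p - 1) * (2 * rank * b)


if __name__ == "__main__":
    n, p, rank, leaf = 4096, 8, 8, 64  # hypothetical problem parameters
    print("dense   :", n * n)
    print("BLR     :", blr_storage(n, p, rank))
    print("lattice :", lattice_storage(n, p, rank, leaf))
    print("H-matrix:", hmatrix_storage(n, rank, leaf))
```

Under these assumptions the lattice count falls between the H-matrix and BLR counts, which is consistent with the qualitative claim in the abstract. Real ranks vary per block and depend on the kernel and the admissibility condition, so the numbers only indicate a trend.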

Research Organization:
Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
Sponsoring Organization:
USDOE National Nuclear Security Administration (NNSA)
Grant/Contract Number:
AC04-94AL85000
OSTI ID:
1559494
Report Number(s):
SAND-2019-8102J; 677691
Journal Information:
International Journal of High Performance Computing Applications, Vol. 33, Issue 5; ISSN 1094-3420
Publisher:
SAGE
Country of Publication:
United States
Language:
English
Citation Metrics:
Cited by: 11 works
Citation information provided by
Web of Science
