Reordering sparse matrices into block-diagonal column-overlapped form

doi:10.1016/j.jpdc.2020.03.002

Journal of Parallel and Distributed Computing

Volume 140, June 2020, Pages 99-109

https://doi.org/10.1016/j.jpdc.2020.03.002

Highlights

• Reorders sparse matrices into block-diagonal column-overlapped (BDCO) form.
• Minimizes the total number of coupling columns between diagonal blocks.
• Balances the number of nonzeros in diagonal blocks.
• Proposes a hypergraph partitioning model to solve the BDCO reordering problem.
• Uses recursive bipartitioning and fixed vertices.
• Experimental results validate the effectiveness of the proposed model.

Abstract

Many scientific and engineering applications necessitate computing the minimum norm solution of a sparse underdetermined linear system of equations. The minimum 2-norm solution of such systems can be obtained by a recent parallel algorithm, whose numerical effectiveness and parallel scalability are validated in both shared- and distributed-memory architectures. This parallel algorithm assumes the coefficient matrix in a block-diagonal column-overlapped (BDCO) form, which is a variant of the block-diagonal form where the successive diagonal blocks may overlap along their columns. The total overlap size of the BDCO form is an important metric in the performance of the subject parallel algorithm since it determines the size of the reduced system, solution of which is a bottleneck operation in the parallel algorithm. In this work, we propose a hypergraph partitioning model for reordering sparse matrices into BDCO form with the objective of minimizing the total overlap size and the constraint of maintaining balance on the number of nonzeros of the diagonal blocks. Our model makes use of existing partitioning tools that support fixed vertices in the recursive bipartitioning paradigm. Experimental results validate the use of our model as it achieves small overlap size and balanced diagonal blocks.

Introduction

Many scientific and engineering applications [6], [12], [19], [22], [25], [27] require computing the minimum norm solution of an underdetermined system of equations of the form $A x = f$, where $A$ is an $m \times n$ sparse matrix and $m < n$ [18]. A common approach for obtaining the minimum 2-norm solution of an underdetermined linear least squares problem is the use of a QR factorization in a direct method. Various packages such as SuiteSparseQR [13], [14], HSL MA49 [2], and qr_mumps [7] provide efficient parallel and sequential implementations for the general sparse QR algorithm.

One recent approach [24] for obtaining the minimum 2-norm solution of an underdetermined linear least squares problem is the use of the Balance scheme [17], [21], [23], which is an effective and efficient parallel algorithm originally proposed for an ill-conditioned banded linear system of equations. This parallel algorithm [24] can also be considered as an extension of the general sparse QR factorization to distributed-memory systems for obtaining the minimum 2-norm solution. In this parallel algorithm, the coefficient matrix is assumed to be in block-diagonal column-overlapped (BDCO) form with $K$ diagonal blocks, which is shown in Fig. 1. As seen in the figure, the $k$th diagonal block $E_{k}$ of the BDCO form is given by $E_{k} = [C_{k} \; A_{k} \; B_{k}]$ for $k = 2, 3, \dots, K - 1$, whereas the first and last diagonal blocks $E_{1}$ and $E_{K}$ are respectively given by $E_{1} = [A_{1} \; B_{1}]$ and $E_{K} = [C_{K} \; A_{K}]$. Here, successive diagonal blocks $E_{k - 1}$ and $E_{k}$ overlap along the columns of their submatrices $B_{k - 1}$ and $C_{k}$. Columns of the overlapping submatrices are referred to as coupling columns.
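As an illustration of the block structure described above, the following sketch writes out the BDCO form for the assumed case $K = 3$. Blank entries denote zero blocks; the choice $K = 3$ and the layout are for illustration only and are not taken from Fig. 1.

```latex
\[
A =
\begin{bmatrix}
A_{1} & B_{1} &       &       &       \\
      & C_{2} & A_{2} & B_{2} &       \\
      &       &       & C_{3} & A_{3}
\end{bmatrix},
\qquad
E_{1} = [A_{1} \; B_{1}], \quad
E_{2} = [C_{2} \; A_{2} \; B_{2}], \quad
E_{3} = [C_{3} \; A_{3}].
\]
```

Here $B_{1}$ and $C_{2}$ occupy the same columns, as do $B_{2}$ and $C_{3}$, so the total number of coupling columns is the number of columns of $B_{1}$ plus the number of columns of $B_{2}$.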

The parallel algorithm [24] exploits the BDCO form so that each of $K$ processors independently solves a small linear least squares problem of the form $E_{k} z_{k} = f_{k}$, where $z_{1} = \begin{bmatrix} x_{1} \\ e_{1} \end{bmatrix}$, $z_{k} = \begin{bmatrix} \hat{e}_{k - 1} \\ x_{k} \\ e_{k} \end{bmatrix}$ for $k = 2, 3, \dots, K - 1$, and $z_{K} = \begin{bmatrix} \hat{e}_{K - 1} \\ x_{K} \end{bmatrix}$. This step is performed in parallel without any communication. For $x$ to be a solution, $e_{k}$ and $\hat{e}_{k}$ should be equal for $k = 1, 2, \dots, K - 1$. This is ensured by a sequential step in which a small system called the reduced system is solved. The size of the reduced system is determined by the total overlap size, i.e., the total number of coupling columns. When compared to the state-of-the-art parallel QR implementations, this approach is reported to achieve better scalability on both shared- and distributed-memory architectures [24].

In this paper, our aim is to find two permutations $P$ and $Q$ such that $P A Q$ is in a $K$-way BDCO form. We refer to this permutation problem as the BDCO problem. In the BDCO problem, the objective is to minimize the total overlap size in $P A Q$ since the performance of the above-mentioned parallel algorithm considerably deteriorates with increasing total overlap size, as reported in [24]. The constraint of the BDCO problem is to maintain balance on the number of nonzeros in the diagonal blocks of $P A Q$ to ensure balanced workload for processors.
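The objective and constraint above can be made concrete with a small evaluation routine. The following is a minimal sketch, not the paper's implementation: it assumes the matrix is given as a list of (row, column) nonzero positions and that a candidate solution is described only by a row-to-block assignment, with the column order implied afterwards. The names `bdco_metrics`, `row_block`, and the toy data are hypothetical.

```python
from collections import defaultdict


def bdco_metrics(nonzeros, row_block, num_blocks):
    """Evaluate a row-to-block assignment against the BDCO objective and constraint.

    nonzeros   : iterable of (row, column) index pairs of the sparse matrix
    row_block  : row_block[i] is the diagonal block (0 .. num_blocks - 1) of row i
    Returns (total_overlap, nnz_per_block); raises ValueError if some column
    couples blocks that are not successive, so no BDCO form exists for it.
    """
    blocks_of_column = defaultdict(set)
    nnz_per_block = [0] * num_blocks
    for i, j in nonzeros:
        k = row_block[i]
        blocks_of_column[j].add(k)
        nnz_per_block[k] += 1

    total_overlap = 0
    for j, blocks in blocks_of_column.items():
        if len(blocks) == 1:
            continue  # column is internal to a single diagonal block
        if len(blocks) > 2 or max(blocks) - min(blocks) != 1:
            raise ValueError(
                f"column {j} couples non-successive blocks {sorted(blocks)}"
            )
        total_overlap += 1  # coupling column shared by blocks k and k + 1
    return total_overlap, nnz_per_block


# Toy 4 x 5 pattern (assumed data): rows 0-1 form block 0, rows 2-3 form block 1.
toy_nonzeros = [(0, 0), (0, 1), (1, 1), (1, 2), (2, 2), (2, 3), (3, 3), (3, 4)]
toy_row_block = [0, 0, 1, 1]
overlap, nnz = bdco_metrics(toy_nonzeros, toy_row_block, num_blocks=2)
print(overlap, nnz)  # prints: 1 [4, 4]
```

In the toy pattern only column 2 has nonzeros in both blocks, so the total overlap size is 1, and each block holds 4 nonzeros, which is a perfectly balanced assignment.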

To the best of our knowledge, the literature lacks an algorithm for solving the BDCO problem. However, there are efforts for solving problems that resemble the BDCO problem. In [5] and [26], the problems of reordering sparse matrices into singly-bordered block diagonal (SBBD) form and separated block diagonal (SBD) form were addressed, respectively. In the SBBD form, all coupling rows are reordered to a border at the bottom or all coupling columns are reordered to a border on the right. In the SBD form, coupling rows are reordered in between two parts as they are encountered during the recursive bipartitioning steps. The SBBD and SBD forms differ from the BDCO form as they allow coupling rows/columns to span more than two diagonal blocks, which are not necessarily consecutive. In [1], the problem of reordering sparse square matrices into block diagonal form with overlap (BDO form) was addressed. The differences between the BDCO and BDO forms are two-fold: (i) the BDO form is obtained on structurally symmetric matrices via symmetric row/column permutation, whereas the BDCO form is obtained on non-square matrices via row and column permutations that are different from each other; (ii) consecutive diagonal blocks overlap along both rows and columns in the BDO form, whereas they overlap along only columns in the BDCO form.

In order to solve the BDCO problem, we propose the HP$_{BDCO}$ algorithm, which utilizes the column-net hypergraph model [8] of the given coefficient matrix $A$. First, we investigate the feasibility of the $K$-way BDCO permutation for matrix $A$ by considering the vertex-to-vertex adjacency topology of the column-net hypergraph of $A$. If the respective permutation is feasible, then the corresponding hypergraph is partitioned by the proposed HP$_{BDCO}$ algorithm so that the partitioning objective corresponds to minimizing the total overlap size, whereas the partitioning constraint corresponds to maintaining balance on the number of nonzeros of the diagonal blocks. The HP$_{BDCO}$ algorithm utilizes fixed vertices within the recursive bipartitioning paradigm in order to ensure that only the successive diagonal blocks may share columns. The proposed algorithm is flexible in the sense that any hypergraph partitioning tool that supports fixed vertices can be used in the implementation. The effectiveness of the proposed HP$_{BDCO}$ algorithm is validated by reordering a wide range of matrices into BDCO form with small overlap size and balanced diagonal blocks as well as by running the parallel code [24] on the reordered matrices.
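To make the column-net hypergraph model concrete, the following minimal sketch builds it from a sparsity pattern. It assumes the usual column-net convention (one vertex per row, one net per column, and a net connecting the rows that have a nonzero in that column). Weighting each vertex by the number of nonzeros in its row is also an assumption, chosen here so that part weights correspond to nonzero counts. The function name and toy pattern are hypothetical, and this is not the HP$_{BDCO}$ algorithm itself.

```python
def column_net_hypergraph(num_rows, num_cols, nonzeros):
    """Build the column-net hypergraph of a sparsity pattern.

    nonzeros is an iterable of (row, column) index pairs.
    Returns (pins, nets, weights) where
      pins[j]    : set of vertices (rows) connected by net j (column j)
      nets[i]    : set of nets (columns) connecting vertex i (row i)
      weights[i] : assumed vertex weight, the number of nonzeros in row i
    """
    pins = [set() for _ in range(num_cols)]
    nets = [set() for _ in range(num_rows)]
    for i, j in nonzeros:
        pins[j].add(i)
        nets[i].add(j)
    weights = [len(nets[i]) for i in range(num_rows)]
    return pins, nets, weights


# Toy 3 x 4 pattern (assumed data).
pattern = [(0, 0), (0, 1), (1, 1), (1, 2), (2, 2), (2, 3)]
pins, nets, weights = column_net_hypergraph(3, 4, pattern)
print(pins[1], nets[1], weights)
```

A partitioning tool would then assign the vertices (rows) to parts; a net whose pins fall into two parts corresponds to a column shared by two row blocks.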

The rest of the paper is organized as follows. Section 2 gives background on hypergraph partitioning. The proposed HP$_{BDCO}$ algorithm is given in Section 3. Section 4 gives experimental results and Section 5 concludes.

Section snippets

Hypergraphs

A hypergraph $H = (V, N)$ is defined as a set $V$ of vertices and a set $N$ of nets. Each net $n \in N$ connects a subset of vertices in $V$, which is denoted by $Pins(n)$. We use the phrase “pins of $n$” to refer to the vertices connected by net $n$. In a dual manner, each vertex $v \in V$ is connected by a subset of nets in $N$, which is denoted by $Nets(v)$.
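The duality between $Pins(n)$ and $Nets(v)$ can be sketched in a few lines. The example below assumes a hypergraph stored as a dictionary from net names to their pin sets; the helper name `nets_from_pins` and the vertex and net labels are hypothetical.

```python
def nets_from_pins(pins):
    """Derive Nets(v) for every vertex from the Pins(n) sets of all nets."""
    nets = {}
    for net, vertices in pins.items():
        for v in vertices:
            nets.setdefault(v, set()).add(net)
    return nets


# Toy hypergraph (assumed data) with three vertices and two nets.
toy_pins = {"n1": {"v1", "v2"}, "n2": {"v2", "v3"}}
toy_nets = nets_from_pins(toy_pins)
print(sorted(toy_nets["v2"]))  # prints: ['n1', 'n2']
```

Vertex `v2` is a pin of both nets, so both nets appear in its net set, while `v1` and `v3` are each connected by a single net.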

We adapt the following graph terminology to hypergraphs. In a hypergraph, two vertices are said to be adjacent if they are connected by at least one common net. Let $Adj(v)$

Proposed model

Suppose that we are given an $m \times n$ sparse matrix $A$ and an integer $K > 1$. For reordering $A$ into $K$-way BDCO form, we propose a hypergraph partitioning algorithm, HP$_{BDCO}$, which utilizes the column-net hypergraph model [8]. Let $H = (V, N)$ denote the column-net hypergraph of $A$ with vertex and net sets $V = \{v_{1}, v_{2}, \dots, v_{m}\}$ and $N = \{n_{1}, n_{2}, \dots, n_{n}\}$. Vertex $v_{i}$ represents row $i$ of $A$ for $i = 1, 2, \dots, m$, whereas net $n_{j}$ represents column $j$ of $A$ for $j = 1, 2, \dots, n$. Net $n_{j}$ connects vertices that represent the rows with a nonzero entry in

Experiments

Recall that the HP$_{BDCO}$ algorithm is the first algorithm in the literature that addresses the BDCO problem. For evaluating the performance of HP$_{BDCO}$, we consider a baseline method that first permutes the given matrix into a banded form and then applies block partitioning on it to obtain $K$ diagonal blocks with column overlap. The details of this baseline algorithm, which we refer to as the RCM$_{BDCO}$ algorithm, are described in Section 4.1.

We performed two different sets of experiments. The first

Conclusion

In this paper, we target the problem of reordering a given sparse matrix into block-diagonal column-overlapped (BDCO) form, which arises in a recent parallel algorithm proposed for obtaining the minimum norm solution of a given underdetermined system of equations. We first defined an equivalent hypergraph partitioning problem with an additional constraint tailored to the BDCO form and investigated the feasibility of its solutions. Then we proposed a recursive-bipartitioning-based partitioning

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Seher Acer received her B.S., M.S. and Ph.D. degrees in Computer Engineering from Bilkent University, Turkey. She is currently a postdoctoral researcher at Center for Computing Research, Sandia National Laboratories, Albuquerque, New Mexico, USA. Her research interests include parallel computing and combinatorial scientific computing with a focus on partitioning sparse irregular computations.

References (27)

Aykanat C. et al., Adaptive decomposition and remapping algorithms for object-space-parallel direct volume rendering of unstructured grids, J. Parallel Distrib. Comput. (2007)
Aykanat C. et al., Multi-level direct K-way hypergraph partitioning with multiple constraints and fixed vertices, J. Parallel Distrib. Comput. (2008)
Catalyurek U.V. et al., A repartitioning hypergraph model for dynamic load balancing, J. Parallel Distrib. Comput. (2009)
Sameh A.H. et al., Parallel algorithms for indefinite linear systems, Parallel Comput. (2002)
Tezduyar T.E. et al., Parallel finite element computations in fluid mechanics, Comput. Methods Appl. Mech. Engrg. (2006)
Zhdanov M. et al.
Acer S. et al., A recursive bipartitioning algorithm for permuting sparse square matrices into block diagonal form with overlap, SIAM J. Sci. Comput. (2013)
Amestoy P.R. et al., Multifrontal QR factorization in a multiprocessor environment, Numer. Linear Algebra Appl. (1996)
Aykanat C. et al., Permuting sparse rectangular matrices into block-diagonal form, SIAM J. Sci. Comput. (2004)
Bruckstein A.M. et al., From sparse solutions of systems of equations to sparse modeling of signals and images, SIAM Rev. (2009)
Buttari A., Fine-grained multithreading for the multifrontal QR factorization of sparse matrices, SIAM J. Sci. Comput. (2013)
Catalyurek U.V. et al., Hypergraph-partitioning-based decomposition for parallel sparse-matrix vector multiplication, IEEE Trans. Parallel Distrib. Syst. (1999)
Çatalyürek U.V. et al., PaToH: Partitioning Tool for Hypergraphs, Tech. Rep. (1999)

Cevdet Aykanat received the B.S. and M.S. degrees from Middle East Technical University, Ankara, Turkey, both in Electrical Engineering, and the Ph.D. degree from Ohio State University, Columbus, in Electrical and Computer Engineering. He worked at the Intel Supercomputer Systems Division, Beaverton, Oregon, as a research associate. Since 1989, he has been affiliated with Computer Engineering Department, Bilkent University, Ankara, Turkey, where he is currently a professor. His research interests mainly include parallel computing and its combinatorial aspects. He is the recipient of the 1995 Investigator Award of The Scientific and Technological Research Council of Turkey and 2007 Parlar Science Award. He has served as an Associate Editor of IEEE Transactions of Parallel and Distributed Systems between 2008 and 2012.

^☆: Sandia National Laboratories is a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC., a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA-0003525. The work was done when Seher Acer was with Bilkent University.
