Reordering sparse matrices into block-diagonal column-overlapped form

https://doi.org/10.1016/j.jpdc.2020.03.002

Highlights

  • Reorders sparse matrices into block-diagonal column-overlapped (BDCO) form.

  • Minimizes the total number of coupling columns between diagonal blocks.

  • Balances the number of nonzeros in diagonal blocks.

  • Proposes a hypergraph partitioning model to solve the BDCO reordering problem.

  • Uses recursive bipartitioning and fixed vertices.

  • Experimental results validate the effectiveness of the proposed model.

Abstract

Many scientific and engineering applications necessitate computing the minimum norm solution of a sparse underdetermined linear system of equations. The minimum 2-norm solution of such systems can be obtained by a recent parallel algorithm, whose numerical effectiveness and parallel scalability have been validated on both shared- and distributed-memory architectures. This parallel algorithm assumes that the coefficient matrix is in block-diagonal column-overlapped (BDCO) form, a variant of the block-diagonal form in which successive diagonal blocks may overlap along their columns. The total overlap size of the BDCO form is an important factor in the performance of this parallel algorithm, since it determines the size of the reduced system, whose solution is a bottleneck operation in the parallel algorithm. In this work, we propose a hypergraph partitioning model for reordering sparse matrices into BDCO form with the objective of minimizing the total overlap size and the constraint of maintaining balance on the number of nonzeros of the diagonal blocks. Our model makes use of existing partitioning tools that support fixed vertices in the recursive bipartitioning paradigm. Experimental results validate the use of our model, as it achieves small overlap size and balanced diagonal blocks.

Introduction

Many scientific and engineering applications [6], [12], [19], [22], [25], [27] require computing the minimum norm solution of an underdetermined system of equations of the form $Ax = f$, where $A$ is an $m \times n$ sparse matrix and $m < n$ [18]. A common approach for obtaining the minimum 2-norm solution of an underdetermined linear least squares problem is the use of a QR factorization in a direct method. Various packages such as SuiteSparseQR [13], [14], HSL MA49 [2], and qr_mumps [7] provide efficient parallel and sequential implementations of the general sparse QR algorithm.
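
As a minimal illustration of the QR-based approach (a dense, pure-Python sketch, not the implementation used by any of the cited packages), the code below computes the minimum 2-norm solution of $Ax = f$ for a full-row-rank $A$ with $m < n$. It factors $A^T = QR$ by modified Gram-Schmidt; then $A = R^T Q^T$, so $x = Qy$ with $R^T y = f$ satisfies the system and lies in the row space of $A$, which makes it the minimum-norm solution. The function name `min_norm_solution` is hypothetical.

```python
from math import sqrt


def min_norm_solution(A, f, tol=1e-12):
    """Minimum 2-norm solution of A x = f for a full-row-rank m x n A (m < n).

    Sketch only: thin QR of A^T via modified Gram-Schmidt on the rows of A,
    then x = Q y where R^T y = f.
    """
    m, n = len(A), len(A[0])
    Q = []                               # orthonormal columns of Q (length n each)
    R = [[0.0] * m for _ in range(m)]    # upper-triangular factor, A^T = Q R
    for j in range(m):
        v = [float(a) for a in A[j]]     # column j of A^T is row j of A
        for i, q in enumerate(Q):
            R[i][j] = sum(qk * vk for qk, vk in zip(q, v))
            v = [vk - R[i][j] * qk for vk, qk in zip(v, q)]
        R[j][j] = sqrt(sum(vk * vk for vk in v))
        if R[j][j] < tol:
            raise ValueError("A must have full row rank")
        Q.append([vk / R[j][j] for vk in v])
    # Forward substitution for R^T y = f (R^T is lower triangular).
    y = [0.0] * m
    for i in range(m):
        s = sum(R[k][i] * y[k] for k in range(i))
        y[i] = (f[i] - s) / R[i][i]
    # x = Q y, i.e. a combination of the orthonormalized rows of A.
    return [sum(y[i] * Q[i][k] for i in range(m)) for k in range(n)]
```

For example, `min_norm_solution([[1, 0, 1], [0, 1, 1]], [1, 2])` returns approximately `[0.0, 1.0, 1.0]`.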

One recent approach [24] for obtaining the minimum 2-norm solution of an underdetermined linear least squares problem is the use of the Balance scheme [17], [21], [23], which is an effective and efficient parallel algorithm originally proposed for ill-conditioned banded linear systems of equations. This parallel algorithm [24] can also be considered as an extension of the general sparse QR factorization to distributed-memory systems for obtaining the minimum 2-norm solution. In this parallel algorithm, the coefficient matrix is assumed to be in block-diagonal column-overlapped (BDCO) form with $K$ diagonal blocks, as shown in Fig. 1. As seen in the figure, the $k$th diagonal block $E_k$ of the BDCO form is given by
$$E_k = \begin{bmatrix} C_k & A_k & B_k \end{bmatrix}, \qquad k = 2, 3, \ldots, K-1,$$
whereas the first and last diagonal blocks $E_1$ and $E_K$ are respectively given by
$$E_1 = \begin{bmatrix} A_1 & B_1 \end{bmatrix} \quad \text{and} \quad E_K = \begin{bmatrix} C_K & A_K \end{bmatrix}.$$
Here, successive diagonal blocks $E_{k-1}$ and $E_k$ overlap along the columns of their submatrices $B_{k-1}$ and $C_k$. The columns of these overlapping submatrices are referred to as coupling columns.
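
Once every row has been assigned to a diagonal block, a BDCO ordering can be read off column by column: a column touched by one block belongs to that block's $A_k$, and a column touched by two consecutive blocks $k$ and $k+1$ belongs to $B_k$ (equivalently $C_{k+1}$). The sketch below illustrates this under our own assumptions: the function name `bdco_permutation`, 0-based block indices, and the set-of-columns row representation are hypothetical choices, not taken from the paper.

```python
def bdco_permutation(rows, row_part, K):
    """Derive row/column orders of a K-way BDCO form from a row-to-block map.

    rows[i]     : set of column indices with a nonzero in row i
    row_part[i] : diagonal block (0..K-1) assigned to row i
    Returns (row_order, col_order, coupling), where coupling[k] lists the
    columns shared by blocks k and k+1. Empty columns are left out.
    Raises ValueError if a column touches non-consecutive blocks.
    """
    blocks_of_col = {}
    for i, cols in enumerate(rows):
        for j in cols:
            blocks_of_col.setdefault(j, set()).add(row_part[i])

    internal = [[] for _ in range(K)]      # columns of A_k
    coupling = [[] for _ in range(K - 1)]  # columns of B_k (= C_{k+1})
    for j in sorted(blocks_of_col):
        b = sorted(blocks_of_col[j])
        if len(b) == 1:
            internal[b[0]].append(j)
        elif len(b) == 2 and b[1] == b[0] + 1:
            coupling[b[0]].append(j)
        else:
            raise ValueError(f"column {j} spans blocks {b}; not BDCO")

    # Stable sort keeps the original relative order of rows inside a block.
    row_order = sorted(range(len(rows)), key=lambda i: row_part[i])
    col_order = []
    for k in range(K):
        col_order += internal[k]           # A_k
        if k < K - 1:
            col_order += coupling[k]       # B_k, which is also C_{k+1}
    return row_order, col_order, coupling
```

Ordering the columns as $A_1, B_1, A_2, B_2, \ldots, A_K$ makes each $E_k$ occupy a contiguous column range, with $C_k$ and $B_{k-1}$ being the same columns.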

The parallel algorithm [24] exploits the BDCO form so that each of the $K$ processors independently solves a small linear least squares problem of the form
$$E_k z_k = f_k,$$
where
$$z_1 = \begin{bmatrix} x_1 \\ e_1 \end{bmatrix}, \qquad z_k = \begin{bmatrix} \hat{e}_{k-1} \\ x_k \\ e_k \end{bmatrix} \text{ for } k = 2, 3, \ldots, K-1, \qquad \text{and} \qquad z_K = \begin{bmatrix} \hat{e}_{K-1} \\ x_K \end{bmatrix}.$$
This step is performed in parallel without any communication. For $x$ to be a solution, $e_k$ and $\hat{e}_k$ must be equal for $k = 1, 2, \ldots, K-1$. This is ensured by a sequential step in which a small system, called the reduced system, is solved. The size of the reduced system is determined by the total overlap size, i.e., the total number of coupling columns. Compared with state-of-the-art parallel QR implementations, this approach is reported to achieve better scalability on both shared- and distributed-memory architectures [24].
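
To make the role of the coupling unknowns explicit, the short derivation below (our own restatement, using only the definitions of $E_k$ and $z_k$ above) expands the local systems. If $\hat{e}_k = e_k$ for all $k$, stacking the $K$ block rows gives the original BDCO system with $x = [x_1^T, e_1^T, x_2^T, e_2^T, \ldots, e_{K-1}^T, x_K^T]^T$.

```latex
E_1 z_1 = A_1 x_1 + B_1 e_1 = f_1,
\qquad
E_k z_k = \begin{bmatrix} C_k & A_k & B_k \end{bmatrix}
          \begin{bmatrix} \hat{e}_{k-1} \\ x_k \\ e_k \end{bmatrix}
        = C_k \hat{e}_{k-1} + A_k x_k + B_k e_k = f_k
\quad (k = 2, \ldots, K-1),
\qquad
E_K z_K = C_K \hat{e}_{K-1} + A_K x_K = f_K.
```

Because each local problem is solved independently, the computed $e_k$ and $\hat{e}_k$ generally differ, which is why a reduced system whose size grows with the number of coupling columns is needed.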

In this paper, our aim is to find two permutations P and Q such that PAQ is in a K-way BDCO form. We refer to this permutation problem as the BDCO problem. In the BDCO problem, the objective is to minimize the total overlap size in PAQ since the performance of the above-mentioned parallel algorithm considerably deteriorates with increasing total overlap size, as reported in [24]. The constraint of the BDCO problem is to maintain balance on the number of nonzeros in the diagonal blocks of PAQ to ensure balanced workload for processors.
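
The objective and constraint of the BDCO problem can be evaluated for any candidate assignment of rows to diagonal blocks. In the minimal sketch below (hypothetical function name `bdco_metrics`, 0-based blocks), the total overlap size is the number of columns touched by exactly two blocks, and the nonzeros of $E_k$ are counted row by row, which is valid because in BDCO form every nonzero of a row in block $k$ lies inside $E_k$. Measuring imbalance as the maximum block load divided by the average load, minus one, is a common partitioning convention assumed here, not taken from the paper.

```python
def bdco_metrics(rows, row_part, K):
    """Return (total_overlap, block_nnz, imbalance) for a row-to-block map.

    rows[i] is the set of column indices with a nonzero in row i and
    row_part[i] its block (0..K-1). Assumes the map is BDCO-feasible, i.e.
    every column touches one block or two consecutive blocks.
    """
    blocks_of_col = {}
    block_nnz = [0] * K
    for i, cols in enumerate(rows):
        block_nnz[row_part[i]] += len(cols)   # all of row i lies in E_k
        for j in cols:
            blocks_of_col.setdefault(j, set()).add(row_part[i])
    # Objective: number of coupling columns (shared by two blocks).
    total_overlap = sum(1 for b in blocks_of_col.values() if len(b) == 2)
    # Constraint measure (assumed convention): max load / average load - 1.
    avg = sum(block_nnz) / K
    imbalance = max(block_nnz) / avg - 1.0
    return total_overlap, block_nnz, imbalance
```

For a $4 \times 5$ staircase pattern with rows `[{0, 1}, {1, 2}, {2, 3}, {3, 4}]`, the split `[0, 0, 1, 1]` gives one coupling column and perfectly balanced blocks, whereas `[0, 1, 1, 1]` keeps one coupling column but has imbalance 0.5.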

To the best of our knowledge, the literature lacks an algorithm for solving the BDCO problem. However, there have been efforts to solve problems that resemble the BDCO problem. In [5] and [26], the problems of reordering sparse matrices into singly-bordered block diagonal (SBBD) form and separated block diagonal (SBD) form were addressed, respectively. In the SBBD form, either all coupling rows are reordered to a border at the bottom or all coupling columns are reordered to a border on the right. In the SBD form, coupling rows are reordered in between two parts as they are encountered during the recursive bipartitioning steps. The SBBD and SBD forms differ from the BDCO form in that they allow coupling rows/columns to span more than two diagonal blocks, which are not necessarily consecutive. In [1], the problem of reordering sparse square matrices into block diagonal form with overlap (BDO form) was addressed. The differences between the BDCO and BDO forms are twofold: (i) the BDO form is obtained on structurally symmetric matrices via symmetric row/column permutation, whereas the BDCO form is obtained on non-square matrices via row and column permutations that differ from each other; (ii) consecutive diagonal blocks overlap along both rows and columns in the BDO form, whereas they overlap only along columns in the BDCO form.

In order to solve the BDCO problem, we propose the HPBDCO algorithm, which utilizes the column-net hypergraph model [8] of the given coefficient matrix A. First, we investigate the feasibility of the K-way BDCO permutation for matrix A by considering the vertex-to-vertex adjacency topology of the column-net hypergraph of A. If the respective permutation is feasible, then the corresponding hypergraph is partitioned by the proposed HPBDCO algorithm so that the partitioning objective corresponds to minimizing the total overlap size, whereas the partitioning constraint corresponds to maintaining balance on the number of nonzeros of the diagonal blocks. The HPBDCO algorithm utilizes fixed vertices within the recursive bipartitioning paradigm in order to ensure that only the successive diagonal blocks may share columns. The proposed algorithm is flexible in the sense that any hypergraph partitioning tool that supports fixed vertices can be used in the implementation. The effectiveness of the proposed HPBDCO algorithm is validated by reordering a wide range of matrices into BDCO form with small overlap size and balanced diagonal blocks as well as by running the parallel code [24] on the reordered matrices.

The rest of the paper is organized as follows. Section 2 gives background on hypergraph partitioning. The proposed HPBDCO algorithm is given in Section 3. Section 4 gives experimental results and Section 5 concludes.


Hypergraphs

A hypergraph $H = (V, N)$ is defined as a set $V$ of vertices and a set $N$ of nets. Each net $n \in N$ connects a subset of vertices in $V$, which is denoted by Pins(n). We use the phrase “pins of n” to refer to the vertices connected by net $n$. In a dual manner, each vertex $v \in V$ is connected by a subset of nets in $N$, which is denoted by Nets(v).
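
The duality between Pins(n) and Nets(v) amounts to transposing an incidence list. A minimal sketch, with vertices and nets numbered from 0 and a hypothetical function name:

```python
def nets_of_vertices(pins, num_vertices):
    """Given pins[n] = vertices connected by net n, build nets[v] = nets connecting v."""
    nets = [[] for _ in range(num_vertices)]
    for n, vertices in enumerate(pins):
        for v in vertices:
            nets[v].append(n)   # v is a pin of n, so n is one of Nets(v)
    return nets
```

For example, `nets_of_vertices([[0, 1], [1, 2], [2]], 3)` returns `[[0], [0, 1], [1, 2]]`.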

We adapt the following graph terminology to hypergraphs. In a hypergraph, two vertices are said to be adjacent if they are connected by at least one common net. Let Adj(v)

Proposed model

Suppose that we are given an $m \times n$ sparse matrix $A$ and an integer $K > 1$. For reordering $A$ into $K$-way BDCO form, we propose a hypergraph partitioning algorithm, HPBDCO, which utilizes the column-net hypergraph model [8]. Let $H = (V, N)$ denote the column-net hypergraph of $A$ with vertex and net sets $V = \{v_1, v_2, \ldots, v_m\}$ and $N = \{n_1, n_2, \ldots, n_n\}$. Vertex $v_i$ represents row $i$ of $A$ for $i = 1, 2, \ldots, m$, whereas net $n_j$ represents column $j$ of $A$ for $j = 1, 2, \ldots, n$. Net $n_j$ connects vertices that represent the rows with a nonzero entry in
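
The column-net model can be built directly from the sparsity pattern, following the standard definition in which the pins of a column's net are the rows that have a nonzero in that column. The sketch below assumes 0-based indices, a coordinate list of nonzero positions as input, and hypothetical helper names. It also computes vertex adjacency in the sense of Section 2 (two rows are adjacent when they share a net, i.e., a nonzero column); excluding a vertex from its own adjacency set is our assumption.

```python
def column_net_hypergraph(m, n, nonzeros):
    """Column-net hypergraph of an m x n sparsity pattern.

    nonzeros: iterable of (i, j) positions; vertex i <-> row i, net j <-> column j.
    Returns pins[j] (rows with a nonzero in column j) and nets[i] (columns of row i).
    """
    pins = [set() for _ in range(n)]
    nets = [set() for _ in range(m)]
    for i, j in nonzeros:
        pins[j].add(i)
        nets[i].add(j)
    return pins, nets


def adjacency(pins, nets):
    """Adj(v): vertices sharing at least one net with v (v itself excluded)."""
    adj = []
    for v, vnets in enumerate(nets):
        nbrs = set()
        for nj in vnets:
            nbrs |= pins[nj]
        nbrs.discard(v)
        adj.append(nbrs)
    return adj
```

For the pattern with nonzeros at `(0, 0), (0, 1), (1, 1), (2, 2)`, rows 0 and 1 are adjacent through column 1, while row 2 has no neighbours.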

Experiments

Recall that the HPBDCO algorithm is the first algorithm in the literature that addresses the BDCO problem. For evaluating the performance of HPBDCO, we consider a baseline method that first permutes the given matrix into a banded form and then applies block partitioning on it to obtain K diagonal blocks with column overlap. The details of this baseline algorithm, which we refer to as the RCMBDCO algorithm, are described in Section 4.1.
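
The details of RCMBDCO are given in Section 4.1 and are not reproduced here, so the following is only a plausible sketch of a "banded form, then block partition" baseline under assumptions of our own: rows are ordered by reverse Cuthill-McKee (RCM) on the row adjacency graph (rows adjacent when they share a column), and the ordered rows are cut into $K$ contiguous chunks with roughly equal nonzero counts. All function names are hypothetical.

```python
from collections import deque


def reverse_cuthill_mckee(adj):
    """RCM ordering of an undirected graph given as a list of neighbour sets."""
    n = len(adj)
    visited = [False] * n
    order = []
    # Start each connected component from an unvisited vertex of minimum degree.
    for start in sorted(range(n), key=lambda v: len(adj[v])):
        if visited[start]:
            continue
        visited[start] = True
        queue = deque([start])
        while queue:
            v = queue.popleft()
            order.append(v)
            # Cuthill-McKee visits unvisited neighbours in increasing degree.
            for u in sorted(adj[v], key=lambda w: len(adj[w])):
                if not visited[u]:
                    visited[u] = True
                    queue.append(u)
    return order[::-1]  # reversing the Cuthill-McKee order gives RCM


def split_balanced(row_order, row_nnz, K):
    """Cut ordered rows into K contiguous chunks of roughly equal nonzeros."""
    total = max(sum(row_nnz), 1)
    row_part = [0] * len(row_order)
    acc = 0
    for i in row_order:
        # Chunk index is floor(K * nonzeros seen before row i / total), capped at K-1.
        row_part[i] = min(K - 1, acc * K // total)
        acc += row_nnz[i]
    return row_part


def rcm_bdco_baseline(rows, K):
    """Hypothetical baseline: RCM on the row graph, then a balanced row split.

    rows[i] is the set of column indices with a nonzero in row i.
    Returns (row_order, row_part).
    """
    col_rows = {}
    for i, cols in enumerate(rows):
        for j in cols:
            col_rows.setdefault(j, set()).add(i)
    # Two rows are adjacent if they share at least one column.
    adj = [set() for _ in rows]
    for rs in col_rows.values():
        for i in rs:
            adj[i] |= rs
    for i, nbrs in enumerate(adj):
        nbrs.discard(i)
    order = reverse_cuthill_mckee(adj)
    return order, split_balanced(order, [len(c) for c in rows], K)
```

The resulting row-to-block map still has to be checked for BDCO feasibility: a contiguous cut of a banded matrix does not by itself guarantee that every column touches at most two consecutive blocks, for instance when the bandwidth exceeds the height of a block.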

We performed two different sets of experiments. The first

Conclusion

In this paper, we target the problem of reordering a given sparse matrix into block-diagonal column-overlapped (BDCO) form, which arises in a recent parallel algorithm proposed for obtaining the minimum norm solution of a given underdetermined system of equations. We first defined an equivalent hypergraph partitioning problem with an additional constraint tailored to the BDCO form and investigated the feasibility of its solutions. Then we proposed a recursive-bipartitioning-based partitioning

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Seher Acer received her B.S., M.S. and Ph.D. degrees in Computer Engineering from Bilkent University, Turkey. She is currently a postdoctoral researcher at the Center for Computing Research, Sandia National Laboratories, Albuquerque, New Mexico, USA. Her research interests include parallel computing and combinatorial scientific computing with a focus on partitioning sparse irregular computations.

References (27)

  • A. Buttari, Fine-grained multithreading for the multifrontal QR factorization of sparse matrices, SIAM J. Sci. Comput. (2013).
  • U.V. Çatalyürek et al., Hypergraph-partitioning-based decomposition for parallel sparse-matrix vector multiplication, IEEE Trans. Parallel Distrib. Syst. (1999).
  • U.V. Çatalyürek et al., PaToH: Partitioning Tool for Hypergraphs, Tech. Rep. (1999).

Cevdet Aykanat received the B.S. and M.S. degrees from Middle East Technical University, Ankara, Turkey, both in Electrical Engineering, and the Ph.D. degree from Ohio State University, Columbus, in Electrical and Computer Engineering. He worked at the Intel Supercomputer Systems Division, Beaverton, Oregon, as a research associate. Since 1989, he has been affiliated with the Computer Engineering Department, Bilkent University, Ankara, Turkey, where he is currently a professor. His research interests mainly include parallel computing and its combinatorial aspects. He is the recipient of the 1995 Investigator Award of The Scientific and Technological Research Council of Turkey and the 2007 Parlar Science Award. He served as an Associate Editor of IEEE Transactions on Parallel and Distributed Systems between 2008 and 2012.

Sandia National Laboratories is a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC., a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA-0003525. The work was done when Seher Acer was with Bilkent University.
