Reordering sparse matrices into block-diagonal column-overlapped form☆
Introduction
Many scientific and engineering applications [6], [12], [19], [22], [25], [27] require computing the minimum norm solution of an underdetermined system of equations of the form $Ax = f$, where $A$ is an $m \times n$ sparse matrix and $m < n$ [18]. A common approach for obtaining the minimum 2-norm solution of an underdetermined linear least squares problem is to use a QR factorization in a direct method. Various packages such as SuiteSparseQR [13], [14], HSL MA49 [2], and qr_mumps [7] provide efficient parallel and sequential implementations of the general sparse QR algorithm.
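To make the QR-based approach concrete, the short derivation below sketches how a QR factorization yields the minimum 2-norm solution. It is an illustration under stated assumptions, not a description of how any of the cited packages is implemented: the system is written as $Ax = f$, $A$ is assumed to have full row rank, and the symbols $Q$, $R$, and $x_{\min}$ are introduced here only for this sketch.

```latex
% Assumptions: A is m x n with m < n and full row rank; f is the right-hand side.
\begin{align*}
A^{T} &= QR, \qquad Q \in \mathbb{R}^{n \times m},\; Q^{T}Q = I_m, \qquad
        R \in \mathbb{R}^{m \times m} \text{ upper triangular and nonsingular},\\
x_{\min} &= A^{T}(AA^{T})^{-1} f
          = QR\,(R^{T}Q^{T}QR)^{-1} f
          = QR\,(R^{T}R)^{-1} f
          = Q R^{-T} f .
\end{align*}
```

Under these assumptions, one solves the triangular system $R^{T} y = f$ and then forms $x_{\min} = Qy$, so $(AA^{T})^{-1}$ is never computed explicitly.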
One recent approach [24] for obtaining the minimum 2-norm solution of an underdetermined linear least squares problem uses the Balance scheme [17], [21], [23], an effective and efficient parallel algorithm originally proposed for ill-conditioned banded linear systems of equations. This parallel algorithm [24] can also be considered an extension of the general sparse QR factorization to distributed-memory systems for obtaining the minimum 2-norm solution. In this parallel algorithm, the coefficient matrix is assumed to be in block-diagonal column-overlapped (BDCO) form with $K$ diagonal blocks, which is shown in Fig. 1. As seen in the figure, each interior diagonal block shares columns with both the preceding and the following diagonal block, whereas the first and last diagonal blocks each share columns with only one neighboring block. That is, successive diagonal blocks overlap along the columns of their adjacent submatrices. Columns of the overlapping submatrices are referred to as coupling columns.
The parallel algorithm [24] exploits the BDCO form so that each of the $K$ processors independently solves a small linear least squares problem defined by its own diagonal block. This step is performed in parallel without any communication. For the local solutions to constitute a solution of the original system, the values that two successive processors obtain for their shared coupling columns should be equal. This is ensured by a sequential step in which a small system, called the reduced system, is solved. The size of the reduced system is determined by the total overlap size, i.e., the total number of coupling columns. When compared to the state-of-the-art parallel QR implementations, this approach is reported to achieve better scalability on both shared- and distributed-memory architectures [24].
In this paper, our aim is to find a row permutation and a column permutation of the coefficient matrix $A$ such that the permuted matrix is in $K$-way BDCO form. We refer to this permutation problem as the BDCO problem. In the BDCO problem, the objective is to minimize the total overlap size in the permuted matrix, since the performance of the above-mentioned parallel algorithm deteriorates considerably with increasing total overlap size, as reported in [24]. The constraint of the BDCO problem is to maintain balance on the number of nonzeros in the diagonal blocks of the permuted matrix to ensure a balanced workload for the processors.
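The objective and the constraint of the BDCO problem can be illustrated with a small checker. The following is a minimal sketch, not the method proposed in the paper: it takes an ordered assignment of rows to blocks, counts the coupling columns, and reports the nonzero imbalance. The function name `evaluate_bdco`, the input layout, and the imbalance measure (maximum block load over average block load, minus one) are assumptions made for this example.

```python
from collections import defaultdict

def evaluate_bdco(nonzeros, row_part, K):
    """Evaluate an ordered K-way row partition as a candidate BDCO form.

    nonzeros : iterable of (row, col) index pairs, one per nonzero entry
    row_part : dict mapping each row index to its block index in 0..K-1
    Returns (feasible, total_overlap, imbalance).
    Assumes the matrix has at least one nonzero.
    """
    blocks_of_col = defaultdict(set)   # column -> blocks holding its nonzeros
    nnz = [0] * K                      # nonzero count per diagonal block
    for r, c in nonzeros:
        k = row_part[r]
        blocks_of_col[c].add(k)
        nnz[k] += 1

    feasible, total_overlap = True, 0
    for blocks in blocks_of_col.values():
        if len(blocks) == 1:
            continue                   # column is internal to one block
        if len(blocks) == 2 and max(blocks) - min(blocks) == 1:
            total_overlap += 1         # coupling column of two successive blocks
        else:
            feasible = False           # shared by non-successive blocks
    # Assumed balance measure: max block load relative to the average load.
    imbalance = max(nnz) * K / sum(nnz) - 1.0
    return feasible, total_overlap, imbalance

# Usage: a 4 x 5 matrix whose rows 0-1 form block 0 and rows 2-3 form block 1.
nonzeros = [(0, 0), (0, 1), (1, 1), (1, 2), (2, 2), (2, 3), (3, 3), (3, 4)]
row_part = {0: 0, 1: 0, 2: 1, 3: 1}
result = evaluate_bdco(nonzeros, row_part, K=2)
```

In this usage example only column 2 has nonzeros in both blocks, so it is the single coupling column, and both blocks hold four nonzeros.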
To the best of our knowledge, the literature lacks an algorithm for solving the BDCO problem. However, there are efforts to solve problems that resemble the BDCO problem. In [5] and [26], the problems of reordering sparse matrices into singly-bordered block diagonal (SBBD) form and separated block diagonal (SBD) form were addressed, respectively. In the SBBD form, all coupling rows are reordered to a border at the bottom or all coupling columns are reordered to a border on the right. In the SBD form, coupling rows are reordered in between two parts as they are encountered during the recursive bipartitioning steps. The SBBD and SBD forms differ from the BDCO form as they allow coupling rows/columns to span more than two diagonal blocks, which are not necessarily consecutive. In [1], the problem of reordering sparse square matrices into block diagonal form with overlap (BDO form) was addressed. The differences between the BDCO and BDO forms are twofold: (i) the BDO form is obtained on structurally symmetric matrices via symmetric row/column permutation, whereas the BDCO form is obtained on non-square matrices via row and column permutations that are different from each other; (ii) consecutive diagonal blocks overlap along both rows and columns in the BDO form, whereas they overlap along only columns in the BDCO form.
In order to solve the BDCO problem, we propose the HP algorithm, which utilizes the column-net hypergraph model [8] of the given coefficient matrix $A$. First, we investigate the feasibility of the $K$-way BDCO permutation for matrix $A$ by considering the vertex-to-vertex adjacency topology of the column-net hypergraph of $A$. If the respective permutation is feasible, then the corresponding hypergraph is partitioned by the proposed HP algorithm so that the partitioning objective corresponds to minimizing the total overlap size, whereas the partitioning constraint corresponds to maintaining balance on the number of nonzeros of the diagonal blocks. The HP algorithm utilizes fixed vertices within the recursive bipartitioning paradigm in order to ensure that only successive diagonal blocks may share columns. The proposed algorithm is flexible in the sense that any hypergraph partitioning tool that supports fixed vertices can be used in the implementation. The effectiveness of the proposed HP algorithm is validated by reordering a wide range of matrices into BDCO form with small overlap size and balanced diagonal blocks, as well as by running the parallel code [24] on the reordered matrices.
The rest of the paper is organized as follows. Section 2 gives background on hypergraph partitioning. The proposed HP algorithm is given in Section 3. Section 4 gives experimental results and Section 5 concludes.
Hypergraphs
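A hypergraph $\mathcal{H} = (\mathcal{V}, \mathcal{N})$ is defined as a set $\mathcal{V}$ of vertices and a set $\mathcal{N}$ of nets. Each net $n \in \mathcal{N}$ connects a subset of vertices in $\mathcal{V}$, which is denoted by $Pins(n)$. We use the phrase “pins of $n$” to refer to the vertices connected by net $n$. In a dual manner, each vertex $v \in \mathcal{V}$ is connected by a subset of nets in $\mathcal{N}$, which is denoted by $Nets(v)$.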
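The dual relation between the pins of a net and the nets of a vertex can be sketched in a few lines. This is only an illustration: the function name `nets_of_vertices` and the dictionary-based representation are assumptions made for this example, not a data structure taken from the paper or from any partitioning tool.

```python
from collections import defaultdict

def nets_of_vertices(pins):
    """Derive the vertex-to-net map from the net-to-pin map.

    pins : dict mapping each net to the set of vertices it connects
    Returns a dict mapping each vertex to the set of nets connecting it.
    """
    nets = defaultdict(set)
    for net, vertices in pins.items():
        for v in vertices:
            nets[v].add(net)
    return dict(nets)

# Usage: two nets over four vertices (hypothetical labels).
pins = {"n1": {"v1", "v2"}, "n2": {"v2", "v3", "v4"}}
nets = nets_of_vertices(pins)
```

Here vertex `v2` is a pin of both nets, so its net set contains `n1` and `n2`, while every other vertex is connected by a single net.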
We adapt the following graph terminology to hypergraphs. In a hypergraph, two vertices are said to be adjacent if they are connected by at least one common net. Let
Proposed model
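Suppose that we are given an $m \times n$ sparse matrix $A$ and an integer $K$. For reordering $A$ into $K$-way BDCO form, we propose a hypergraph partitioning algorithm, HP, which utilizes the column-net hypergraph model [8]. Let $\mathcal{H}(A) = (\mathcal{V}, \mathcal{N})$ denote the column-net hypergraph of $A$ with vertex and net sets $\mathcal{V} = \{v_1, \ldots, v_m\}$ and $\mathcal{N} = \{n_1, \ldots, n_n\}$. Vertex $v_i$ represents row $i$ of $A$ for $i = 1, \ldots, m$, whereas net $n_j$ represents column $j$ of $A$ for $j = 1, \ldots, n$. Net $n_j$ connects vertices that represent the rows with a nonzero entry in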
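A minimal sketch of the column-net model follows, assuming the matrix is supplied as a list of 0-based (row, column) coordinates of its nonzeros. The function names `column_net_hypergraph` and `adjacent` are hypothetical and serve only this illustration; the adjacency test follows the definition that two vertices are adjacent if they are connected by at least one common net.

```python
def column_net_hypergraph(nonzeros, m, n):
    """Build the column-net hypergraph of an m-by-n sparse matrix.

    nonzeros : iterable of (i, j) pairs, 0-based, one per nonzero entry
    Returns (pins, nets):
      pins[j] is the set of rows (vertices) connected by the net of column j,
      nets[i] is the set of columns (nets) connecting the vertex of row i.
    """
    pins = [set() for _ in range(n)]
    nets = [set() for _ in range(m)]
    for i, j in nonzeros:
        pins[j].add(i)
        nets[i].add(j)
    return pins, nets

def adjacent(nets, u, v):
    """Rows u and v are adjacent if some column has a nonzero in both."""
    return u != v and not nets[u].isdisjoint(nets[v])

# Usage: a 3 x 4 matrix in which rows 0 and 1 share column 1.
nonzeros = [(0, 0), (0, 1), (1, 1), (1, 2), (2, 3)]
pins, nets = column_net_hypergraph(nonzeros, m=3, n=4)
```

In this example the net of column 1 has rows 0 and 1 as its pins, so those two vertices are adjacent, whereas row 2 shares no column with either of them.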
Experiments
Recall that the HP algorithm is the first algorithm in the literature that addresses the BDCO problem. For evaluating the performance of HP, we consider a baseline method that first permutes the given matrix into a banded form and then applies block partitioning on it to obtain diagonal blocks with column overlap. The details of this baseline algorithm, which we refer to as the RCM algorithm, are described in Section 4.1.
We performed two different sets of experiments. The first
Conclusion
In this paper, we target the problem of reordering a given sparse matrix into block-diagonal column-overlapped (BDCO) form, which arises in a recent parallel algorithm proposed for obtaining the minimum norm solution of a given underdetermined system of equations. We first defined an equivalent hypergraph partitioning problem with an additional constraint tailored to the BDCO form and investigated the feasibility of its solutions. Then we proposed a recursive-bipartitioning-based partitioning
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Seher Acer received her B.S., M.S. and Ph.D. degrees in Computer Engineering from Bilkent University, Turkey. She is currently a postdoctoral researcher at the Center for Computing Research, Sandia National Laboratories, Albuquerque, New Mexico, USA. Her research interests include parallel computing and combinatorial scientific computing with a focus on partitioning sparse irregular computations.
References (27)
- et al., Adaptive decomposition and remapping algorithms for object-space-parallel direct volume rendering of unstructured grids, J. Parallel Distrib. Comput. (2007)
- et al., Multi-level direct K-way hypergraph partitioning with multiple constraints and fixed vertices, J. Parallel Distrib. Comput. (2008)
- et al., A repartitioning hypergraph model for dynamic load balancing, J. Parallel Distrib. Comput. (2009)
- et al., Parallel algorithms for indefinite linear systems, Parallel Comput. (2002)
- et al., Parallel finite element computations in fluid mechanics, Comput. Methods Appl. Mech. Engrg. (2006)
- et al., A recursive bipartitioning algorithm for permuting sparse square matrices into block diagonal form with overlap, SIAM J. Sci. Comput. (2013)
- et al., Multifrontal QR factorization in a multiprocessor environment, Numer. Linear Algebra Appl. (1996)
- et al., Permuting sparse rectangular matrices into block-diagonal form, SIAM J. Sci. Comput. (2004)
- et al., From sparse solutions of systems of equations to sparse modeling of signals and images, SIAM Rev. (2009)
- Fine-grained multithreading for the multifrontal QR factorization of sparse matrices, SIAM J. Sci. Comput.
- Hypergraph-partitioning-based decomposition for parallel sparse-matrix vector multiplication, IEEE Trans. Parallel Distrib. Syst.
- PaToH: Partitioning Tool for Hypergraphs, Tech. Rep.
Cevdet Aykanat received the B.S. and M.S. degrees from Middle East Technical University, Ankara, Turkey, both in Electrical Engineering, and the Ph.D. degree from Ohio State University, Columbus, in Electrical and Computer Engineering. He worked at the Intel Supercomputer Systems Division, Beaverton, Oregon, as a research associate. Since 1989, he has been affiliated with the Computer Engineering Department, Bilkent University, Ankara, Turkey, where he is currently a professor. His research interests mainly include parallel computing and its combinatorial aspects. He is the recipient of the 1995 Investigator Award of The Scientific and Technological Research Council of Turkey and the 2007 Parlar Science Award. He served as an Associate Editor of IEEE Transactions on Parallel and Distributed Systems between 2008 and 2012.
☆ Sandia National Laboratories is a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC., a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA-0003525. The work was done when Seher Acer was with Bilkent University.