Communication efficient Gaussian elimination with partial pivoting using a shape morphing data layout

Authors: Benjamin Lipshitz, Sivan Toledo

SPAA '13: Proceedings of the twenty-fifth annual ACM symposium on Parallelism in algorithms and architectures

Pages 232 - 240

https://doi.org/10.1145/2486159.2486198

Published: 23 July 2013

Abstract

High performance for numerical linear algebra often comes at the expense of stability. Computing the LU decomposition of a matrix via Gaussian Elimination can be organized so that the computation involves regular and efficient data access. However, maintaining numerical stability via partial pivoting involves row interchanges that lead to inefficient data access patterns. To optimize communication efficiency throughout the memory hierarchy we confront two seemingly contradictory requirements: partial pivoting is efficient with column-major layout, whereas a block-recursive layout is optimal for the rest of the computation. We resolve this by introducing a shape morphing procedure that dynamically matches the layout to the computation throughout the algorithm, and show that Gaussian Elimination with partial pivoting can be performed in a communication efficient and cache-oblivious way. Our technique extends to QR decomposition, where computing Householder vectors prefers a different data layout than the rest of the computation.
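
To make the layout tension in the abstract concrete, here is a minimal sketch that compares where matrix elements land in memory under a column-major layout and under a block-recursive layout. It uses Z-order (Morton) indexing as one common example of a block-recursive layout; the abstract does not say which block-recursive layout the paper uses, and the function names `col_major_offset` and `morton_offset` are illustrative assumptions, not taken from the paper.

```python
def col_major_offset(i, j, n_rows):
    """Offset of element (i, j) when each column is stored contiguously."""
    return i + j * n_rows


def morton_offset(i, j):
    """Offset of element (i, j) in a Z-order (Morton) layout.

    Assumption for this sketch: bits of the column index j go to even
    positions and bits of the row index i go to odd positions, so every
    2x2 block is stored contiguously, row by row.
    """
    offset = 0
    bit = 0
    while (i >> bit) or (j >> bit):
        offset |= ((j >> bit) & 1) << (2 * bit)
        offset |= ((i >> bit) & 1) << (2 * bit + 1)
        bit += 1
    return offset


n = 4  # a 4x4 matrix, small enough to check by hand

# Scanning column 0 (as a pivot search does) touches these offsets.
col_scan_col_major = [col_major_offset(i, 0, n) for i in range(n)]
col_scan_morton = [morton_offset(i, 0) for i in range(n)]

# Touching the top-left 2x2 block (as a blocked update does).
block = [(0, 0), (0, 1), (1, 0), (1, 1)]
block_col_major = [col_major_offset(i, j, n) for i, j in block]
block_morton = [morton_offset(i, j) for i, j in block]

print("column scan, column-major:", col_scan_col_major)  # [0, 1, 2, 3]
print("column scan, Morton:      ", col_scan_morton)     # [0, 2, 8, 10]
print("2x2 block, column-major:  ", block_col_major)     # [0, 4, 1, 5]
print("2x2 block, Morton:        ", block_morton)        # [0, 1, 2, 3]
```

In this small example the column scan is contiguous only in column-major order, while the 2x2 block is contiguous only in Morton order, which illustrates why a single fixed layout cannot serve both the pivot search and the blocked updates equally well.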


Cited By

K. Livingston, A. Landwehr, J. Monsalve, S. Zuckerman, B. Meister, and G. Gao. Energy Avoiding Matrix Multiply. Languages and Compilers for Parallel Computing, pages 55--70, 2017. https://doi.org/10.1007/978-3-319-52709-3_5

L. Grigori. Introduction to Communication Avoiding Algorithms for Direct Methods of Factorization in Linear Algebra. Computational Mathematics, Numerical Analysis and Applications, pages 153--185, 2017. https://doi.org/10.1007/978-3-319-49631-3_4

G. Ballard, J. Demmel, and N. Knight. Avoiding Communication in Successive Band Reduction. ACM Transactions on Parallel Computing, 1(2):1--37, 2015. https://dl.acm.org/doi/10.1145/2686877

U. Vishkin. Is multicore hardware for general-purpose parallel processing broken? Communications of the ACM, 57(4):35--39, 2014. https://dl.acm.org/doi/10.1145/2580945

Index Terms

Mathematics of computing > Mathematical analysis > Numerical analysis > Computations on matrices

Published In

SPAA '13: Proceedings of the twenty-fifth annual ACM symposium on Parallelism in algorithms and architectures

July 2013

348 pages

ISBN:9781450315722

DOI:10.1145/2486159

General Chair: Guy Blelloch (Carnegie Mellon University, USA)

Program Chair: Berthold Vöcking (RWTH Aachen University, Germany)

Copyright © 2013 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

SPAA '13: 25th ACM Symposium on Parallelism in Algorithms and Architectures

July 23 - 25, 2013

Montréal, Québec, Canada

Acceptance Rates

SPAA '13 Paper Acceptance Rate 31 of 130 submissions, 24%;

Overall Acceptance Rate 447 of 1,461 submissions, 31%

