Short paper

Brief announcement: Lower bounds on communication for sparse Cholesky factorization of a model problem

Authors:

Laura Grigori, Pierre-Yves David, James W. Demmel, Sylvain Peyronnet

SPAA '10: Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures

Pages 79 - 81

https://doi.org/10.1145/1810479.1810496

Published: 13 June 2010

Abstract

Previous work has shown a lower bound on the number of words moved between a large, slow memory and a small, fast memory of size M. It applies to any conventional (non-Strassen-like) direct linear algebra algorithm, such as matrix multiplication or the LU, Cholesky, and QR factorizations, and it is Ω(#flops / √M). This holds for dense or sparse matrices. There are analogous lower bounds for the number of messages, and for parallel algorithms instead of sequential algorithms.
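As a rough illustration of the bound above, the following Python sketch evaluates Ω(#flops / √M) with every constant factor set to 1. It also computes the corresponding message bound, under the assumption that each message carries at most M words. The function names and the sample values of n and M are hypothetical. The dense Cholesky flop count n³/3 is the standard leading-order estimate, not a figure taken from this paper.

```python
import math

def words_lower_bound(flops: float, M: float) -> float:
    """Order-of-magnitude words-moved bound: flops / sqrt(M).

    Constant factors are dropped (set to 1), so this is an asymptotic
    estimate, not an exact count of data movement.
    """
    return flops / math.sqrt(M)

def messages_lower_bound(flops: float, M: float) -> float:
    """Messages bound, assuming each message carries at most M words:
    #messages >= #words / M, i.e. flops / M**1.5 (constants dropped)."""
    return words_lower_bound(flops, M) / M

def dense_cholesky_flops(n: int) -> float:
    """Leading-order flop count n^3 / 3 for a dense n x n Cholesky."""
    return n ** 3 / 3

# Hypothetical example: matrix dimension n and fast-memory size M (in words).
n, M = 4096, 2 ** 20
f = dense_cholesky_flops(n)
print(f"flops     ~ {f:.3e}")
print(f"words    >~ {words_lower_bound(f, M):.3e}")
print(f"messages >~ {messages_lower_bound(f, M):.3e}")
```

Only the scaling is meaningful here. Because the constants are omitted, the printed numbers indicate orders of magnitude, not attainable counts.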

Our goal here is to find algorithms that attain these lower bounds on interesting classes of sparse matrices. We focus on matrices for which there is a lower bound on the number of flops of their Cholesky factorization. Our Cholesky lower bounds on communication hold for any possible ordering of the rows and columns of the matrix, and so are globally optimal in this sense. For matrices arising from discretization on two-dimensional and three-dimensional regular grids, we discuss sequential and parallel algorithms that are optimal in terms of communication. These algorithms turn out to require only simple combinations of previously known sparse and dense Cholesky algorithms.
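For the grid model problems, the bound can be made concrete by plugging in minimum Cholesky flop counts. The sketch below assumes the classical asymptotic results for regular grids with k points per side:

- Θ(k³) = Θ(n^{3/2}) flops for a 2D grid with n = k² unknowns.
- Θ(k⁶) = Θ(n²) flops for a 3D grid with n = k³ unknowns.

Both counts are attained by nested dissection orderings. Constants are dropped, so the outputs are orders of magnitude only. The helper names and the value of M are hypothetical.

```python
import math

def grid_cholesky_flops(k: int, dim: int) -> float:
    """Order of the minimum Cholesky flop count on a k^dim regular grid.

    Assumed classical asymptotics (constant factors dropped):
      2D, n = k^2 unknowns: Theta(n^{3/2}) = Theta(k^3)
      3D, n = k^3 unknowns: Theta(n^2)     = Theta(k^6)
    """
    if dim == 2:
        return float(k ** 3)
    if dim == 3:
        return float(k ** 6)
    raise ValueError("only 2D and 3D grids are modeled here")

def grid_words_lower_bound(k: int, dim: int, M: float) -> float:
    """Words-moved lower bound Omega(#flops / sqrt(M)) for the grid problem."""
    return grid_cholesky_flops(k, dim) / math.sqrt(M)

M = 2 ** 20  # hypothetical fast-memory size in words
for dim, k in [(2, 1000), (3, 100)]:
    n = k ** dim  # both cases give n = 10**6 unknowns
    print(f"{dim}D grid, n = {n:>9}: words moved >~ "
          f"{grid_words_lower_bound(k, dim, M):.3e}")
```

With the same number of unknowns, the 3D problem has a much larger flop count and hence a much larger communication lower bound. The bound grows as n^{3/2}/√M in 2D versus n²/√M in 3D.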



Index Terms

Mathematics of computing → Mathematical analysis → Numerical analysis → Computations on matrices


Published In

SPAA '10, June 2010, 378 pages

ISBN: 9781450300797

DOI: 10.1145/1810479

General Chairs: Friedhelm Meyer auf der Heide (University of Paderborn, Germany); Cynthia Phillips (Sandia National Laboratories, USA)

Publisher

Association for Computing Machinery, New York, NY, United States


Conference

SPAA '10: 22nd ACM Symposium on Parallelism in Algorithms and Architectures

June 13-15, 2010

Santorini, Thira, Greece

