DOI: 10.1145/1810479.1810496

Brief announcement: Lower bounds on communication for sparse Cholesky factorization of a model problem

Published: 13 June 2010

Abstract

Previous work has shown that any conventional (non-Strassen-like) direct linear algebra algorithm (matrix multiplication; the LU, Cholesky, and QR factorizations; and so on) must move Ω(#flops / √M) words between a large, slow memory and a small, fast memory of size M. This lower bound holds for dense or sparse matrices. There are analogous lower bounds for the number of messages, and for parallel algorithms instead of sequential algorithms.
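As a rough illustration of the bound above, the following sketch evaluates #flops / √M for a given flop count and fast-memory size. The function name `words_moved_lower_bound` and the sample numbers are assumptions made for illustration, and the constant hidden by the Ω notation is taken to be 1, so the result is only an order-of-magnitude estimate.

```python
import math


def words_moved_lower_bound(flops: float, fast_memory_words: float) -> float:
    """Return flops / sqrt(M), i.e. the Omega(#flops / sqrt(M)) bound with constant 1.

    The name and the unit constant are illustrative assumptions; the abstract
    only states the asymptotic form of the bound.
    """
    if flops < 0 or fast_memory_words <= 0:
        raise ValueError("flops must be >= 0 and fast memory size must be > 0")
    return flops / math.sqrt(fast_memory_words)


# Hypothetical example: dense n x n Cholesky performs roughly n**3 / 3 flops.
n = 3000
M = 10_000  # fast memory size in words (assumed value)
bound = words_moved_lower_bound(n**3 / 3, M)
print(f"on the order of {bound:.3e} words must be moved")
```

Because the bound scales with 1 / √M, quadrupling the fast memory in this sketch only halves the estimated number of words moved.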
Our goal here is to find algorithms that attain these lower bounds on interesting classes of sparse matrices. We focus on matrices for which there is a lower bound on the number of flops of their Cholesky factorization. Our lower bounds on communication for Cholesky hold for any possible ordering of the rows and columns of the matrix, and so are globally optimal in this sense. For matrices arising from discretizations on two-dimensional and three-dimensional regular grids, we discuss sequential and parallel algorithms that are optimal in terms of communication. These algorithms turn out to require combining previously known sparse and dense Cholesky algorithms in simple ways.
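To make the regular-grid case concrete, the sketch below plugs flop counts into the #flops / √M form of the bound. The abstract does not state these counts; the sketch assumes the classical nested dissection results, under which sparse Cholesky on a k × k grid costs on the order of k^3 flops and on a k × k × k grid on the order of k^6 flops. The function name `grid_cholesky_comm_bound`, the sample sizes, and the dropped constants are all illustrative assumptions.

```python
import math


def grid_cholesky_comm_bound(k: int, dim: int, fast_memory_words: float) -> float:
    """Order-of-magnitude words-moved bound for a regular grid with k points per side.

    Assumes (not stated in the abstract) that sparse Cholesky needs about
    k**3 flops for a 2D k x k grid and about k**6 flops for a 3D k x k x k grid.
    All constant factors are dropped.
    """
    if dim not in (2, 3):
        raise ValueError("dim must be 2 or 3")
    if k <= 0 or fast_memory_words <= 0:
        raise ValueError("k and fast memory size must be positive")
    flops = float(k) ** (3 if dim == 2 else 6)
    return flops / math.sqrt(fast_memory_words)


M = 4096  # fast memory size in words (assumed value)
k = 100   # grid points per side (assumed value)
for dim in (2, 3):
    unknowns = k ** dim  # matrix dimension for this grid
    bound = grid_cholesky_comm_bound(k, dim, M)
    print(f"{dim}D grid, {unknowns} unknowns: about {bound:.3e} words moved")
```

In terms of the number of unknowns n, these assumed flop counts correspond to n^(3/2) in two dimensions (n = k^2) and n^2 in three dimensions (n = k^3).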


    Published In

    SPAA '10: Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures
    June 2010
    378 pages
    ISBN:9781450300797
    DOI:10.1145/1810479

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. communication bounds
    2. sparse Cholesky

    Qualifiers

    • Short-paper

    Conference

    SPAA '10

    Acceptance Rates

    Overall Acceptance Rate 447 of 1,461 submissions, 31%
