Brief announcement: communication bounds for heterogeneous architectures

Authors: Grey Ballard, James Demmel, Andrew Gearhart

SPAA '11: Proceedings of the twenty-third annual ACM symposium on Parallelism in algorithms and architectures

Pages 257 - 258

https://doi.org/10.1145/1989493.1989531

Published: 04 June 2011

Abstract

As the gap between the cost of communication (i.e., data movement) and the cost of computation continues to grow, so does the importance of algorithms that minimize communication. Toward this end, we seek asymptotic communication lower bounds for general memory models and classes of algorithms. Recent work has established lower bounds for a wide class of linear algebra algorithms on a sequential machine and on a parallel machine with identical processors. This work extends those bounds to a heterogeneous model in which processors access data and perform floating-point operations at differing speeds. We also present an algorithm for dense matrix multiplication that attains the lower bound.
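
For orientation, the sequential bound that this line of work builds on has, for classical dense matrix multiplication, the well-known form of Omega(n^3 / sqrt(M)) words moved between slow memory and a fast memory holding M words (Hong and Kung). The sketch below is not the heterogeneous algorithm announced here. It only counts the words moved by a textbook square-blocked multiplication on a single processor and compares that count with n^3 / sqrt(M). The function names, the block-size rule 3*b*b <= M, and the example sizes are illustrative assumptions.

```python
import math


def sequential_bound(n: int, fast_mem_words: int) -> float:
    """Return n^3 / sqrt(M), the lower bound with constant factors omitted."""
    return n ** 3 / math.sqrt(fast_mem_words)


def blocked_matmul_words_moved(n: int, fast_mem_words: int) -> int:
    """Count words moved by a square-blocked n x n multiplication C = A * B.

    Assumption: the block size b is the largest integer with 3*b*b <= M,
    so that one block each of A, B, and C fits in fast memory together.
    """
    b = min(n, math.isqrt(fast_mem_words // 3))
    if b < 1:
        raise ValueError("fast memory must hold at least three words")

    words = 0
    for i in range(0, n, b):
        bi = min(b, n - i)              # rows in this block of C
        for j in range(0, n, b):
            bj = min(b, n - j)          # columns in this block of C
            words += bi * bj            # read the C block once
            for k in range(0, n, b):
                bk = min(b, n - k)      # inner dimension of this step
                words += bi * bk        # read a block of A
                words += bk * bj        # read a block of B
            words += bi * bj            # write the C block back
    return words


if __name__ == "__main__":
    n, M = 1024, 3 * 64 * 64            # hypothetical sizes, b = 64
    moved = blocked_matmul_words_moved(n, M)
    bound = sequential_bound(n, M)
    print(f"words moved: {moved}, n^3/sqrt(M): {bound:.0f}, ratio: {moved / bound:.2f}")
```

When b divides n, the count is exactly 2*n^3/b + 2*n^2, so with b close to sqrt(M/3) the blocked algorithm stays within a constant factor of n^3 / sqrt(M). That is the sense in which an algorithm "attains" a bound of this kind.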

Index Terms

Mathematics of computing > Mathematical analysis > Numerical analysis > Computations on matrices

Published In

SPAA '11: Proceedings of the twenty-third annual ACM symposium on Parallelism in algorithms and architectures
June 2011, 404 pages
ISBN: 9781450307437
DOI: 10.1145/1989493
Co-chairs: Friedhelm Meyer auf der Heide (University of Paderborn, Germany), Rajmohan Rajaraman (Northeastern University, USA)
In-Cooperation: EATCS: European Association for Theoretical Computer Science
Publisher: Association for Computing Machinery, New York, NY, United States

Conference

SPAA '11: 23rd ACM Symposium on Parallelism in Algorithms and Architectures
June 4 - 6, 2011, San Jose, California, USA

Acceptance Rates

Overall Acceptance Rate 447 of 1,461 submissions, 31%
