DOI: 10.1145/1006209.1006230
Article

Multilevel hierarchical matrix multiplication on clusters

Published: 26 June 2004

Abstract

Matrix-matrix multiplication is one of the core computations in many algorithms from scientific computing and numerical analysis, and many efficient realizations have been invented over the years, including many parallel ones. The current trend of using clusters of PCs or SMPs for scientific computing suggests revisiting matrix-matrix multiplication and investigating the efficiency and scalability of different versions on clusters. In this paper we present parallel algorithms for matrix-matrix multiplication that are built up from several algorithms in a multilevel structure. Each level is associated with a hierarchical partition of the set of available processors into disjoint subsets, so that deeper levels of the algorithm employ smaller groups of processors in parallel. We perform runtime experiments on several parallel platforms and show that multilevel algorithms can lead to significant performance gains compared with state-of-the-art methods.
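The multilevel idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes square matrices, uses the same 2x2 block decomposition at every level (the paper combines several different algorithms), and only simulates the processor groups sequentially. The names `split_group`, `matmul_serial`, and `multilevel_matmul` are hypothetical and chosen for illustration.

```python
def split_group(procs, parts):
    """Split a list of processor ids into `parts` disjoint, contiguous subgroups."""
    size, extra = divmod(len(procs), parts)
    groups, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        groups.append(procs[start:end])
        start = end
    return groups


def matmul_serial(A, B):
    """Plain triple-loop product, standing in for the algorithm used at the leaves."""
    inner, cols = len(B), len(B[0])
    return [[sum(row[k] * B[k][j] for k in range(inner)) for j in range(cols)]
            for row in A]


def quadrant(M, r, c, h):
    """Return the h x h block of M at block row r, block column c."""
    return [row[c * h:(c + 1) * h] for row in M[r * h:(r + 1) * h]]


def multilevel_matmul(A, B, procs, trace, level=0):
    """Compute A * B, assigning each result quadrant to a disjoint processor subgroup.

    `trace` collects (level, processor group, block size) so the hierarchy can be
    inspected. The groups are only simulated; nothing runs in parallel here.
    """
    n = len(A)
    trace.append((level, tuple(procs), n))
    # Assumption: stop splitting when the group or the block is too small.
    if len(procs) < 4 or n < 2 or n % 2 != 0:
        return matmul_serial(A, B)
    h = n // 2
    groups = split_group(procs, 4)
    C = [[0] * n for _ in range(n)]
    for group, (r, c) in zip(groups, [(0, 0), (0, 1), (1, 0), (1, 1)]):
        # C[r][c] = A[r][0] * B[0][c] + A[r][1] * B[1][c], handled by `group`.
        p1 = multilevel_matmul(quadrant(A, r, 0, h), quadrant(B, 0, c, h),
                               group, trace, level + 1)
        p2 = multilevel_matmul(quadrant(A, r, 1, h), quadrant(B, 1, c, h),
                               group, trace, level + 1)
        for i in range(h):
            for j in range(h):
                C[r * h + i][c * h + j] = p1[i][j] + p2[i][j]
    return C


if __name__ == "__main__":
    A = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    B = [[2, 0, 1, 0], [0, 2, 0, 1], [1, 0, 2, 0], [0, 1, 0, 2]]
    trace = []
    C = multilevel_matmul(A, B, list(range(16)), trace)
    print(C == matmul_serial(A, B))
    for level, group, n in trace[:3]:
        print(level, group, n)
```

In a real cluster setting each subgroup would typically correspond to its own communicator and the subgroups would work concurrently; the sketch only shows how deeper levels are handed smaller, disjoint processor groups.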


Published In

ICS '04: Proceedings of the 18th annual international conference on Supercomputing
June 2004
360 pages
ISBN:1581138393
DOI:10.1145/1006209

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. matrix multiplication
  2. multiprocessor tasks
  3. Strassen's algorithm
  4. task parallelism

Qualifiers

  • Article

Conference

ICS04
Acceptance Rates

Overall Acceptance Rate 629 of 2,180 submissions, 29%

Cited By

  • (2018) Utility Maximization of Cloud-Based In-Car Video Recording Over Vehicular Access Networks. IEEE Internet of Things Journal 5:6 (5213-5226). DOI: 10.1109/JIOT.2018.2844169. Online publication date: Dec-2018
  • (2016) A high-performance matrix-matrix multiplication methodology for CPU and GPU architectures. The Journal of Supercomputing 72:3 (804-844). DOI: 10.1007/s11227-015-1613-7. Online publication date: 1-Mar-2016
  • (2014) A Matrix-Matrix Multiplication methodology for single/multi-core architectures using SIMD. The Journal of Supercomputing 68:3 (1418-1440). DOI: 10.1007/s11227-014-1098-9. Online publication date: 1-Jun-2014
  • (2012) Comparison on efficiency of computational efforts between cluster computation (MapReduce) and single host computation. 2012 International Conference on Cloud Computing and Social Networking (ICCCSN) (1-5). DOI: 10.1109/ICCCSN.2012.6215743. Online publication date: Apr-2012
  • (2009) Adaptive approaches for efficient parallel algorithms on cluster-based systems. International Journal of Grid and Utility Computing 1:2 (98-108). DOI: 10.1504/IJGUC.2009.022026. Online publication date: 1-Dec-2009
  • (2009) Towards Scalable Parallel Numerical Algorithms and Dynamic Load Balancing Strategies. High Performance Computing in Science and Engineering, Garching/Munich 2007 (503-516). DOI: 10.1007/978-3-540-69182-2_40. Online publication date: 2009
  • (2007) Optimal solution to matrix parenthesization problem employing parallel processing approach. Proceedings of the 8th Conference on 8th WSEAS International Conference on Evolutionary Computing - Volume 8 (235-240). DOI: 10.5555/1347992.1347994. Online publication date: 19-Jun-2007
  • (2007) Mixed task and data parallel executions in general linear methods. Scientific Programming 15:3 (137-155). DOI: 10.1155/2007/683198. Online publication date: 1-Aug-2007
  • (2007) Dedicated architecture for double precision matrix multiplication in supercomputing environment. 2007 IEEE Design and Diagnostics of Electronic Circuits and Systems (1-4). DOI: 10.1109/DDECS.2007.4295303. Online publication date: Apr-2007
  • (2006) Anticipated distributed task scheduling for grid environments. Proceedings of the 20th international conference on Parallel and distributed processing (337-337). DOI: 10.5555/1898699.1898883. Online publication date: 25-Apr-2006

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media