Distributed general matrix multiply and add for a 2D mesh processor network

Kågström, Bo; Rännar, Mikael

doi:10.1007/3-540-60902-4_36

Bo Kågström¹ & Mikael Rännar¹

Conference paper
First Online: 01 January 2005

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1041)

Abstract

A distributed algorithm with the same functionality as the single-processor level 3 BLAS operation GEMM (general matrix multiply and add) is presented. By "the same functionality" we mean the ability to perform GEMM operations on arbitrary subarrays of the matrices involved. The logical network is a 2D square mesh with torus connectivity, and the matrices are distributed with a non-scattered blocked data distribution. The algorithm consists of two main parts: alignment and data movement of the subarrays involved in the operation, and a distributed blocked matrix multiplication algorithm on (sub)matrices that uses only a square submesh. This general approach makes it possible to perform GEMM operations on non-overlapping submeshes simultaneously.
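
The ideas named in the abstract can be illustrated with a small single-process sketch in Python. It is not the authors' implementation. All function names and argument orders here (`gemm_sub`, `block_owner`, `torus_multiply`) are hypothetical, the matrices are assumed square with an order divisible by the mesh size, and the multiply step uses the well-known Cannon-style shifting scheme as a stand-in for a blocked multiplication on a torus; the paper's own algorithm may differ.

```python
def matmul(X, Y):
    """Plain triple-loop product of two list-of-lists matrices."""
    return [[sum(X[i][l] * Y[l][j] for l in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def gemm_sub(alpha, A, a_off, B, b_off, beta, C, c_off, m, n, k):
    """GEMM on subarrays, updating C in place:
    C_sub <- alpha * A_sub * B_sub + beta * C_sub,
    where A_sub is m x k, B_sub is k x n and C_sub is m x n,
    each starting at the given (row, column) offset."""
    (ai, aj), (bi, bj), (ci, cj) = a_off, b_off, c_off
    for i in range(m):
        for j in range(n):
            acc = sum(A[ai + i][aj + l] * B[bi + l][bj + j] for l in range(k))
            C[ci + i][cj + j] = alpha * acc + beta * C[ci + i][cj + j]


def block_owner(i, j, n, p):
    """Mesh coordinates of the processor that owns element (i, j) of an
    n x n matrix stored in contiguous (non-scattered) blocks on a p x p mesh."""
    b = -(-n // p)  # block size, rounded up
    return (i // b, j // b)


def torus_multiply(A, B, p):
    """Simulate a Cannon-style blocked multiply on a p x p torus.
    Assumes A and B are n x n with n divisible by p."""
    n = len(A)
    b = n // p

    def block(M, r, c):
        return [row[c * b:(c + 1) * b] for row in M[r * b:(r + 1) * b]]

    # Alignment: processor (r, c) starts with A block (r, r + c) and
    # B block (r + c, c), indices taken modulo p (torus wraparound).
    Ab = [[block(A, r, (c + r) % p) for c in range(p)] for r in range(p)]
    Bb = [[block(B, (r + c) % p, c) for c in range(p)] for r in range(p)]
    C = [[0] * n for _ in range(n)]
    for _ in range(p):
        # Local block multiply-and-add on every processor.
        for r in range(p):
            for c in range(p):
                P = matmul(Ab[r][c], Bb[r][c])
                for i in range(b):
                    for j in range(b):
                        C[r * b + i][c * b + j] += P[i][j]
        # Shift A blocks one step left and B blocks one step up,
        # wrapping around the torus.
        Ab = [[Ab[r][(c + 1) % p] for c in range(p)] for r in range(p)]
        Bb = [[Bb[(r + 1) % p][c] for c in range(p)] for r in range(p)]
    return C
```

In this sketch, `gemm_sub` shows what operating on arbitrary subarrays means for a single processor, `block_owner` shows how a blocked layout maps an element to a mesh position, and `torus_multiply` shows why torus connectivity is convenient: every shift is a nearest-neighbour transfer with wraparound.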

Author information

Authors and Affiliations

Department of Computing Science, University of Umeå, S-901 87, Umeå, Sweden
Bo Kågström & Mikael Rännar

Authors

Bo Kågström
Mikael Rännar

Editor information

Jack Dongarra, Kaj Madsen, Jerzy Waśniewski

About this paper

Cite this paper

Kågström, B., Rännar, M. (1996). Distributed general matrix multiply and add for a 2D mesh processor network. In: Dongarra, J., Madsen, K., Waśniewski, J. (eds) Applied Parallel Computing: Computations in Physics, Chemistry and Engineering Science. PARA 1995. Lecture Notes in Computer Science, vol 1041. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-60902-4_36

DOI: https://doi.org/10.1007/3-540-60902-4_36
Published: 01 June 2005
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-60902-5
Online ISBN: 978-3-540-49670-0
eBook Packages: Springer Book Archive
