Abstract
A distributed algorithm with the same functionality as the single-processor level 3 BLAS operation GEMM (general matrix multiply and add) is presented. By "the same functionality" we mean the ability to perform GEMM operations on arbitrary subarrays of the matrices involved. The logical network is a 2D square mesh with torus connectivity. The matrices are distributed with a non-scattered blocked data distribution. The algorithm consists of two main parts: alignment and data movement of the subarrays involved in the operation, and a distributed blocked matrix multiplication algorithm on (sub)matrices that uses only a square submesh. This general approach makes it possible to perform GEMM operations on non-overlapping submeshes simultaneously.
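The ingredients named in the abstract can be illustrated with a small serial sketch. It is not the algorithm of the paper, whose details the abstract does not give. All function names are hypothetical. The block size rule (ceiling division) is an assumption, transposed operands are left out for brevity, and Cannon's algorithm is used only as a well-known stand-in for a blocked multiplication on a square mesh with torus (wrap-around) shifts.

```python
def block_owner(i, j, n_rows, n_cols, p):
    """Mesh coordinates of the processor holding global entry (i, j) when the
    matrix is cut into p x p contiguous (non-scattered) blocks.
    Assumption: block sizes are ceil(n_rows / p) by ceil(n_cols / p)."""
    row_block = -(-n_rows // p)  # ceiling division
    col_block = -(-n_cols // p)
    return (i // row_block, j // col_block)


def gemm_sub(alpha, A, a0, B, b0, beta, C, c0, m, n, k):
    """Serial reference for GEMM on subarrays (no transposes):
    C[c0 : c0+(m,n)] = alpha * A[a0 : a0+(m,k)] * B[b0 : b0+(k,n)]
                       + beta * C[c0 : c0+(m,n)], updated in place."""
    (ai, aj), (bi, bj), (ci, cj) = a0, b0, c0
    for i in range(m):
        for j in range(n):
            acc = sum(A[ai + i][aj + l] * B[bi + l][bj + j] for l in range(k))
            C[ci + i][cj + j] = alpha * acc + beta * C[ci + i][cj + j]


def split_blocks(M, p):
    """Cut a square matrix into a p x p grid of equal square blocks.
    Assumption: p divides the matrix order."""
    b = len(M) // p
    return [[[row[c * b:(c + 1) * b] for row in M[r * b:(r + 1) * b]]
             for c in range(p)] for r in range(p)]


def cannon_multiply(A, B, p):
    """Simulate Cannon's algorithm on a p x p torus mesh and return A * B."""
    n = len(A)
    b = n // p
    Ab, Bb = split_blocks(A, p), split_blocks(B, p)
    # Initial alignment: shift block row r of A left by r positions and
    # block column c of B up by c positions, with wrap-around.
    Ab = [[Ab[r][(c + r) % p] for c in range(p)] for r in range(p)]
    Bb = [[Bb[(r + c) % p][c] for c in range(p)] for r in range(p)]
    C = [[0] * n for _ in range(n)]
    for _ in range(p):
        for r in range(p):
            for c in range(p):
                # Local multiply-add on "processor" (r, c).
                for i in range(b):
                    for j in range(b):
                        C[r * b + i][c * b + j] += sum(
                            Ab[r][c][i][l] * Bb[r][c][l][j] for l in range(b))
        # Shift A blocks one step left and B blocks one step up on the torus.
        Ab = [[Ab[r][(c + 1) % p] for c in range(p)] for r in range(p)]
        Bb = [[Bb[(r + 1) % p][c] for c in range(p)] for r in range(p)]
    return C


# Usage: compare the simulated mesh product with the serial reference.
A = [[i + j for j in range(4)] for i in range(4)]
B = [[i * j + 1 for j in range(4)] for i in range(4)]
C_ref = [[0] * 4 for _ in range(4)]
gemm_sub(1, A, (0, 0), B, (0, 0), 0, C_ref, (0, 0), 4, 4, 4)
assert cannon_multiply(A, B, 2) == C_ref
```

In this sketch the offsets `a0`, `b0` and `c0` select the subarrays, and `block_owner` shows which mesh positions a subarray touches. Operands whose subarrays start at different offsets generally reside on different processors, which is why an alignment and data movement phase precedes the multiplication.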
Copyright information
© 1996 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kågström, B., Rännar, M. (1996). Distributed general matrix multiply and add for a 2D mesh processor network. In: Dongarra, J., Madsen, K., Waśniewski, J. (eds) Applied Parallel Computing Computations in Physics, Chemistry and Engineering Science. PARA 1995. Lecture Notes in Computer Science, vol 1041. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-60902-4_36
DOI: https://doi.org/10.1007/3-540-60902-4_36
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-60902-5
Online ISBN: 978-3-540-49670-0
eBook Packages: Springer Book Archive