A Family of High-Performance Matrix Multiplication Algorithms

Gunnels, John A.; Henry, Greg M.; van de Geijn, Robert A.

doi:10.1007/3-540-45545-0_15

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2073)

Included in the following conference series:

International Conference on Computational Science

Abstract

During the last half-decade, a number of research efforts have centered on developing software that automatically generates tuned matrix multiplication kernels. These include the PHiPAC project and the ATLAS project. The software end-products of both projects use brute-force search over a parameter space to find blockings that accommodate multiple levels of the memory hierarchy. We take a different approach: using a simple model of hierarchical memories, we employ mathematics to determine a locally optimal strategy for blocking matrices. The theoretical results show that, depending on the shapes of the matrices involved, different strategies are locally optimal. They further suggest that, rather than fixing the blocking strategy at library generation time, one should ideally pursue a heuristic that determines it dynamically at run time as a function of the shapes of the operands. When the resulting family of algorithms is combined with a highly optimized inner kernel for a small matrix multiplication, the approach yields performance superior to that of methods that automatically tune such kernels. Preliminary results for the Intel Pentium® III processor support the theoretical insights.
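The abstract's central idea is that the blocking strategy is chosen at run time from the operand shapes and then drives a small, highly optimized inner kernel. That structure can be illustrated with a minimal Python sketch. This is not the authors' algorithm. The names `inner_kernel`, `choose_variant`, and `gemm`, the block size `nb`, the three loop orderings, and the "reuse the resident block across the largest dimension" rule are all hypothetical stand-ins chosen for illustration. Pure Python also exercises no real cache, so only the control structure carries over, not the performance. The sketch takes C as m×n, A as m×k, and B as k×n.

```python
from typing import Iterator, List, Tuple

# Hypothetical illustration only; not the paper's derivation or heuristic.
Matrix = List[List[float]]


def inner_kernel(C: Matrix, A: Matrix, B: Matrix,
                 i0: int, i1: int, j0: int, j1: int,
                 p0: int, p1: int) -> None:
    """Stand-in for a tuned small-matrix kernel:
    C[i0:i1, j0:j1] += A[i0:i1, p0:p1] @ B[p0:p1, j0:j1]."""
    for i in range(i0, i1):
        Ci, Ai = C[i], A[i]
        for p in range(p0, p1):
            a, Bp = Ai[p], B[p]
            for j in range(j0, j1):
                Ci[j] += a * Bp[j]


def blocks(size: int, nb: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, stop) pairs that tile range(size) in blocks of at most nb."""
    for start in range(0, size, nb):
        yield start, min(start + nb, size)


def choose_variant(m: int, n: int, k: int) -> str:
    """Hypothetical run-time heuristic: keep resident the operand block whose
    reuse loop sweeps the largest dimension (A is reused over n, B over m,
    C over k). Ties go to A, then B, then C."""
    reuse = {"A": n, "B": m, "C": k}
    return max(reuse, key=lambda name: reuse[name])


def gemm(C: Matrix, A: Matrix, B: Matrix, nb: int = 32) -> str:
    """C += A @ B with the loop order picked from the shapes; returns the
    variant used. Assumes non-empty matrices."""
    m, k, n = len(A), len(B), len(B[0])
    variant = choose_variant(m, n, k)
    if variant == "A":    # block A[i, p] is reused across every j block
        for i0, i1 in blocks(m, nb):
            for p0, p1 in blocks(k, nb):
                for j0, j1 in blocks(n, nb):
                    inner_kernel(C, A, B, i0, i1, j0, j1, p0, p1)
    elif variant == "B":  # block B[p, j] is reused across every i block
        for p0, p1 in blocks(k, nb):
            for j0, j1 in blocks(n, nb):
                for i0, i1 in blocks(m, nb):
                    inner_kernel(C, A, B, i0, i1, j0, j1, p0, p1)
    else:                 # block C[i, j] is reused across every p block
        for i0, i1 in blocks(m, nb):
            for j0, j1 in blocks(n, nb):
                for p0, p1 in blocks(k, nb):
                    inner_kernel(C, A, B, i0, i1, j0, j1, p0, p1)
    return variant


if __name__ == "__main__":
    C = [[0.0, 0.0], [0.0, 0.0]]
    print(gemm(C, [[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]], nb=1))  # A
    print(C)  # [[19.0, 22.0], [43.0, 50.0]]
```

In a real library the kernel would be a tuned routine and the block sizes would come from the memory model. The point of the sketch is only that the choice among blockings is made on each call from (m, n, k) rather than fixed when the library is generated.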

References

R. C. Agarwal, F. G. Gustavson, and M. Zubair. Exploiting functional parallelism of POWER2 to design high-performance numerical algorithms. IBM Journal of Research and Development, 38(5), Sept. 1994.
E. Anderson, Z. Bai, C. Bischof, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, S. Ostrouchov, and D. Sorensen. LAPACK Users' Guide, Release 2.0. SIAM, 1994.
J. Bilmes, K. Asanovic, C. W. Chin, and J. Demmel. Optimizing matrix multiply using PHiPAC: a portable, high-performance, ANSI C coding methodology. In Proceedings of the International Conference on Supercomputing. ACM SIGARC, July 1997.
Jack J. Dongarra, Jeremy Du Croz, Sven Hammarling, and Iain Duff. A set of level 3 basic linear algebra subprograms. ACM Trans. Math. Soft., 16(1):1–17, March 1990.
John Gunnels, Calvin Lin, Greg Morrow, and Robert van de Geijn. A flexible class of parallel matrix multiplication algorithms. In Proceedings of the First Merged International Parallel Processing Symposium and Symposium on Parallel and Distributed Processing (IPPS/SPDP '98), pages 110–116, 1998.
John A. Gunnels and Robert A. van de Geijn. Formal methods for high-performance linear algebra libraries. In Ronald F. Boisvert and Ping Tak Peter Tang, editors, The Architecture of Scientific Software. Kluwer Academic Press, 2001.
F. Gustavson, A. Henriksson, I. Jonsson, B. Kågström, and P. Ling. Recursive blocked data formats and BLAS's for dense linear algebra algorithms. In B. Kågström et al., editors, Applied Parallel Computing, Large Scale Scientific and Industrial Problems, volume 1541 of Lecture Notes in Computer Science, pages 195–206. Springer-Verlag, 1998.
F. G. Gustavson. Recursion leads to automatic variable blocking for dense linear-algebra algorithms. IBM Journal of Research and Development, 41(6):737–755, November 1997.
Greg Henry. BLAS based on block data structures. Theory Center Technical Report CTC92TR89, Cornell University, Feb. 1992.
B. Kågström, P. Ling, and C. Van Loan. GEMM-based level 3 BLAS: High performance model implementations and performance evaluation benchmark. Technical Report CS-95-315, Univ. of Tennessee, Nov. 1995.
R. Clint Whaley and Jack J. Dongarra. Automatically tuned linear algebra software. In Proceedings of SC98, Nov. 1998.

Author information

Authors and Affiliations

Department of Computer Sciences, The University of Texas, Austin, TX, 78712
John A. Gunnels & Robert A. van de Geijn
Intel Corp., Bldg EY2-05, 5350 NE Elam Young Pkwy, Hillsboro, OR, 97124-6461
Greg M. Henry

Editor information

Editors and Affiliations

School of Computer Science, Cybernetics and Electronic Engineering, University of Reading, Whiteknights, P.O. Box 225, Reading, RG6 6AY, UK
Vassil N. Alexandrov
Innovative Computing Lab, Computer Science Department, University of Tennessee, 1122 Volunteer Blvd, Knoxville, TN, 37996-3450, USA
Jack J. Dongarra
Computer Science Department, California State University, Chico, CA, 95929-0410, USA
Benjoe A. Juliano & René S. Renner
School of Computer Science, The Queen’s University of Belfast, Belfast, BT7 1NN, Northern Ireland, UK
C. J. Kenneth Tan

About this paper

Cite this paper

Gunnels, J.A., Henry, G.M., van de Geijn, R.A. (2001). A Family of High-Performance Matrix Multiplication Algorithms. In: Alexandrov, V.N., Dongarra, J.J., Juliano, B.A., Renner, R.S., Tan, C.J.K. (eds) Computational Science — ICCS 2001. ICCS 2001. Lecture Notes in Computer Science, vol 2073. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45545-0_15

DOI: https://doi.org/10.1007/3-540-45545-0_15
Published: 17 July 2001
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42232-7
Online ISBN: 978-3-540-45545-5
eBook Packages: Springer Book Archive