A data locality methodology for matrix–matrix multiplication algorithm

Alachiotis, Nicolaos; Kelefouras, Vasileios I.; Athanasiou, George S.; Michail, Harris E.; Kritikakou, Angeliki S.; Goutis, Costas E.

doi:10.1007/s11227-010-0474-3

Published: 07 September 2010

Volume 59, pages 830–851 (2012)

The Journal of Supercomputing

Abstract

Matrix-Matrix Multiplication (MMM) is a highly important kernel in linear algebra algorithms, and the performance of its implementations depends on memory utilization and data locality. Several MMM algorithms exist, such as the standard algorithm and the Strassen–Winograd variant, as well as many recursive array layouts, such as Z-Morton and U-Morton. However, their data locality is lower than that of the proposed methodology. Moreover, several state-of-the-art (SOA) self-tuning libraries exist, such as ATLAS, which tests many MMM implementations. Installing ATLAS requires an extremely complex empirical tuning step and uses a large number of compiler options, both of which are outside the scope of this paper. In this paper, a new methodology using the standard MMM algorithm is presented, achieving improved performance by focusing on data locality (both temporal and spatial). This methodology finds the schedule that conforms to the optimum memory management. Compared with (Chatterjee et al. in IEEE Trans. Parallel Distrib. Syst. 13:1105, 2002; Li and Garzaran in Proc. of Lang. Compil. Parallel Comput., 2005; Bilmes et al. in Proc. of the 11th ACM Int. Conf. Supercomput., 1997; Aberdeen and Baxter in Concurr. Comput. Pract. Exp. 13:103, 2001), the proposed methodology has two major advantages. First, the scheduling used at the tile level differs from the one used at the element level, giving better data locality suited to the sizes of the memory hierarchy. Second, its exploration time is short, because it searches only for the number of tiling levels to use and, for the best tile size at each cache level, only between (1, 2) (Sect. 4). A software tool (C code) implementing the above methodology was developed, taking the hardware model and the matrix sizes as input. The methodology outperforms the others across a wide range of architectures. Compared with the best existing related work, which we implemented, it performs up to 55% better than the standard MMM algorithm and up to 35% better than Strassen’s, both under recursive data array layouts.
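
The distinction the abstract draws between tile-level and element-level scheduling can be illustrated with a minimal sketch in C. This is not the authors' tool and not the schedule their methodology selects: the function name `mmm_tiled`, the single level of tiling, the row-major layout, the two loop orders, and the `tile` parameter are all assumptions made purely for illustration, since the abstract does not state which loop orders or tile sizes are chosen.

```c
#include <assert.h>
#include <stddef.h>

/* Smaller of two sizes; used to clip a tile at the matrix edge. */
static size_t min_size(size_t a, size_t b) { return a < b ? a : b; }

/*
 * Computes C += A * B with one level of loop tiling.
 * A is n x k, B is k x m, C is n x m, all stored row-major.
 * `tile` is a hypothetical tuning parameter (tile edge length in elements).
 * Tile-level loops run in ii-jj-kk order; element-level loops run in
 * i-p-j order. Both orders are illustrative assumptions.
 */
void mmm_tiled(size_t n, size_t m, size_t k,
               const double *A, const double *B, double *C, size_t tile)
{
    assert(tile > 0);
    for (size_t ii = 0; ii < n; ii += tile) {
        size_t i_end = min_size(ii + tile, n);
        for (size_t jj = 0; jj < m; jj += tile) {
            size_t j_end = min_size(jj + tile, m);
            for (size_t kk = 0; kk < k; kk += tile) {
                size_t k_end = min_size(kk + tile, k);
                /* Element level: the innermost loop walks along rows of
                   B and C, so consecutive accesses are contiguous. */
                for (size_t i = ii; i < i_end; ++i) {
                    for (size_t p = kk; p < k_end; ++p) {
                        double a = A[i * k + p];
                        for (size_t j = jj; j < j_end; ++j)
                            C[i * m + j] += a * B[p * m + j];
                    }
                }
            }
        }
    }
}
```

Calling `mmm_tiled(n, m, k, A, B, C, tile)` with a zero-initialised `C` yields the product of `A` and `B`. In this sketch `tile` is supplied by the caller; picking it so that the active tiles fit a given cache level is the kind of decision the abstract attributes to the methodology.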

References

Aberdeen D, Baxter J (2001) Emmerald: a fast matrix-matrix multiply using Intel’s SSE instructions. Concurr Comput Pract Exp 13:103–119. doi:10.1002/cpe.549
Allen R, Kennedy K (2002) Optimizing compilers for modern architectures. A dependence based approach. Morgan Kaufmann, San Mateo, 454 pp
ATLAS FAQ (2010) Available at http://math-atlas.sourceforge.net/faq.html#auth
Bilmes J, Asanovic K, Chin C, Demmel J (1997) Optimizing matrix multiply using PHiPAC: a portable, high-performance, ANSI C coding methodology. In: Proc of the 11th ACM Int Conf Supercomput (ICS), July, pp 340–347
Burger D, Austin TM (1997) The SimpleScalar tool set, Version 2.0. Technical Report #1342
Chatterjee S, Thottethodi M (1998) Tuning Strassen’s matrix multiplication for memory efficiency. In: Proc of 1998 ACM/IEEE conf supercomput, San Jose, CA, pp 1–14 (CD-ROM)
Chatterjee S, Lebeck AR, Patnala PK, Thottethodi M (2002) Recursive array layouts and fast matrix multiplication. IEEE Trans Parallel Distrib Syst 13:1105–1123. doi:10.1109/TPDS.2002.1058095
D’Alberto P, Nicolau A (2005) Adaptive Strassen and ATLAS’s DGEMM: a fast square-matrix multiply for modern high-performance systems. In: Proc of eighth int conf high-perform comput, Asia-Pacific region, November 30–December 03, p 45. doi:10.1109/HPCASIA.2005.18
Fischer PC, Probert RL (1974) Efficient procedures for using matrix algorithms. In: Proc of 2nd colloq autom, lang program and Lecture Notes in Computer Science, vol 14, pp 413–427
Frens JD, Wise DS (1997) Auto-blocking matrix-multiplication or tracking BLAS3 performance with source code. In: Proc of the 6th ACM SIGPLAN symp princ pract parallel program. Las Vegas, NV, June, pp 206–216
Frigo M (1999) A fast Fourier transform compiler. In: Proc of ACM SIGPLAN 1999 conf program lang des implement, pp 169–180
Huss-Lederman S, Jacobson EM, Johnson JR, Tsao A, Turnbull T (1996) Implementation of Strassen’s algorithm for matrix multiplication. In: Proc of ACM/IEEE conf supercomput, Pittsburgh, Pennsylvania, USA (CD-ROM). doi:10.1145/369028.369096
Intel homepage (2010). Available at http://www.intel.com/cd/products/services/emea/eng/319641.htm
Li X, Garzaran MJ (2005) Optimizing matrix multiplication with a classifier learning system. In: Proc of lang compil parallel comput (LCPC 2005), Hawthorne, NY, USA, October 20–22
MIPS Technologies homepage (2010). Available at http://www.mips.com/products/cores/32-64-bit-cores/mips32-74k/
Peano G (1890) Sur une courbe, qui remplit toute une aire plane. Math Ann 36:157–160. doi:10.1007/BF01199438
Price C (1995) MIPS IV Instruction set, revision 3.1. MIPS Technologies, Inc., Mountain View, CA, January
Sagan H (1994) Space-filling curves. Springer, London, ISBN 0-387-94265-3
Strassen V (1969) Gaussian elimination is not optimal. Numer Math 13:354–356
Thottethodi M, Chatterjee S, Lebeck AR (1998) Tuning Strassen’s matrix multiplication for memory efficiency. In: Proc of SC98, Orlando, FL, Nov (CD-ROM). Available from http://www.supercomp.org/sc98/papers/
Ubuntu manuals homepage (2010). Available at http://manpages.ubuntu.com/manpages/karmic/man1/time.1.html
Whaley RC (2008) User contribution to ATLAS. Available at http://modular.math.washington.edu/home/kirkby/ATLAS/doc/atlas_contrib.pdf
Whaley RC (2008) ATLAS installation guide. Available at http://venom.cs.utsa.edu/dmz/techrep/2008/CS-TR-2008-002.pdf
Whaley RC, Dongarra JJ (1997) Automatically tuned linear algebra software. Technical report, http://www.netlib.org/utk/projects/atlas/
Whaley RC, Soendergaard P (2008) A collaborative guide to ATLAS development. Available at http://www.sfr-fresh.com/unix/misc/atlas3.9.24.tar.gz:a/ATLAS/doc/atlas_devel.pdf
Whaley RC, Petitet A, Dongarra JJ (2007) Automated empirical optimization of software and the ATLAS project. Available at http://www.sfr-fresh.com/unix/misc/atlas3.9.24.tar.gz:a/ATLAS/doc/atlas_over.pdf
Xiong J, Johnson J, Johnson R, Padua D (2001) SPL: A Language and a compiler for DSP algorithms. In: Proc of the int conf program lang des implement, pp 298–308

Author information

Authors and Affiliations

VLSI Design Lab., Electrical & Computer Engineering Department, University of Patras, Patras, Greece
Nicolaos Alachiotis, Vasileios I. Kelefouras, George S. Athanasiou, Harris E. Michail, Angeliki S. Kritikakou & Costas E. Goutis

Corresponding author

Correspondence to Vasileios I. Kelefouras.

About this article

Cite this article

Alachiotis, N., Kelefouras, V.I., Athanasiou, G.S. et al. A data locality methodology for matrix–matrix multiplication algorithm. J Supercomput 59, 830–851 (2012). https://doi.org/10.1007/s11227-010-0474-3

Published: 07 September 2010
Issue Date: February 2012
DOI: https://doi.org/10.1007/s11227-010-0474-3
