Abstract
Matrix multiplication is an example of an application that is easy both to specify and to implement in a simple way. Numerous sophisticated algorithms and highly efficient but complex implementations exist. In this study we are interested instead in the design and programming overhead relative to the performance benefits. Starting from the naive sequential implementation, we first optimise the implementation by improving data accesses, then by using the vector units of modern processors, and finally we propose a parallel version for multi-core architectures. The proposed optimisations are evaluated experimentally on several architectures, and the trade-off between software complexity and efficiency is assessed using Halstead metrics.
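As an illustration of the first optimisation step, the sketch below contrasts a naive triple loop with a version whose inner loops are interchanged. The abstract does not say which data-access optimisation is applied, so loop interchange is an assumption chosen here as one common technique. The function names `matmul_naive` and `matmul_ikj`, the square row-major layout, and the `double` element type are likewise hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Naive i-j-k product of two n x n row-major matrices: c = a * b.
   The inner loop reads b down a column, i.e. with stride n. */
void matmul_naive(size_t n, const double *a, const double *b, double *c)
{
    assert(a && b && c);
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (size_t k = 0; k < n; ++k)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    }
}

/* Same product with the j and k loops interchanged (i-k-j order).
   Assumed optimisation, not taken from the paper: the inner loop
   now reads b and updates c along rows, with stride 1. */
void matmul_ikj(size_t n, const double *a, const double *b, double *c)
{
    assert(a && b && c);
    /* c is accumulated into, so it must start at zero. */
    for (size_t i = 0; i < n * n; ++i)
        c[i] = 0.0;
    for (size_t i = 0; i < n; ++i) {
        for (size_t k = 0; k < n; ++k) {
            const double aik = a[i * n + k];
            for (size_t j = 0; j < n; ++j)
                c[i * n + j] += aik * b[k * n + j];
        }
    }
}
```

Both functions compute the same result. The second differs only in loop order and in zeroing `c` up front, which gives a sense of how little extra code such a data-access change can require.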
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Legaux, J., Jubertie, S., Loulergue, F. (2012). Experiments in Parallel Matrix Multiplication on Multi-core Systems. In: Xiang, Y., Stojmenovic, I., Apduhan, B.O., Wang, G., Nakano, K., Zomaya, A. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2012. Lecture Notes in Computer Science, vol 7439. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33078-0_26
DOI: https://doi.org/10.1007/978-3-642-33078-0_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33077-3
Online ISBN: 978-3-642-33078-0
eBook Packages: Computer Science (R0)