Abstract
In this paper, we revisit the performance issues of the widely used sparse matrix-vector multiplication (SpMxV) kernel on modern microarchitectures. Previous work reports a number of factors that may significantly reduce performance. However, the interaction of these factors with the underlying architectural characteristics is not clearly understood, which may lead to misguided and thus unsuccessful optimization attempts. To gain insight into the details of SpMxV performance, we conduct a suite of experiments on a rich set of matrices for three different commodity hardware platforms. In addition, we investigate the parallel version of the kernel and report the corresponding performance results and their relation to each architecture’s specific multithreaded configuration. Based on our experiments, we draw conclusions that can serve as guidelines for optimizing both the single-threaded and multithreaded versions of the kernel.
Cite this article
Goumas, G., Kourtis, K., Anastopoulos, N. et al. Performance evaluation of the sparse matrix-vector multiplication on modern architectures. J Supercomput 50, 36–77 (2009). https://doi.org/10.1007/s11227-008-0251-8