
Performance evaluation of the sparse matrix-vector multiplication on modern architectures

The Journal of Supercomputing

Abstract

In this paper, we revisit the performance issues of the widely used sparse matrix-vector multiplication (SpMxV) kernel on modern microarchitectures. Previous scientific work reports a number of factors that may significantly reduce performance. However, the interaction of these factors with the underlying architectural characteristics is not clearly understood, which may lead to misguided, and thus unsuccessful, attempts at optimization. To gain insight into the details of SpMxV performance, we conduct a suite of experiments on a rich set of matrices for three different commodity hardware platforms. In addition, we investigate the parallel version of the kernel and report the corresponding performance results and their relation to each architecture’s specific multithreaded configuration. Based on our experiments, we extract useful conclusions that can serve as guidelines for optimizing both the single-threaded and multithreaded versions of the kernel.



Author information

Correspondence to Georgios Goumas.


Cite this article

Goumas, G., Kourtis, K., Anastopoulos, N. et al. Performance evaluation of the sparse matrix-vector multiplication on modern architectures. J Supercomput 50, 36–77 (2009). https://doi.org/10.1007/s11227-008-0251-8

