DOI: 10.1145/2464996.2465013
research-article

Efficient sparse matrix-vector multiplication on x86-based many-core processors

Published: 10 June 2013

ABSTRACT

Sparse matrix-vector multiplication (SpMV) is an important kernel in many scientific applications and is known to be memory bandwidth limited. On modern processors with wide SIMD and large numbers of cores, we identify and address several bottlenecks which may limit performance even before memory bandwidth: (a) low SIMD efficiency due to sparsity, (b) overhead due to irregular memory accesses, and (c) load-imbalance due to non-uniform matrix structures.

We describe an efficient implementation of SpMV on the Intel® Xeon Phi™ Coprocessor, codenamed Knights Corner (KNC), that addresses the above challenges. Our implementation exploits the salient architectural features of KNC, such as large caches and hardware support for irregular memory accesses. By using a specialized data structure with careful load balancing, we attain performance on average close to 90% of KNC's achievable memory bandwidth on a diverse set of sparse matrices. Furthermore, we demonstrate that our implementation is 3.52x and 1.32x faster, respectively, than the best available implementations on dual Intel® Xeon® Processor E5-2680 and the NVIDIA Tesla K20X architecture.
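
For readers unfamiliar with the kernel, the three bottlenecks named in the abstract can be located in a baseline compressed sparse row (CSR) SpMV. The sketch below is a generic textbook formulation in plain Python, not the specialized data structure the abstract refers to; the function name `spmv_csr` and the small example matrix are assumptions made purely for illustration.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a matrix A stored in CSR form (illustrative sketch)."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # Row i owns the nonzeros in [row_ptr[i], row_ptr[i + 1]).
        # Row lengths vary, so rows cost different amounts of work (c),
        # and short rows cannot fill a wide SIMD register (a).
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # x is read through col_idx, an indirect and irregular access (b).
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# Hypothetical 3x4 example matrix:
# [[1, 0, 2, 0],
#  [0, 0, 0, 3],
#  [4, 5, 0, 6]]
row_ptr = [0, 2, 3, 6]
col_idx = [0, 2, 3, 0, 1, 3]
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x = [1.0, 2.0, 3.0, 4.0]

y = spmv_csr(row_ptr, col_idx, values, x)
print(y)
```

In this layout each nonzero contributes one multiply and one add while requiring its value and its column index to be loaded, which is why the kernel is usually described as memory bandwidth limited.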


        • Published in

          ICS '13: Proceedings of the 27th international ACM conference on International conference on supercomputing
          June 2013
          512 pages
          ISBN: 9781450321303
          DOI: 10.1145/2464996

          Copyright © 2013 ACM


          Publisher

          Association for Computing Machinery

          New York, NY, United States


          Acceptance Rates

          ICS '13 Paper Acceptance Rate: 43 of 202 submissions, 21%
          Overall Acceptance Rate: 584 of 2,055 submissions, 28%
