Comparing Different Programming Approaches for SpMV-Operations on GPUs

  • Conference paper
Parallel Processing and Applied Mathematics (PPAM 2015)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9573)

Abstract

Various high- and low-level approaches exist for GPU programming. These include the newer directive-based OpenACC programming model, Nvidia's programming platform CUDA, and existing fixed-functionality libraries such as cuSPARSE. This work compares the attained performance and the development effort of these approaches, using the implementation of the SpMV (sparse matrix-vector multiplication) operation as an example. SpMV is an important and performance-critical building block in many application fields. We show that the main differences in development effort between CUDA and OpenACC are related to memory management and thread mapping.
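To make the operation concrete, the following is a minimal sketch of SpMV in C with an optional OpenACC directive. It is not the implementation evaluated in the paper. The CSR storage format, the function name `spmv_csr`, its parameter names, and the chosen data clauses are all assumptions made for illustration. The two places where the abstract locates the development effort are visible in the directive: the `copyin`/`copyout` clauses stand in for memory management, and `parallel loop` leaves the mapping of rows to GPU threads to the compiler.

```c
/* Sketch only: y = A * x for a sparse matrix A in CSR format.
 * CSR and all names here are assumptions, not taken from the paper.
 *   row_ptr[i] .. row_ptr[i+1]-1  index the nonzeros of row i
 *   col_idx[k]                    column of the k-th nonzero
 *   values[k]                     value of the k-th nonzero
 */
void spmv_csr(int n_rows, int n_cols,
              const int *row_ptr, const int *col_idx,
              const double *values, const double *x, double *y)
{
    /* Number of stored nonzeros; needed only to size the data clauses. */
    const int nnz = row_ptr[n_rows];
    (void)nnz;    /* silence unused warnings when built without OpenACC */
    (void)n_cols;

    /* With an OpenACC compiler, the directive copies the inputs to the
     * device, distributes the rows over GPU threads, and copies y back.
     * Without OpenACC, the loop runs sequentially on the host. */
#ifdef _OPENACC
#pragma acc parallel loop copyin(row_ptr[0:n_rows + 1], col_idx[0:nnz], values[0:nnz], x[0:n_cols]) copyout(y[0:n_rows])
#endif
    for (int i = 0; i < n_rows; ++i) {
        double sum = 0.0;
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; ++k) {
            sum += values[k] * x[col_idx[k]];
        }
        y[i] = sum;
    }
}
```

In a hand-written CUDA version, the same two concerns would typically appear as explicit device allocations and transfers plus an explicit choice of how many threads work on each row, which is one plausible reading of where the additional effort comes from.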


Notes

  1. version 6.5 for Tesla K20m and M2050, version 7.0 for Tesla K80.

Acknowledgements

We would like to thank the CMT team at Saudi Aramco EXPEC ARC for their support and input. We especially want to thank Ali H. Dogru for making this research project possible.

Author information

Corresponding author

Correspondence to Jan P. Ecker.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Ecker, J.P., Berrendorf, R., Razzaq, J., Scholl, S.E., Mannuss, F. (2016). Comparing Different Programming Approaches for SpMV-Operations on GPUs. In: Wyrzykowski, R., Deelman, E., Dongarra, J., Karczewski, K., Kitowski, J., Wiatr, K. (eds) Parallel Processing and Applied Mathematics. PPAM 2015. Lecture Notes in Computer Science, vol 9573. Springer, Cham. https://doi.org/10.1007/978-3-319-32149-3_50

  • DOI: https://doi.org/10.1007/978-3-319-32149-3_50

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-32148-6

  • Online ISBN: 978-3-319-32149-3

  • eBook Packages: Computer Science, Computer Science (R0)
