Abstract
Various high- and low-level approaches exist for GPU programming. These include the newer directive-based OpenACC programming model, Nvidia's programming platform CUDA, and existing fixed-functionality libraries such as cuSPARSE. This work compares the performance attained and the development effort required by these approaches, using an implementation of the sparse matrix-vector multiplication (SpMV) operation as the example. SpMV is an important, performance-critical building block in many application fields. We show that the main differences in development effort between CUDA and OpenACC relate to memory management and thread mapping.
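To make the two effort-related aspects named above concrete, the following is a minimal sketch of an SpMV kernel written with OpenACC directives. It is not the authors' implementation. The CSR (compressed sparse row) storage format, the function name `spmv_csr`, its parameter names, and the choice of mapping rows to gangs and row entries to vector lanes are all assumptions made for illustration. The `data` clauses stand in for memory management and the `gang`/`vector` clauses for thread mapping.

```c
#include <assert.h>

/* Hypothetical sketch: y = A * x for a matrix A stored in CSR format.
 * row_ptr has n_rows + 1 entries; col_idx and val have row_ptr[n_rows] entries.
 * A compiler without OpenACC support ignores the pragmas, and the loops
 * then run serially on the host. */
void spmv_csr(int n_rows, int n_cols,
              const int *row_ptr, const int *col_idx,
              const double *val, const double *x, double *y)
{
    assert(n_rows >= 0 && n_cols >= 0);
    int nnz = row_ptr[n_rows]; /* number of stored nonzeros */

    /* Memory management: the data clauses describe which arrays are
     * copied to the device before the loop and which are copied back. */
    #pragma acc data copyin(row_ptr[0:n_rows + 1], col_idx[0:nnz], \
                            val[0:nnz], x[0:n_cols]) copyout(y[0:n_rows])
    {
        /* Thread mapping (assumed): one gang per row ... */
        #pragma acc parallel loop gang
        for (int i = 0; i < n_rows; ++i) {
            double sum = 0.0;
            /* ... and vector lanes across the nonzeros of that row. */
            #pragma acc loop vector reduction(+:sum)
            for (int k = row_ptr[i]; k < row_ptr[i + 1]; ++k) {
                sum += val[k] * x[col_idx[k]];
            }
            y[i] = sum;
        }
    }
}
```

In a CUDA version of the same kernel, the data clauses would typically become explicit `cudaMalloc` and `cudaMemcpy` calls, and the loop indices would be computed by hand from `blockIdx` and `threadIdx`, which is where the two models differ most in the amount of code a developer has to write.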
Notes
1. Version 6.5 for Tesla K20m and M2050, version 7.0 for Tesla K80.
Acknowledgements
We would like to thank the CMT team at Saudi Aramco EXPEC ARC for their support and input. We especially want to thank Ali H. Dogru for making this research project possible.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Ecker, J.P., Berrendorf, R., Razzaq, J., Scholl, S.E., Mannuss, F. (2016). Comparing Different Programming Approaches for SpMV-Operations on GPUs. In: Wyrzykowski, R., Deelman, E., Dongarra, J., Karczewski, K., Kitowski, J., Wiatr, K. (eds) Parallel Processing and Applied Mathematics. PPAM 2015. Lecture Notes in Computer Science, vol 9573. Springer, Cham. https://doi.org/10.1007/978-3-319-32149-3_50
DOI: https://doi.org/10.1007/978-3-319-32149-3_50
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-32148-6
Online ISBN: 978-3-319-32149-3
eBook Packages: Computer Science, Computer Science (R0)