Efficient Triangular Matrix Vector Multiplication on the GPU

Inoue, Takahiro; Tokura, Hiroki; Nakano, Koji; Ito, Yasuaki

doi:10.1007/978-3-030-43229-4_42

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12043)

Included in the following conference series:

International Conference on Parallel Processing and Applied Mathematics

Abstract

The main purpose of this paper is to present a very efficient GPU implementation of the trmv, the product of a triangular matrix and a vector. Developers usually compute the trmv with cuBLAS, a linear algebra library optimized for each of the various generations of GPUs. To attain better performance than cuBLAS, our GPU implementation of the trmv uses various acceleration techniques for the latest GPUs. More specifically, our GPU implementation has the following features: (1) only one kernel is called; (2) the maximum number of threads is invoked; (3) all accesses to the global memory are coalesced; (4) all accesses to the shared memory are free of bank conflicts; and (5) shared memory access is minimized by using a warp shuffle function. Experimental results on five generations of NVIDIA GPUs, for fp32 matrices of sizes from \(32\times 32\) to \(\mathrm {16K}\times \mathrm {16K}\), show that our GPU implementation is faster than cuBLAS and muBLAS for almost all matrix sizes and GPU generations.
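For orientation, the operation described in the abstract can be written down as a short sequential reference. The sketch below is not the paper's GPU kernel. It assumes a lower-triangular matrix stored as a row-major list of lists, and the names `trmv_lower` and `shuffle_down_sum` are hypothetical. The second function only imitates, on the CPU, the data movement of a shuffle-down style reduction (as offered by CUDA's `__shfl_down_sync`); the abstract does not say which warp shuffle function the implementation uses.

```python
def trmv_lower(a, x):
    """Return y = A x, reading only the lower triangle (j <= i) of a.

    Assumption: a is an n x n row-major list of lists; entries above
    the diagonal are never read, so they may hold arbitrary values.
    """
    n = len(x)
    y = [0.0] * n
    for i in range(n):
        acc = 0.0
        for j in range(i + 1):  # row i has only i + 1 relevant entries
            acc += a[i][j] * x[j]
        y[i] = acc
    return y


def shuffle_down_sum(lanes):
    """Sum a power-of-two-length list by halving offsets.

    CPU imitation of a shuffle-down reduction: in each round, lane i
    adds the value held by lane i + offset. Lanes with no partner add
    nothing here; only the value left in lane 0 is used as the result.
    """
    vals = list(lanes)
    offset = len(vals) // 2
    while offset > 0:
        vals = [
            vals[i] + (vals[i + offset] if i + offset < len(vals) else 0.0)
            for i in range(len(vals))
        ]
        offset //= 2
    return vals[0]


# The 9s above the diagonal are ignored by trmv_lower.
a = [[1, 9, 9],
     [2, 3, 9],
     [4, 5, 6]]
y = trmv_lower(a, [1.0, 1.0, 1.0])
total = shuffle_down_sum([1.0, 2.0, 3.0, 4.0])
```

Because row i of a triangular matrix contributes only i + 1 products, the work per row is uneven, which is one reason a dedicated trmv kernel can differ from a general matrix-vector kernel. The reduction function shows why a shuffle can replace shared memory for partial sums: each round needs only a value held by another lane.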

Author information

Authors and Affiliations

Department of Information Engineering, Hiroshima University, 1-4-1 Kagamiyama, Higashi-Hiroshima, 739-8527, Japan
Takahiro Inoue, Hiroki Tokura, Koji Nakano & Yasuaki Ito

Corresponding author

Correspondence to Koji Nakano.

Editor information

Editors and Affiliations

Czestochowa University of Technology, Czestochowa, Poland
Roman Wyrzykowski
University of Southern California, Marina del Rey, CA, USA
Ewa Deelman
University of Tennessee, Knoxville, TN, USA
Jack Dongarra
Czestochowa University of Technology, Czestochowa, Poland
Konrad Karczewski

About this paper

Cite this paper

Inoue, T., Tokura, H., Nakano, K., Ito, Y. (2020). Efficient Triangular Matrix Vector Multiplication on the GPU. In: Wyrzykowski, R., Deelman, E., Dongarra, J., Karczewski, K. (eds) Parallel Processing and Applied Mathematics. PPAM 2019. Lecture Notes in Computer Science, vol 12043. Springer, Cham. https://doi.org/10.1007/978-3-030-43229-4_42

DOI: https://doi.org/10.1007/978-3-030-43229-4_42
Published: 19 March 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-43228-7
Online ISBN: 978-3-030-43229-4
eBook Packages: Computer Science, Computer Science (R0)