Abstract
In this paper, we propose a parallel algorithm for sparse matrix transposition using the ELLPACK-R format on graphics processing units (GPUs). By exploiting the high memory bandwidth of the GPU and its texture memory, the algorithm achieves a substantial performance improvement. Experimental results show that the proposed algorithm runs up to 8x faster on an Nvidia Tesla C2070 than an implementation on the Intel Xeon E5-2650 CPU. We also conclude that accelerating transposition is not worthwhile for ELLPACK-R matrices whose number of nonzero elements varies widely from row to row.
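To make the data layout concrete, the following is a minimal sequential sketch of ELLPACK-R transposition in Python. It is not the GPU kernel from the paper, which is not shown here. It assumes the usual ELLPACK-R layout: a padded value array, a padded column-index array, and a per-row length array. The function names `to_ellpack_r` and `transpose_ellpack_r` are hypothetical and chosen only for illustration.

```python
def to_ellpack_r(dense):
    """Convert a dense matrix (list of rows) to ELLPACK-R arrays.

    Returns (vals, cols, row_len): vals and cols are padded to the
    longest row; row_len[i] is the number of nonzeros in row i.
    """
    row_len = [sum(1 for v in row if v != 0) for row in dense]
    width = max(row_len, default=0)
    vals = [[0] * width for _ in dense]
    cols = [[0] * width for _ in dense]
    for i, row in enumerate(dense):
        k = 0
        for j, v in enumerate(row):
            if v != 0:
                vals[i][k] = v
                cols[i][k] = j
                k += 1
    return vals, cols, row_len


def transpose_ellpack_r(vals, cols, row_len, n_cols):
    """Transpose an ELLPACK-R matrix (sequential reference, an assumption
    for illustration, not the paper's parallel algorithm)."""
    # Pass 1: count nonzeros per column; these become the row
    # lengths of the transposed matrix.
    t_len = [0] * n_cols
    for i, n in enumerate(row_len):
        for k in range(n):
            t_len[cols[i][k]] += 1

    # The padded width of the transpose is its longest row.
    width = max(t_len, default=0)
    t_vals = [[0] * width for _ in range(n_cols)]
    t_cols = [[0] * width for _ in range(n_cols)]

    # Pass 2: scatter each nonzero (i, j) to row j of the transpose.
    # Visiting rows in increasing i keeps column indices sorted.
    fill = [0] * n_cols
    for i, n in enumerate(row_len):
        for k in range(n):
            j = cols[i][k]
            p = fill[j]
            t_vals[j][p] = vals[i][k]
            t_cols[j][p] = i
            fill[j] = p + 1
    return t_vals, t_cols, t_len


dense = [[1, 0, 2],
         [0, 0, 3],
         [4, 5, 0]]
vals, cols, row_len = to_ellpack_r(dense)
t_vals, t_cols, t_len = transpose_ellpack_r(vals, cols, row_len, n_cols=3)
```

Because every row is padded to the length of the longest row, a matrix with a few very long rows stores mostly padding in this sketch, which is one plausible reading of why widely varying row lengths hurt the format.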
Acknowledgments
This work was supported by the National Science Foundation of China under Grants 61402499 and 61202127, and the National High Technology Research and Development Program of China under Grant 2012AA012706.
Copyright information
© 2016 Springer-Verlag Berlin Heidelberg
Cite this paper
Guo, S., Dou, Y., Lei, Y., Wang, Q., Xia, F., Chen, J. (2016). Designing Parallel Sparse Matrix Transposition Algorithm Using ELLPACK-R for GPUs. In: Xu, W., Xiao, L., Li, J., Zhang, C. (eds) Computer Engineering and Technology. NCCET 2015. Communications in Computer and Information Science, vol 592. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-49283-3_7
DOI: https://doi.org/10.1007/978-3-662-49283-3_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-49282-6
Online ISBN: 978-3-662-49283-3
eBook Packages: Computer Science, Computer Science (R0)