
Designing Parallel Sparse Matrix Transposition Algorithm Using ELLPACK-R for GPUs

  • Conference paper
Computer Engineering and Technology (NCCET 2015)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 592)


Abstract

In this paper, we propose a parallel algorithm for sparse matrix transposition that uses the ELLPACK-R format on graphics processing units (GPUs). By exploiting the GPU's high memory bandwidth and its texture memory, the algorithm achieves a substantial performance improvement. Experimental results show that the proposed algorithm runs up to 8x faster on an Nvidia Tesla C2070 than an implementation on an Intel Xeon E5-2650 CPU. The results also indicate that GPU acceleration of the transposition is not worthwhile for ELLPACK-R matrices whose rows vary widely in their number of nonzero elements.
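The abstract does not give the kernel itself. What follows is only a minimal serial C++ sketch, under our own assumptions, of one way an ELLPACK-R matrix could be transposed. The layout follows the usual ELLPACK-R description: values and column indices are padded to the longest row, stored column-major, and paired with a row-length array `rl`.

The sketch has two passes:

- A counting pass over the column indices yields the row lengths of the transpose.
- A scatter pass writes each nonzero into the next free slot of its new row.

The struct `EllR`, the helpers `from_dense` and `transpose`, and the `-1` padding sentinel are hypothetical names chosen for illustration, not the paper's code. The sketch also does not model the texture-memory reads the abstract mentions.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical ELLPACK-R container: entry j of row i is stored at
// index j * n_rows + i (column-major), padded up to K = max row length.
struct EllR {
    int n_rows = 0, n_cols = 0, K = 0;
    std::vector<double> val;  // size K * n_rows, padding = 0.0
    std::vector<int> col;     // size K * n_rows, padding = -1 (hypothetical sentinel)
    std::vector<int> rl;      // true number of nonzeros in each row
};

// Build an ELLPACK-R matrix from a dense row-major array (test helper only).
EllR from_dense(const std::vector<double>& a, int n_rows, int n_cols) {
    EllR m;
    m.n_rows = n_rows;
    m.n_cols = n_cols;
    m.rl.assign(n_rows, 0);
    for (int i = 0; i < n_rows; ++i)
        for (int c = 0; c < n_cols; ++c)
            if (a[i * n_cols + c] != 0.0) ++m.rl[i];
    m.K = n_rows ? *std::max_element(m.rl.begin(), m.rl.end()) : 0;
    m.val.assign(static_cast<std::size_t>(m.K) * n_rows, 0.0);
    m.col.assign(static_cast<std::size_t>(m.K) * n_rows, -1);
    for (int i = 0; i < n_rows; ++i) {
        int j = 0;  // next slot in row i
        for (int c = 0; c < n_cols; ++c)
            if (a[i * n_cols + c] != 0.0) {
                m.val[j * n_rows + i] = a[i * n_cols + c];
                m.col[j * n_rows + i] = c;
                ++j;
            }
    }
    return m;
}

// Transpose A (n_rows x n_cols) into A^T (n_cols x n_rows), both ELLPACK-R.
// Each iteration of the outer loop over i plays the role of one GPU thread
// handling row i; on a GPU the increments of t.rl and fill would be atomic.
EllR transpose(const EllR& a) {
    EllR t;
    t.n_rows = a.n_cols;
    t.n_cols = a.n_rows;
    // Pass 1: histogram of column indices = row lengths of A^T.
    t.rl.assign(t.n_rows, 0);
    for (int i = 0; i < a.n_rows; ++i)
        for (int j = 0; j < a.rl[i]; ++j)
            ++t.rl[a.col[j * a.n_rows + i]];
    t.K = t.n_rows ? *std::max_element(t.rl.begin(), t.rl.end()) : 0;
    t.val.assign(static_cast<std::size_t>(t.K) * t.n_rows, 0.0);
    t.col.assign(static_cast<std::size_t>(t.K) * t.n_rows, -1);
    // Pass 2: scatter each nonzero into the next free slot of its new row.
    std::vector<int> fill(t.n_rows, 0);
    for (int i = 0; i < a.n_rows; ++i)
        for (int j = 0; j < a.rl[i]; ++j) {
            int c = a.col[j * a.n_rows + i];
            int slot = fill[c]++;
            t.val[slot * t.n_rows + c] = a.val[j * a.n_rows + i];
            t.col[slot * t.n_rows + c] = i;  // old row becomes new column
        }
    return t;
}
```

The serial loops keep each transposed row sorted by column. A parallel version using atomic slot counters would not guarantee that order. The sketch also suggests one plausible source of the abstract's caveat: storage and per-thread work are sized by the longest row. When row lengths vary widely, most padded slots are wasted and threads in the same warp do very unequal amounts of work.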



Acknowledgments

This work was supported by the National Science Foundation of China under Grants 61402499 and 61202127, and by the National High Technology Research and Development Program of China under Grant 2012AA012706.

Author information

Corresponding author

Correspondence to Song Guo.


Copyright information

© 2016 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Guo, S., Dou, Y., Lei, Y., Wang, Q., Xia, F., Chen, J. (2016). Designing Parallel Sparse Matrix Transposition Algorithm Using ELLPACK-R for GPUs. In: Xu, W., Xiao, L., Li, J., Zhang, C. (eds) Computer Engineering and Technology. NCCET 2015. Communications in Computer and Information Science, vol 592. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-49283-3_7


  • DOI: https://doi.org/10.1007/978-3-662-49283-3_7


  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-49282-6

  • Online ISBN: 978-3-662-49283-3

  • eBook Packages: Computer Science, Computer Science (R0)
