Parallel Processing of Matrix Multiplication in a CPU and GPU Heterogeneous Environment

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4395)

Abstract

GPUs are becoming an attractive platform for numerical computation in research. In this paper, we propose a new parallel processing environment for matrix multiplication that uses both CPUs and GPUs. With our method, the execution time of matrix multiplication can be decreased to 40.1% of that of the faster of the CPU-only and GPU-only cases. The method performs well when the matrices are large.
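
The abstract does not say how the work is divided between the two processors. As a minimal sketch of one common approach, and not necessarily the authors' method, the rows of the left matrix can be split into two blocks that are multiplied concurrently and then concatenated. Everything named here is hypothetical: `gpu_share` is an assumed tuning parameter, and both workers are ordinary Python threads standing in for a GPU kernel and a CPU BLAS call, so the example shows only the partitioning logic, not a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def matmul_rows(a_rows, b):
    """Multiply a block of rows of A by the full matrix B (naive loops)."""
    cols = list(zip(*b))  # columns of B, computed once per block
    return [[sum(x * y for x, y in zip(row, col)) for col in cols]
            for row in a_rows]


def split_rows(n_rows, gpu_share):
    """Number of leading rows given to the (hypothetical) GPU worker."""
    if not 0.0 <= gpu_share <= 1.0:
        raise ValueError("gpu_share must be in [0, 1]")
    return round(n_rows * gpu_share)


def hetero_matmul(a, b, gpu_share=0.5):
    """Compute A x B by running two row blocks of A concurrently.

    Assumption: the first block would go to a GPU and the second to a CPU;
    here both are plain threads running the same naive kernel.
    """
    k = split_rows(len(a), gpu_share)
    with ThreadPoolExecutor(max_workers=2) as pool:
        gpu_part = pool.submit(matmul_rows, a[:k], b)  # stand-in for GPU work
        cpu_part = pool.submit(matmul_rows, a[k:], b)  # stand-in for CPU work
        # Row blocks are independent, so the results are simply concatenated.
        return gpu_part.result() + cpu_part.result()


if __name__ == "__main__":
    a = [[1, 2], [3, 4], [5, 6]]
    b = [[7, 8], [9, 10]]
    print(hetero_matmul(a, b, gpu_share=0.67))
```

In a scheme like this, the share would typically be chosen so that both workers finish at about the same time, which is why the split ratio is left as a parameter rather than fixed.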

Editor information

Michel Daydé, José M. L. M. Palma, Álvaro L. G. A. Coutinho, Esther Pacitti, João Correia Lopes

Copyright information

© 2007 Springer Berlin Heidelberg

About this paper

Cite this paper

Ohshima, S., Kise, K., Katagiri, T., Yuba, T. (2007). Parallel Processing of Matrix Multiplication in a CPU and GPU Heterogeneous Environment. In: Daydé, M., Palma, J.M.L.M., Coutinho, Á.L.G.A., Pacitti, E., Lopes, J.C. (eds) High Performance Computing for Computational Science - VECPAR 2006. VECPAR 2006. Lecture Notes in Computer Science, vol 4395. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71351-7_24

  • DOI: https://doi.org/10.1007/978-3-540-71351-7_24

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-71350-0

  • Online ISBN: 978-3-540-71351-7

  • eBook Packages: Computer Science, Computer Science (R0)
