The BiConjugate gradient method on GPUs

Abstract

In a wide variety of applications from different scientific and engineering fields, the solution of complex and/or nonsymmetric linear systems of equations is required. The BiConjugate Gradient method (BCG) is especially relevant for solving such linear systems. Nevertheless, BCG has an enormous computational cost. GPU computing is useful for accelerating this kind of algorithm, but suitable implementations are necessary to optimally exploit the GPU architecture. In this paper, we show how BCG can be effectively accelerated when all operations are computed on a GPU. To this end, BCG has been implemented with two alternative routines for the Sparse Matrix Vector product (SpMV): the CUSPARSE library and the ELLR-T routine. Although our interest is focused on complex matrices, our implementation has been evaluated on a GPU for two sets of test matrices, complex and real, in single and double precision. Experimental results show that BCG based on the ELLR-T routine achieves the best performance, particularly for the set of complex test matrices. Consequently, this method can be a useful tool for efficiently solving the large linear systems of equations (complex and/or nonsymmetric) involved in a broad range of applications.

Acknowledgements

This work has been funded by grants from the Spanish Ministry of Science and Innovation (TIN2008-01117) and Junta de Andalucía (P08-TIC-3518, P10-TIC-6002), in part financed by the European Regional Development Fund (ERDF).

Author information

Corresponding author

Correspondence to G. Ortega.

Cite this article

Ortega, G., Garzón, E.M., Vázquez, F. et al. The BiConjugate gradient method on GPUs. J Supercomput 64, 49–58 (2013). https://doi.org/10.1007/s11227-012-0761-2
