Solving finite difference linear systems on GPUs: CUDA based Parallel Explicit Preconditioned Biconjugate Conjugate Gradient type Methods

The Journal of Supercomputing

Abstract

Over recent decades, explicit approximate inverse preconditioning methods have been used to solve sparse linear systems efficiently on multiprocessor systems. The effectiveness of these schemes relies on preconditioners that closely approximate the inverse of the coefficient matrix and are fast to compute in parallel. A new parallel computational technique is proposed for the parallelization of the explicit preconditioned conjugate gradient type method on a Graphics Processing Unit (GPU). The proposed parallel methods have been implemented using the Compute Unified Device Architecture (CUDA) developed by NVIDIA. The operations between vectors and matrices in the explicit preconditioned biconjugate conjugate gradient type schemes are inherently parallel: the matrix–vector and vector–vector products exhibit significant loop-level parallelism that can lead to high performance gains on GPUs, which are designed specifically for such computations. Finally, numerical results are presented for the performance of the explicit preconditioned biconjugate conjugate gradient type method on the massively parallel multiprocessor architecture of a GPU, for characteristic two-dimensional boundary value problems discretized by the finite difference method. CUDA implementation issues of the proposed method are also discussed.
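The structure described in the abstract can be illustrated with a minimal sketch. It is not the authors' method or code: it uses the simpler symmetric preconditioned conjugate gradient iteration rather than a biconjugate variant, plain Python rather than CUDA, and a first-order Neumann series approximate inverse as an assumed stand-in for the paper's preconditioner. The names `apply_a`, `apply_m`, `dot` and `explicit_pcg` are hypothetical. The point is that an explicit approximate inverse is applied as a matrix–vector product, so every step reduces to loops whose iterations are independent (one GPU thread per entry) or to a reduction.

```python
import math

def apply_a(x, n):
    """y = A x for the 5-point finite difference Laplacian on an n x n interior grid."""
    y = [0.0] * (n * n)
    for k in range(n * n):  # rows are independent: one GPU thread per k
        i, j = divmod(k, n)
        s = 4.0 * x[k]
        if i > 0:
            s -= x[k - n]
        if i < n - 1:
            s -= x[k + n]
        if j > 0:
            s -= x[k - 1]
        if j < n - 1:
            s -= x[k + 1]
        y[k] = s
    return y

def apply_m(r, n):
    """z = M r, with M = D^-1 (2I - A D^-1) and D = 4I.

    Assumed stand-in for an explicit approximate inverse: it needs only a
    matrix-vector product, no triangular solves.
    """
    ar = apply_a(r, n)
    return [(2.0 * ri - 0.25 * ai) / 4.0 for ri, ai in zip(r, ar)]

def dot(u, v):
    """Inner product; on a GPU this would be a parallel reduction."""
    return sum(a * b for a, b in zip(u, v))

def explicit_pcg(b, n, tol=1e-10, max_iter=500):
    """Explicit preconditioned CG for A x = b; returns (x, iterations)."""
    x = [0.0] * (n * n)
    b_norm = math.sqrt(dot(b, b))
    if b_norm == 0.0:
        return x, 0
    r = b[:]
    z = apply_m(r, n)
    p = z[:]
    rz = dot(r, z)
    for it in range(1, max_iter + 1):
        q = apply_a(p, n)                       # matrix-vector product
        alpha = rz / dot(p, q)                  # vector-vector product
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        if math.sqrt(dot(r, r)) <= tol * b_norm:
            return x, it
        z = apply_m(r, n)                       # preconditioner as a mat-vec
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter

if __name__ == "__main__":
    n = 8
    b = apply_a([1.0] * (n * n), n)  # right-hand side with known solution x = 1
    x, iters = explicit_pcg(b, n)
    print(iters, max(abs(xi - 1.0) for xi in x))
```

Because the preconditioner is applied through `apply_a` rather than through forward and backward substitution, the iteration has no sequential dependencies beyond the reductions in `dot`, which is what makes this family of methods a natural fit for a GPU.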

Author information

Corresponding author

Correspondence to G. A. Gravvanis.

About this article

Cite this article

Gravvanis, G.A., Filelis-Papadopoulos, C.K. & Giannoutakis, K.M. Solving finite difference linear systems on GPUs: CUDA based Parallel Explicit Preconditioned Biconjugate Conjugate Gradient type Methods. J Supercomput 61, 590–604 (2012). https://doi.org/10.1007/s11227-011-0619-z

  • DOI: https://doi.org/10.1007/s11227-011-0619-z
