Abstract
Over the last decades, explicit approximate inverse preconditioning methods have been used to solve sparse linear systems efficiently on multiprocessor systems. The effectiveness of explicit approximate inverse preconditioning schemes relies on preconditioners that closely approximate the inverse of the coefficient matrix and are fast to compute in parallel. A new parallel computational technique is proposed for parallelizing the explicit preconditioned conjugate gradient type method on a Graphics Processing Unit (GPU). The proposed parallel methods have been implemented using NVIDIA's Compute Unified Device Architecture (CUDA). The matrix–vector and vector–vector products involved in the explicit preconditioned biconjugate conjugate gradient type schemes are inherently parallel and exhibit significant loop-level parallelism, which can lead to high performance gains on GPUs, since these are specifically designed for such computations. Finally, numerical results are presented on the performance of the explicit preconditioned biconjugate conjugate gradient type method on the massively parallel multiprocessor interface of a GPU, for solving characteristic two-dimensional boundary value problems discretized by the finite difference method. Issues concerning the CUDA implementation of the proposed method are also discussed.
Cite this article
Gravvanis, G.A., Filelis-Papadopoulos, C.K. & Giannoutakis, K.M. Solving finite difference linear systems on GPUs: CUDA based Parallel Explicit Preconditioned Biconjugate Conjugate Gradient type Methods. J Supercomput 61, 590–604 (2012). https://doi.org/10.1007/s11227-011-0619-z