Forward and back substitution algorithms on GPU: a case study on modified incomplete Cholesky Preconditioner for three-dimensional finite difference method

Aksari, Yigitcan; Artuner, Harun

doi:10.1007/s11227-011-0736-8

Published: 17 January 2012

Volume 62, pages 550–572, (2012)

The Journal of Supercomputing



Abstract

Forward and back substitution algorithms are widely used for solving linear systems of equations after performing LU decomposition on the coefficient matrix. They are also essential in the implementation of high-performance preconditioners, which improve the convergence properties of iterative methods. In this paper, we describe an efficient approach to implementing forward and back substitution algorithms on a GPU and provide the implementation details of these algorithms for a Modified Incomplete Cholesky preconditioner for the Conjugate Gradient (CG) algorithm. The resulting algorithms are then used in a Modified Incomplete Cholesky Preconditioned Conjugate Gradient method to solve the sparse, symmetric positive definite linear systems of equations arising from the discretization of three-dimensional finite difference ground-water flow models. By utilizing multiple threads, the proposed method yields speedups of up to 60 times on a GeForce GTX 280 compared to a CPU implementation, and up to 4.8 times compared to the cuSPARSE library function optimized for GPUs by NVIDIA.
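
For orientation, the two triangular solves that the abstract refers to can be sketched sequentially as follows. This is only a minimal dense, CPU-side reference in Python, not the GPU algorithm described in the paper; the function names (`forward_substitution`, `back_substitution`, `apply_preconditioner`) and the dense list-of-rows storage are assumptions made for illustration. Applying a Cholesky-type preconditioner M = L Lᵀ to a residual r means solving L y = r by forward substitution and then Lᵀ z = y by back substitution.

```python
def forward_substitution(L, b):
    """Solve L y = b for a dense lower-triangular L (list of rows)."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        # y[i] depends on all previously computed y[0..i-1].
        s = sum(L[i][j] * y[j] for j in range(i))
        y[i] = (b[i] - s) / L[i][i]
    return y


def back_substitution(U, y):
    """Solve U x = y for a dense upper-triangular U (list of rows)."""
    n = len(y)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        # x[i] depends on all later unknowns x[i+1..n-1].
        s = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (y[i] - s) / U[i][i]
    return x


def apply_preconditioner(L, r):
    """Solve (L L^T) z = r with one forward and one back substitution."""
    n = len(L)
    Lt = [[L[j][i] for j in range(n)] for i in range(n)]  # explicit transpose
    return back_substitution(Lt, forward_substitution(L, r))


if __name__ == "__main__":
    # Hypothetical 3x3 factor; r is chosen so that the solution is [1, 2, 3].
    L = [[2.0, 0.0, 0.0],
         [1.0, 3.0, 0.0],
         [4.0, 5.0, 6.0]]
    r = [32.0, 79.0, 277.0]
    print(apply_preconditioner(L, r))
```

Each unknown in these loops depends on unknowns computed in earlier iterations, and this data dependency is what makes triangular solves harder to parallelize than, for example, a matrix-vector product.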



Author information

Authors and Affiliations

Department of Computer Engineering, Hacettepe University, Beytepe, Ankara, Turkey
Yigitcan Aksari & Harun Artuner


Corresponding author

Correspondence to Yigitcan Aksari.


About this article

Cite this article

Aksari, Y., Artuner, H. Forward and back substitution algorithms on GPU: a case study on modified incomplete Cholesky Preconditioner for three-dimensional finite difference method. J Supercomput 62, 550–572 (2012). https://doi.org/10.1007/s11227-011-0736-8


Published: 17 January 2012
Issue Date: October 2012
DOI: https://doi.org/10.1007/s11227-011-0736-8
