
Forward and back substitution algorithms on GPU: a case study on modified incomplete Cholesky Preconditioner for three-dimensional finite difference method

The Journal of Supercomputing

Abstract

Forward and back substitution algorithms are widely used to solve linear systems of equations after LU decomposition of the coefficient matrix. They are also essential to high-performance preconditioners, which improve the convergence of iterative methods. In this paper, we describe an efficient approach to implementing forward and back substitution on a GPU and give the implementation details for a Modified Incomplete Cholesky preconditioner for the Conjugate Gradient (CG) algorithm. The resulting forward and back substitution algorithms are then used in a Modified Incomplete Cholesky Preconditioned Conjugate Gradient method to solve the sparse, symmetric positive definite linear systems arising from the discretization of three-dimensional finite difference groundwater flow models. By using multiple threads, the proposed method achieves speedups of up to 60 times on a GeForce GTX 280 compared with a CPU implementation, and up to 4.8 times compared with the cuSPARSE library function optimized for the GPU by NVIDIA.
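The abstract does not spell out how the substitutions are parallelized. As a hedged illustration, the sequential Python sketch below implements a relaxed MIC(0) preconditioner and PCG for a 7-point finite-difference matrix on an nx × ny × nz grid in natural ordering. Both substitutions are ordered by wavefronts (hyperplanes i + j + k = d): a point's lower neighbours all lie on hyperplane d - 1 and its upper neighbours on d + 1, so every point on one hyperplane can be updated independently, for example by one GPU thread each. This hyperplane ordering is one standard way to expose parallelism in triangular solves and is an assumption here, not necessarily the scheme used in the paper. The names `make_problem`, `MIC3D`, `matvec` and `pcg`, the Poisson-type test matrix, and the parameter values `tau = 0.97` and `sigma = 0.25` are all hypothetical; `tau = 0` reduces the factorization to plain IC(0).

```python
import math


def make_problem(nx, ny, nz):
    """Hypothetical 7-point SPD test matrix (Poisson-type, Dirichlet walls).
    diag[p] = A[p, p]; ax/ay/az[p] couple p to its +x/+y/+z neighbour (0.0 if outside)."""
    n = nx * ny * nz
    diag, ax, ay, az = [6.0] * n, [0.0] * n, [0.0] * n, [0.0] * n
    for k in range(nz):
        for j in range(ny):
            for i in range(nx):
                p = i + nx * (j + ny * k)
                ax[p] = -1.0 if i + 1 < nx else 0.0
                ay[p] = -1.0 if j + 1 < ny else 0.0
                az[p] = -1.0 if k + 1 < nz else 0.0
    return diag, ax, ay, az


class MIC3D:
    """Assumed relaxed MIC(0) variant: A ~ L L^T, L = F E^-1 + E (F: strictly lower
    part of A, E: diagonal); precon[p] = 1 / E[p, p]. Not the paper's code."""

    def __init__(self, nx, ny, nz, diag, ax, ay, az, tau=0.97, sigma=0.25):  # assumed values
        self.nx, self.ny, self.nz = nx, ny, nz
        self.ax, self.ay, self.az = ax, ay, az
        self.fronts = [[] for _ in range(nx + ny + nz - 2)]  # front d: i + j + k == d
        for k in range(nz):
            for j in range(ny):
                for i in range(nx):
                    self.fronts[i + j + k].append((i + nx * (j + ny * k), i, j, k))
        self.precon = [0.0] * len(diag)
        for front in self.fronts:            # sequential over wavefronts
            for p, i, j, k in front:         # independent within a wavefront
                e = diag[p]
                for q, a in self.lower(p, i, j, k):
                    other = ax[q] + ay[q] + az[q] - a  # q's other upper couplings
                    e -= (a * self.precon[q]) ** 2 + tau * a * other * self.precon[q] ** 2
                if e < sigma * diag[p]:      # safeguard against tiny pivots
                    e = diag[p]
                self.precon[p] = 1.0 / math.sqrt(e)

    def lower(self, p, i, j, k):
        """(index, coupling) pairs for the existing -x, -y, -z neighbours."""
        nx, nxy = self.nx, self.nx * self.ny
        cand = [(i > 0, p - 1, self.ax), (j > 0, p - nx, self.ay), (k > 0, p - nxy, self.az)]
        return [(q, c[q]) for ok, q, c in cand if ok]

    def upper(self, p, i, j, k):
        """(index, coupling) pairs for the existing +x, +y, +z neighbours."""
        nx, nxy = self.nx, self.nx * self.ny
        cand = [(i + 1 < nx, p + 1, self.ax), (j + 1 < self.ny, p + nx, self.ay),
                (k + 1 < self.nz, p + nxy, self.az)]
        return [(q, c[p]) for ok, q, c in cand if ok]

    def apply(self, r):
        """z = (L L^T)^-1 r by forward substitution, then back substitution."""
        q = [0.0] * len(r)
        for front in self.fronts:            # forward: d = 0, 1, 2, ...
            for p, i, j, k in front:
                t = r[p] - sum(a * self.precon[s] * q[s] for s, a in self.lower(p, i, j, k))
                q[p] = t * self.precon[p]
        z = [0.0] * len(r)
        for front in reversed(self.fronts):  # back: d = max, ..., 1, 0
            for p, i, j, k in front:
                t = q[p] - self.precon[p] * sum(a * z[s] for s, a in self.upper(p, i, j, k))
                z[p] = t * self.precon[p]
        return z


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(m, diag, x):
    """y = A x from the same stencil couplings the preconditioner uses."""
    y = [0.0] * len(x)
    for front in m.fronts:
        for p, i, j, k in front:
            nbrs = m.lower(p, i, j, k) + m.upper(p, i, j, k)
            y[p] = diag[p] * x[p] + sum(a * x[s] for s, a in nbrs)
    return y


def pcg(m, diag, b, tol=1e-8, max_iter=500):
    """MIC-preconditioned conjugate gradient; returns (x, iterations)."""
    x, r = [0.0] * len(b), list(b)
    z = m.apply(r)
    d, rz = list(z), dot(r, z)
    for it in range(1, max_iter + 1):
        ad = matvec(m, diag, d)
        alpha = rz / dot(d, ad)
        x = [xi + alpha * di for xi, di in zip(x, d)]
        r = [ri - alpha * ai for ri, ai in zip(r, ad)]
        if math.sqrt(dot(r, r)) <= tol * math.sqrt(dot(b, b)):
            return x, it
        z = m.apply(r)
        rz, rz_old = dot(r, z), rz
        d = [zi + (rz / rz_old) * di for zi, di in zip(z, d)]
    return x, max_iter
```

For example, `diag, ax, ay, az = make_problem(8, 8, 8)` followed by `pcg(MIC3D(8, 8, 8, diag, ax, ay, az), diag, [1.0] * 512)` solves a 512-unknown system. A GPU port of this ordering would keep the outer loop over d sequential (one kernel launch or synchronization step per hyperplane) and map the inner loop to threads, so the available parallelism is smallest near the grid corners and largest in the middle of each sweep.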



Author information


Correspondence to Yigitcan Aksari.


About this article

Cite this article

Aksari, Y., Artuner, H. Forward and back substitution algorithms on GPU: a case study on modified incomplete Cholesky Preconditioner for three-dimensional finite difference method. J Supercomput 62, 550–572 (2012). https://doi.org/10.1007/s11227-011-0736-8


  • DOI: https://doi.org/10.1007/s11227-011-0736-8
