Parallel ILU preconditioners in GPU computation

Chen, Yan; Tian, Xuhong; Liu, Hui; Chen, Zhangxin; Yang, Bo; Liao, Wenyuan; Zhang, Peng; He, Ruijian; Yang, Min

doi:10.1007/s00500-017-2764-7

Methodologies and Application
Published: 12 August 2017

Volume 22, pages 8187–8205, (2018)
Soft Computing

Yan Chen³,
Xuhong Tian³,
Hui Liu¹,
Zhangxin Chen¹,
Bo Yang¹,
Wenyuan Liao²,
Peng Zhang⁴,
Ruijian He¹ &
Min Yang¹

Abstract

Accelerating large-scale linear solvers is crucial for scientific research and industrial applications, and preconditioners play a key role in improving the performance of iterative linear solvers. This paper summarizes and reviews our work on the development of parallel ILU preconditioners on GPUs. The mechanisms of ILU(0), ILU(k), ILUT, enhanced ILUT, and block-wise ILU(k) are reviewed and analyzed to provide clear guidance for the development of iterative linear solvers. ILU(0) is the most commonly used preconditioner, and the nonzero pattern of its preconditioner matrix is exactly the same as that of the original matrix to be solved. ILU(k) uses the level k to control the pattern of its preconditioner matrix. ILUT selects entries for its preconditioner matrix by applying thresholds, without regard to the pattern of the original matrix. In addition to point-wise ILU preconditioners, a block-wise ILU(k) preconditioner is designed to support block-wise matrices. In the implementation, the RAS (Restricted Additive Schwarz) method is adopted to optimize the parallel structure of the preconditioner matrix. Because coupling with the configuration parameters of the ILU preconditioners complicates the parallel solution process, decoupled algorithms are adopted. These algorithms are implemented and tested on NVIDIA GPUs. The experimental results show that a single-GPU implementation can speed up an ILU preconditioner by a factor of 10 compared to a traditional CPU implementation. The results also show that ILU(0) achieves better speedup than ILU(k) but converges more slowly. The level k of ILU(k) and the thresholds (p, t) of ILUT are effective parameters for controlling the balance between acceleration and convergence for ILU(k) and ILUT, respectively. All these ILU preconditioners are characterized and compared in this work, which gives practitioners a clear picture of, and numerical insight into, the ILU family.
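To make the ILU(0) pattern rule concrete, the following is a minimal sequential sketch in Python. It is not the GPU implementation described in the paper: it assumes dense list-of-lists storage for readability, uses the standard row-wise (IKJ) elimination order, and the names `ilu0` and `matmul` are hypothetical helpers introduced only for this illustration.

```python
def ilu0(a):
    """Return (lower, upper) ILU(0) factors of a square matrix given as a list of lists.

    Any fill-in outside the nonzero pattern of `a` is discarded, so the
    factors have the same nonzero pattern as the original matrix.
    """
    n = len(a)
    # Nonzero pattern of the original matrix; updates are allowed only here.
    pattern = [[a[i][j] != 0 for j in range(n)] for i in range(n)]
    lu = [row[:] for row in a]  # work on a copy, leave the input untouched
    for i in range(1, n):
        for k in range(i):
            if not pattern[i][k]:
                continue  # entry (i, k) is outside the pattern: no elimination step
            lu[i][k] /= lu[k][k]  # multiplier l_ik
            for j in range(k + 1, n):
                if pattern[i][j]:  # drop fill-in at positions outside the pattern
                    lu[i][j] -= lu[i][k] * lu[k][j]
    # Split the combined storage into unit lower and upper triangular factors.
    lower = [[lu[i][j] if j < i else (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    upper = [[lu[i][j] if j >= i else 0.0 for j in range(n)] for i in range(n)]
    return lower, upper


def matmul(x, y):
    """Dense matrix product, used only to inspect the factorization."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


# Arrow-shaped matrix: an exact LU factorization would fill positions (1, 2) and (2, 1).
arrow = [[4.0, 1.0, 1.0],
         [1.0, 4.0, 0.0],
         [1.0, 0.0, 4.0]]
lower, upper = ilu0(arrow)
product = matmul(lower, upper)
```

For the arrow-shaped matrix, the product of the two factors reproduces the original entries on the nonzero pattern and differs only at the two dropped fill-in positions (0-based indices (1, 2) and (2, 1)), which is the approximation that ILU(0) accepts in exchange for a fixed sparsity pattern. For a tridiagonal matrix no fill-in occurs, so the same routine returns the exact LU factors.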

Acknowledgements

The support of the Department of Chemical and Petroleum Engineering, University of Calgary, and the Reservoir Simulation Group is gratefully acknowledged. The research is partly supported by NSERC/AIEE/Foundation CMG and AITF Chairs.

Author information

Authors and Affiliations

Department of Chemical and Petroleum Engineering, University of Calgary, Calgary, AB, T2N 1N4, Canada
Hui Liu, Zhangxin Chen, Bo Yang, Ruijian He & Min Yang
Department of Mathematics and Statistics, University of Calgary, Calgary, AB, T2N 1N4, Canada
Wenyuan Liao
College of Mathematics and Informatics, South China Agricultural University, Guangzhou, 510642, China
Yan Chen & Xuhong Tian
Biomedical Engineering Department, Stony Brook University, Stony Brook, NY, 11794, USA
Peng Zhang

Corresponding authors

Correspondence to Zhangxin Chen or Bo Yang.

Ethics declarations

Conflict of interest

All authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Informed consent

Informed consent was obtained from all individual participants included in the study.

Additional information

Communicated by V. Loia.

About this article

Cite this article

Chen, Y., Tian, X., Liu, H. et al. Parallel ILU preconditioners in GPU computation. Soft Comput 22, 8187–8205 (2018). https://doi.org/10.1007/s00500-017-2764-7

Published: 12 August 2017
Issue Date: December 2018
DOI: https://doi.org/10.1007/s00500-017-2764-7
