Abstract
The tridiagonal system solver is a basic computational kernel and is well supported in mainstream numerical libraries. This paper devises an efficient parallel algorithm for solving large-scale tridiagonal systems. Based on a performance analysis of the classic Thomas algorithm and the matrix splitting method, we propose a parallel Thomas split (PTS) algorithm. Compared with the matrix splitting method, the PTS algorithm achieves a speedup of 10.34\(\times \). We further propose a Sunway parallel Thomas split (swPTS) algorithm for the sw26010pro manycore processor. The swPTS algorithm uses a dedicated data partitioning scheme to implement MPI+Athread parallelism, and it solves the reduced system of equations with a new reduction approach tailored to the Sunway architecture. Experiments show that the parallel elimination stage of the swPTS algorithm achieves up to 38.31\(\times \) speedup over the PTS algorithm, and that swPTS overall reaches a 5.74\(\times \) speedup over the Thomas algorithm.
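For context, the classic Thomas algorithm that serves as the baseline above can be sketched as follows. This is a minimal serial Python sketch, not the PTS or swPTS implementation from the paper; the function name `thomas_solve` and the array names are illustrative assumptions. It does no pivoting, so it assumes a system for which plain elimination is stable, such as a diagonally dominant one.

```python
def thomas_solve(lower, diag, upper, rhs):
    """Solve a tridiagonal system with the classic (serial) Thomas algorithm.

    All four inputs are sequences of length n (hypothetical layout):
      lower[i]: sub-diagonal entry of row i (lower[0] is ignored)
      diag[i]:  main-diagonal entry of row i
      upper[i]: super-diagonal entry of row i (upper[n-1] is ignored)
      rhs[i]:   right-hand side of row i
    """
    n = len(diag)
    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right-hand side

    # Forward elimination: remove the sub-diagonal row by row.
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom

    # Back substitution: recover the unknowns from the last row upward.
    x = [0.0] * n
    x[n - 1] = d[n - 1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x
```

Both loops carry a dependency from one row to the next, which is why the algorithm is inherently sequential and why splitting-based parallel variants are of interest.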
Availability of data and materials
The data used to support the findings of this study are available from the corresponding author upon request.
Acknowledgements
This work was supported by the National Natural Science Foundation of China under Grant 62002186 and by the Unveiling Project of Qilu University of Technology (Shandong Academy of Sciences) under Grant 2022JBZ01-01.
Author information
Contributions
MT conceived and designed the algorithm, and QL implemented and tested it. MT and QL wrote the main manuscript text, and the other authors prepared Figures 1–10. All authors reviewed the manuscript.
Ethics declarations
Conflict of interest
We declare that the authors have no competing interests as defined by Springer, or other interests that might be perceived to influence the results and/or discussion reported in this paper.
Ethical approval
This declaration is not applicable to this paper.
About this article
Cite this article
Tian, M., Liu, Q., Pan, J. et al. swPTS: an efficient parallel Thomas split algorithm for tridiagonal systems on Sunway manycore processors. J Supercomput 80, 4682–4706 (2024). https://doi.org/10.1007/s11227-023-05641-1
DOI: https://doi.org/10.1007/s11227-023-05641-1