swPTS: an efficient parallel Thomas split algorithm for tridiagonal systems on Sunway manycore processors

Tian, Min; Liu, Qi; Pan, Jingshan; Gou, Ying; Zhang, Zanjun

doi:10.1007/s11227-023-05641-1

Published: 19 September 2023

Volume 80, pages 4682–4706, (2024)

The Journal of Supercomputing

Min Tian^1,2, Qi Liu^1,2, Jingshan Pan^1,2, Ying Gou^1,2 & Zanjun Zhang^1,2,3

Abstract

The tridiagonal system solver is a basic kernel that is well supported in mainstream numerical libraries. The purpose of this paper is to devise an efficient parallel algorithm for solving large-scale tridiagonal systems. Based on a performance analysis of the classic Thomas algorithm and the matrix splitting method, we propose a parallel Thomas split (PTS) algorithm. Compared with the matrix splitting method, the PTS algorithm can achieve a speedup of 10.34\(\times\). Furthermore, we propose a Sunway parallel Thomas split (swPTS) algorithm for the sw26010pro manycore processor. In the swPTS algorithm, we propose a specific data partitioning scheme to implement MPI+Athread parallelism. For the reduced set of equations, we propose a new reduction approach for the Sunway architecture. Experiments show that the parallel elimination stage of our swPTS algorithm achieves up to a 38.31\(\times\) speedup over the PTS algorithm, and that swPTS overall reaches a 5.74\(\times\) speedup over the Thomas algorithm.
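For orientation, the classic serial Thomas algorithm that the abstract uses as a baseline can be sketched as follows. This is a minimal illustration of the textbook method only, not the PTS or swPTS algorithm proposed in the paper. The function name `thomas_solve` and the array layout (`a` sub-diagonal, `b` main diagonal, `c` super-diagonal, `d` right-hand side) are assumptions made for this sketch, and it assumes a matrix for which no pivoting is needed, such as a diagonally dominant one.

```python
def thomas_solve(a, b, c, d):
    """Solve a tridiagonal system with the classic Thomas algorithm.

    Hypothetical layout for this sketch (all lists of length n):
      a: sub-diagonal, a[0] is unused
      b: main diagonal
      c: super-diagonal, c[n-1] is unused
      d: right-hand side
    Assumes no pivoting is required (e.g. diagonally dominant matrix).
    """
    n = len(b)
    cp = [0.0] * n  # modified super-diagonal after elimination
    dp = [0.0] * n  # modified right-hand side after elimination

    # Forward elimination: remove the sub-diagonal row by row.
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom

    # Back substitution: recover the unknowns from last to first.
    x = [0.0] * n
    x[n - 1] = dp[n - 1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


# Usage: the 3x3 system with diagonals (1, 2, 1) and right-hand side
# (4, 8, 8) is built from the solution x = (1, 2, 3).
solution = thomas_solve([0.0, 1.0, 1.0], [2.0, 2.0, 2.0],
                        [1.0, 1.0, 0.0], [4.0, 8.0, 8.0])
```

Both loops carry a dependence from one row to the next, which is why the serial algorithm does not parallelize directly and partition-based schemes are of interest.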

Availability of data and materials

The data used to support the findings of this study are available from the corresponding author upon request.

Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grant 62002186 and the Unveiling Project of Qilu University of Technology (Shandong Academy of Sciences) under Grant 2022JBZ01-01.

Author information

Authors and Affiliations

Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center (National Supercomputer Center in Jinan), Qilu University of Technology (Shandong Academy of Sciences), Jinan, China
Min Tian, Qi Liu, Jingshan Pan, Ying Gou & Zanjun Zhang
Shandong Provincial Key Laboratory of Computer Networks, Shandong Fundamental Research Center for Computer Science, Jinan, China
Min Tian, Qi Liu, Jingshan Pan, Ying Gou & Zanjun Zhang
Shanxi Key Laboratory of Large Scale Electromagnetic Computing, Xidian University, Xi’an, Shaanxi, China
Zanjun Zhang

Contributions

MT conceived and designed the algorithm, and QL implemented and tested it. MT and QL wrote the main manuscript text, and the other authors prepared Figures 1–10. All authors reviewed the manuscript.

Corresponding author

Correspondence to Qi Liu.

Ethics declarations

Conflict of interest

We declare that the authors have no competing interests as defined by Springer, or other interests that might be perceived to influence the results and/or discussion reported in this paper.

Ethical approval

This declaration is not applicable to this paper.

About this article

Cite this article

Tian, M., Liu, Q., Pan, J. et al. swPTS: an efficient parallel Thomas split algorithm for tridiagonal systems on Sunway manycore processors. J Supercomput 80, 4682–4706 (2024). https://doi.org/10.1007/s11227-023-05641-1

Accepted: 30 August 2023
Published: 19 September 2023
Issue Date: March 2024
DOI: https://doi.org/10.1007/s11227-023-05641-1
