swPTS: an efficient parallel Thomas split algorithm for tridiagonal systems on Sunway manycore processors

The Journal of Supercomputing

Abstract

Tridiagonal system solvers are a basic kernel and are well supported in mainstream numerical libraries. This paper devises an efficient parallel algorithm for solving large-scale tridiagonal systems. Based on a performance analysis of the classic Thomas algorithm and the matrix splitting method, we propose a parallel Thomas split (PTS) algorithm. Compared with the matrix splitting method, the PTS algorithm can achieve a speedup of 10.34\(\times\). Furthermore, we propose a Sunway parallel Thomas split (swPTS) algorithm for the sw26010pro manycore processor. In the swPTS algorithm, we propose a specific data partitioning scheme to implement MPI+Athread parallelism. For the reduced set of equations, we propose a new reduction approach tailored to the Sunway architecture. Experiments show that the parallel elimination stage of our swPTS algorithm achieves up to 38.31\(\times\) speedup over the PTS algorithm, and that swPTS overall reaches a 5.74\(\times\) speedup over the Thomas algorithm.
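For orientation, the classic Thomas algorithm that serves as the sequential baseline above can be sketched as follows. This is a minimal textbook version in Python, not the authors' PTS or swPTS implementation; the function name `thomas_solve` and the array names `lower`, `diag`, `upper`, and `rhs` are illustrative assumptions, and the sketch assumes a matrix (for example, a diagonally dominant one) for which elimination without pivoting is stable.

```python
def thomas_solve(lower, diag, upper, rhs):
    """Solve a tridiagonal system with the classic Thomas algorithm.

    All four inputs have length n. Row i of the system reads
        lower[i] * x[i-1] + diag[i] * x[i] + upper[i] * x[i+1] = rhs[i],
    where lower[0] and upper[n-1] are ignored.
    No pivoting is performed (assumption: the matrix does not need it).
    """
    n = len(diag)
    c = [0.0] * n  # modified super-diagonal coefficients
    d = [0.0] * n  # modified right-hand side

    # Forward elimination: remove the sub-diagonal row by row.
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom

    # Back substitution: recover the unknowns from last to first.
    x = [0.0] * n
    x[n - 1] = d[n - 1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x
```

Both loops carry a dependence from one row to the next (row i needs the result of row i-1, and x[i] needs x[i+1]), which is why the algorithm is inherently sequential and why partition-based schemes such as the one proposed here split the system before eliminating.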

Availability of data and materials

The data used to support the findings of this study are available from the corresponding author upon request.

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant 62002186) and the Unveiling Project of Qilu University of Technology (Shandong Academy of Sciences) (2022JBZ01-01).

Author information

Contributions

MT conceived and designed the algorithm, and QL implemented and tested it. MT and QL wrote the main manuscript text, and the other authors prepared Figures 1–10. All authors reviewed the manuscript.

Corresponding author

Correspondence to Qi Liu.

Ethics declarations

Conflict of interest

We declare that the authors have no competing interests as defined by Springer, or other interests that might be perceived to influence the results and/or discussion reported in this paper.

Ethical approval

In this paper, the declaration is not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Tian, M., Liu, Q., Pan, J. et al. swPTS: an efficient parallel Thomas split algorithm for tridiagonal systems on Sunway manycore processors. J Supercomput 80, 4682–4706 (2024). https://doi.org/10.1007/s11227-023-05641-1

  • DOI: https://doi.org/10.1007/s11227-023-05641-1
