Skip to main content
Log in

swSuperLU: A highly scalable sparse direct solver on Sunway manycore architecture

  • Published:
The Journal of Supercomputing Aims and scope Submit manuscript

Abstract

Sparse LU factorization is essential for scientific and engineering simulations. In this work, we present swSuperLU, a highly scalable sparse direct solver on Sunway manycore architecture based on sparse LU factorization. To improve the parallelism of sparse LU factorization, we introduce the hierarchical scheme to exploit the hierarchy of Sunway manycore architecture in process-level parallelism between MPEs and thread-level parallelism between the CPE arrays. A task-based hierarchical scheme and a series of highly optimized computation kernels are designed to map processor loads and memory access well to this hierarchy. Moreover, we compared various ordering strategies and several machine-dependent parameter settings to find the most suitable ordering strategies and parameter settings for Sunway manycore architecture. We present performance and scalability experiments of swSuperLU on Newest Generation Sunway Supercomputer and Sunway TaihuLight. swSuperLU achieves 9.02\(\times\) speedup on average compared to state-of-the-art packages and strong scalability from 10 thousand cores to million cores.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4

Similar content being viewed by others

References

  1. Harrington RF (1993) Field Computation by Moment Methods. Wiley-IEEE Press, Hoboken

    Book  Google Scholar 

  2. Jin JM (2011) Theory and computation of electromagnetic fields. John Wiley & Sons, Hoboken

    Google Scholar 

  3. Wu YS (2015) Multiphase fluid flow in porous and fractured reservoirs. Gulf professional publishing, Oxford

    Google Scholar 

  4. Blazek J (2015) Computational fluid dynamics: principles and applications. Butterworth-Heinemann, Oxford

    MATH  Google Scholar 

  5. Davis TA (2006) Direct methods for sparse linear systems. SIAM, Philadelphia

    Book  Google Scholar 

  6. Saad Y (2003) Iterative methods for sparse linear systems. SIAM, Philadelphia

    Book  Google Scholar 

  7. Demmel JW, Eisenstat SC, Gilbert JR, Li XS, Liu JW (1999) A supernodal approach to sparse partial pivoting. SIAM J Matrix Anal Appl 20(3):720–755

    Article  MathSciNet  Google Scholar 

  8. Gilbert JR, Liu JW (1993) Elimination structures for unsymmetric sparse lu factors. SIAM J Matrix Anal Appl 14(2):334–352

    Article  MathSciNet  Google Scholar 

  9. Blackford LS, Petitet A, Pozo R, Remington K, Whaley RC, Demmel J, Dongarra J, Duff I, Hammarling S, Henry G et al (2002) An updated set of basic linear algebra subprograms (blas). ACM Trans Math Softw 28(2):135–151

    Article  MathSciNet  Google Scholar 

  10. Fu H, Liao J, Yang J, Wang L, Song Z, Huang X, Yang C, Xue W, Liu F, Qiao F et al (2016) The sunway taihulight supercomputer: system and applications. Sci China Inform Sci 59(7):1–16

    Article  Google Scholar 

  11. Liu Y, Jacquelin M, Ghysels P, Li XS (2018) Highly scalable distributed-memory sparse triangular solution algorithms. In: 2018 Proceedings of the Seventh SIAM Workshop on Combinatorial Scientific Computing, pp. 87–96. SIAM

  12. Yamazaki I, Li XS (2012) New scheduling strategies and hybrid programming for a parallel right-looking sparse lu factorization algorithm on multicore cluster systems. In: 2012 IEEE 26th International Parallel and Distributed Processing Symposium, pp. 619–630. IEEE

  13. Sao P, Li XS, Vuduc R (2018) A communication-avoiding 3d lu factorization algorithm for sparse matrices. In: 2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 908–919. IEEE

  14. Sao P, Vuduc R, Li XS (2014) A distributed cpu-gpu sparse direct solver. In: European Conference on Parallel Processing, pp. 487–498. Springer

  15. Sao P, Liu X, Vuduc R, Li X (2015) A sparse direct solver for distributed memory xeon phi-accelerated systems. In: 2015 IEEE International Parallel and Distributed Processing Symposium, pp. 71–81. IEEE

  16. Niu Y, Lu Z, Dong M, Jin Z, Liu W, Tan G (2021) Tilespmv: A tiled algorithm for sparse matrix-vector multiplication on gpus. In: 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 68–78. IEEE

  17. Su J, Zhang F, Liu W, He B, Wu R, Du X, Wang R (2020) Capellinisptrsv: A thread-level synchronization-free sparse triangular solve on gpus. In: 49th International Conference on Parallel Processing-ICPP, pp. 1–11

  18. Lu Z, Niu Y, Liu W (2020) Efficient block algorithms for parallel sparse triangular solve. In: 49th International Conference on Parallel Processing-ICPP, pp. 1–11

  19. Duan X, Gao P, Zhang T, Zhang M, Liu W, Zhang W, Xue W, Fu H, Gan L, Chen D et al (2018) Redesigning lammps for peta-scale and hundred-billion-atom simulation on sunway taihulight. In: SC18: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 148–159. IEEE

  20. Chen B, Fu H, Wei Y, He C, Zhang W, Li Y, Wan W, Zhang W, Gan L, Zhang Z et al (2018) Simulating the wenchuan earthquake with accurate surface topography on sunway taihulight. In: SC18: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 517–528. IEEE

  21. Fu H, Liao J, Ding N, Duan X, Gan L, Liang Y, Wang X, Yang J, Zheng Y, Liu W et al (2017) Redesigning cam-se for peta-scale climate modeling performance and ultra-high resolution on sunway taihulight. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1–12

  22. Lin H, Zhu X, Yu B, Tang X, Xue W, Chen W, Zhang L, Hoefler T, Ma X, Liu X et al (2018)Shentu: processing multi-trillion edge graphs on millions of cores in seconds. In: SC18: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 706–716. IEEE

  23. Zhong X, Li M, Yang H, Liu Y, Qian D (2018) swmr: a framework for accelerating mapreduce applications on sunway taihulight. IEEE Transactions on Emerging Topics in Computing

  24. Li L, Fang J, Fu H, Jiang J, Zhao W, He C, You X, Yang G (2018) swcaffe: A parallel framework for accelerating deep learning applications on sunway taihulight. In: 2018 IEEE International Conference on Cluster Computing (CLUSTER), pp. 413–422. IEEE

  25. Liu C, Xie B, Liu X, Xue W, Yang H, Liu X (2018) Towards efficient spmv on sunway manycore architectures. In: Proceedings of the 2018 International Conference on Supercomputing, pp. 363–373

  26. Li M, Liu Y, Yang H, Luan Z, Qian D (2018) Multi-role sptrsv on sunway many-core architecture. In: 2018 IEEE 20th International Conference on High Performance Computing and Communications; IEEE 16th International Conference on Smart City; IEEE 4th International Conference on Data Science and Systems (HPCC/SmartCity/DSS), pp. 594–601. IEEE

  27. Wang X, Liu W, Xue W, Wu L (2018) swsptrsv: A fast sparse triangular solve with sparse level tile layout on sunway architectures. In: Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp. 338–353

  28. Fang J, Fu H, Zhao W, Chen B, Zheng W, Yang G (2017) swdnn: A library for accelerating deep learning applications on sunway taihulight. In: 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 615–624. IEEE

  29. Li M, Liu Y, Yang H, Luan Z, Gan L, Yang G, Qian D (2019) Accelerating sparse cholesky factorization on sunway manycore architecture. IEEE Trans Parallel Distrib Syst 31(7):1636–1650

    Article  Google Scholar 

  30. Davis TA, Hu Y (2011) The university of florida sparse matrix collection. ACM Trans Math Softw (TOMS) 38(1):1–25

    MathSciNet  MATH  Google Scholar 

  31. Rose DJ, Tarjan RE, Lueker GS (1976) Algorithmic aspects of vertex elimination on graphs. SIAM J Comput 5(2):266–283

    Article  MathSciNet  Google Scholar 

  32. Rose DJ, Tarjan RE (1978) Algorithmic aspects of vertex elimination on directed graphs. SIAM J Appl Math 34(1):176–197

    Article  MathSciNet  Google Scholar 

  33. Gilbert JR (1980) A note on the np-completeness of vertex elimination on directed graphs. SIAM J Algebraic Discrete Methods 1(3):292–294

    Article  MathSciNet  Google Scholar 

  34. Yannakakis M (1981) Computing the minimum fill-in is np-complete. SIAM J Algeb Discrete Methods 2(1):77–79

    Article  MathSciNet  Google Scholar 

  35. Tinney WF, Walker JW (1967) Direct solutions of sparse network equations by optimally ordered triangular factorization. Proc IEEE 55(11):1801–1809

    Article  Google Scholar 

  36. Rose DJ (1972) A graph-theoretic study of the numerical solution of sparse positive definite systems of linear equations In Graph Theory and Computing. Elsevier, New York, pp 183–217

    Google Scholar 

  37. Amestoy PR, Davis TA, Duff IS (1996) An approximate minimum degree ordering algorithm. SIAM J Matrix Anal Appl 17(4):886–905

    Article  MathSciNet  Google Scholar 

  38. Eisenstat SC, Schultz MH, Sherman AH (1981) Algorithms and data structures for sparse symmetric gaussian elimination. SIAM J Sc Statist Comput 2(2):225–237

    Article  MathSciNet  Google Scholar 

  39. Liu JW (1985) Modification of the minimum-degree algorithm by multiple elimination. ACM Trans Math Softw (TOMS) 11(2):141–153

    Article  MathSciNet  Google Scholar 

  40. Karypis G, Kumar V (1998) A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J Sci Comput 20(1):359–392

    Article  MathSciNet  Google Scholar 

  41. Karypis G, Kumar V (1998) Multilevelk-way partitioning scheme for irregular graphs. J Parallel Distrib Comput 48(1):96–129

    Article  Google Scholar 

  42. Karypis G, Kumar V (1998) Multilevel algorithms for multi-constraint graph partitioning. In: SC’98: Proceedings of the 1998 ACM/IEEE Conference on Supercomputing, pp. 28–28. IEEE

  43. Karypis G, Kumar V (1997) A coarse-grain parallel formulation of multilevel k-way graph partitioning algorithm. In: PPSC

  44. Schloegel K, Karypis G, Kumar V (2000) Parallel multilevel algorithms for multi-constraint graph partitioning. In: European Conference on Parallel Processing, pp. 296–310. Springer

  45. Cuthill E, McKee J (1969) Reducing the bandwidth of sparse symmetric matrices. In: Proceedings of the 1969 24th National Conference, pp. 157–172

  46. George A, Liu JW (1981) Computer solution of large sparse positive definite. Prentice Hall Professional Technical Reference, Englewood Cliffs

    MATH  Google Scholar 

  47. Li XS, Demmel JW (2003) Superlu\_dist: a scalable distributed-memory sparse direct solver for unsymmetric linear systems. ACM Trans Math Softw (TOMS) 29(2):110–140

    Article  Google Scholar 

  48. Amestoy PR, Duff IS, L’Excellent J-Y, Koster J (2001) A fully asynchronous multifrontal solver using distributed dynamic scheduling. SIAM J Matrix Anal Appl 23(1):15–41

    Article  MathSciNet  Google Scholar 

Download references

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant No. 62002186), the Shandong Provincial Natural Science Foundation (Grant No. ZR2019PF015), the “Colleges and Universities 20 Terms” Foundation of Jinan City, China (2018GXRC015) and the Research and Application Demonstration of Key Technologies of Autonomous Controllable Supercomputing Software Ecosystem Project (2020KJC-ZD01).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Zanjun Zhang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Tian, M., Wang, J., Zhang, Z. et al. swSuperLU: A highly scalable sparse direct solver on Sunway manycore architecture. J Supercomput 78, 11441–11463 (2022). https://doi.org/10.1007/s11227-021-04270-w

Download citation

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11227-021-04270-w

Keywords

Navigation