A parallel sparse triangular solve algorithm based on dependency elimination of the solution vector

Cluster Computing

Abstract

Sparse triangular solve (SpTRSV) is an important kernel in many scientific computing applications. Accelerating SpTRSV by parallelizing the solution process has traditionally been regarded as a challenging task. Dependencies among the variables in the solution process not only restrict the achievable parallelism, but also introduce large synchronization overhead among the parallel tasks. Moreover, a time-consuming pre-processing phase is commonly required to identify calculations that can be parallelized. However, we have observed that a large number of dependencies among the variables can be eliminated if we first calculate only partial values of the variables and later add them together to obtain the final values. With this strategy, solving a variable can start without waiting for all of its prerequisite variables to have been solved. Consequently, the parallelism of SpTRSV can be increased significantly. In this paper, we turn these observations into a subtree-based parallel algorithm to accelerate SpTRSV. The proposed algorithm calculates partial values of the variables along with an implicit subtree traversal and uses hardware atomic operations to accumulate the partial values. This introduces no pre-processing overhead and avoids any barrier synchronization among the parallel threads. We evaluate the proposed algorithm on 2135 matrices from the SuiteSparse Matrix Collection on a generic GPU platform. Experimental results demonstrate that our scheme outperforms the state-of-the-art GPU and CPU vendor libraries on 1949 and 1782 matrices, respectively. Compared with the latest synchronization-free method, our scheme performs better on 1779 matrices.
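The idea of accumulating partial values can be illustrated with a minimal sequential sketch. It is not the subtree-based GPU algorithm of the paper, only a column-oriented forward substitution under stated assumptions: the lower triangular matrix is stored in compressed sparse column (CSC) form with the diagonal entry first in each column, and the function name `sptrsv_partial_sums` and the array names are hypothetical. Each solved variable scatters its contribution into a running sum for later rows; in a parallel version these scattered updates are where atomic additions would plausibly be used.

```python
# Sequential sketch of partial-value accumulation for solving L x = b,
# with L lower triangular in CSC (compressed sparse column) form.
# Assumption: names and storage layout are illustrative, not from the paper.

def sptrsv_partial_sums(n, col_ptr, row_idx, vals, b):
    """Solve L x = b; each column is assumed to store its diagonal entry first."""
    x = [0.0] * n
    partial = [0.0] * n  # partial[i] accumulates L[i][j] * x[j] for solved j < i
    for j in range(n):
        start, end = col_ptr[j], col_ptr[j + 1]
        # All contributions to row j have been accumulated by now.
        x[j] = (b[j] - partial[j]) / vals[start]
        # Scatter x[j]'s contribution to every later row in column j.
        # In a parallel setting each update would need to be an atomic add.
        for k in range(start + 1, end):
            partial[row_idx[k]] += vals[k] * x[j]
    return x


# Example: L = [[2, 0, 0], [1, 4, 0], [0, 3, 5]], b = [2, 9, 21]
col_ptr = [0, 2, 4, 5]
row_idx = [0, 1, 1, 2, 2]
vals = [2.0, 1.0, 4.0, 3.0, 5.0]
x = sptrsv_partial_sums(3, col_ptr, row_idx, vals, [2.0, 9.0, 21.0])
print(x)  # [1.0, 2.0, 3.0]
```

Because the contributions to a row are only added together, they can arrive in any order, which is what makes accumulation with atomic operations possible instead of barrier synchronization.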

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China (NSFC) under Grant No. 61772061, in part by the Natural Science Foundation of Hebei Province of China under Grant No. F2017502043, in part by the Fundamental Research Funds for the Central Universities under Grant Nos. 2017MS114 and 2019RC41, and in part by the Open Subject of the State Key Laboratory of Computer Architecture under Grant No. CARCH201802.

Author information

Corresponding author

Correspondence to Song Jin.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Jin, S., Pei, S., Wang, Y. et al. A parallel sparse triangular solve algorithm based on dependency elimination of the solution vector. Cluster Comput 24, 1317–1330 (2021). https://doi.org/10.1007/s10586-020-03188-x

  • DOI: https://doi.org/10.1007/s10586-020-03188-x