DOI: 10.1145/3368474.3368486
research-article

Adaptive Level Binning: A New Algorithm for Solving Sparse Triangular Systems

Published: 15 January 2020

ABSTRACT

Sparse triangular solve (SpTRSV) is an important scientific kernel used in several applications, such as preconditioners for Krylov methods. Parallelizing SpTRSV on multi-core systems is challenging: computational dependencies limit the available parallelism, and the fine-grained, unbalanced nature of the workloads introduces high parallelization overhead. We propose a novel method, named Adaptive Level Binning (ALB), that addresses these challenges by eliminating redundant synchronization points and adapting the work granularity with an efficient load balancing strategy. Like the commonly used level-set methods for solving SpTRSV, ALB constructs level-sets of rows, where the rows in each level can be computed in parallel. Unlike those methods, ALB bins rows to levels adaptively and reduces redundant dependencies between rows. On an Intel® Xeon® Gold 6148 processor and an NVIDIA® Tesla V100 GPU, ALB achieves an average speedup of 1.83x and a maximum speedup of 5.28x over Intel MKL, and an average speedup of 2.80x and a maximum speedup of 39.40x over NVIDIA cuSPARSE, for 29 matrices selected from the SuiteSparse Matrix Collection.
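For readers unfamiliar with the level-set idea the abstract builds on, the following is a minimal sketch of the conventional level-set approach, not of ALB itself (the adaptive binning and dependency reduction are not described in enough detail here to reproduce). It assumes a lower-triangular matrix stored in CSR form with every diagonal entry present and nonzero; the function names `build_levels` and `sptrsv_by_levels` are hypothetical and chosen only for illustration.

```python
def build_levels(indptr, indices):
    """Group the rows of a lower-triangular CSR matrix into level sets.

    A row is placed one level after the deepest row it depends on, so
    all rows within a level are mutually independent.
    """
    n = len(indptr) - 1
    level = [0] * n
    for i in range(n):
        for k in range(indptr[i], indptr[i + 1]):
            j = indices[k]
            if j < i:  # off-diagonal entry: row i needs x[j] first
                level[i] = max(level[i], level[j] + 1)
    levels = [[] for _ in range(max(level, default=-1) + 1)]
    for i, lv in enumerate(level):
        levels[lv].append(i)
    return levels


def sptrsv_by_levels(indptr, indices, data, b):
    """Solve L x = b level by level (assumes a stored, nonzero diagonal)."""
    x = [0.0] * len(b)
    for rows in build_levels(indptr, indices):
        # Rows in one level do not depend on each other. A parallel
        # runtime would distribute this loop and synchronize after it;
        # here it runs sequentially for clarity.
        for i in rows:
            acc = b[i]
            diag = None
            for k in range(indptr[i], indptr[i + 1]):
                j = indices[k]
                if j == i:
                    diag = data[k]
                else:
                    acc -= data[k] * x[j]
            x[i] = acc / diag
    return x


# 4x4 example: L = [[2,0,0,0],[0,1,0,0],[1,0,4,0],[0,3,1,5]]
indptr = [0, 1, 2, 4, 7]
indices = [0, 1, 0, 2, 1, 2, 3]
data = [2.0, 1.0, 1.0, 4.0, 3.0, 1.0, 5.0]
b = [2.0, 3.0, 9.0, 16.0]

print(build_levels(indptr, indices))            # [[0, 1], [2], [3]]
print(sptrsv_by_levels(indptr, indices, data, b))  # [1.0, 3.0, 2.0, 1.0]
```

In this small example the second and third levels each contain a single row, which illustrates the limited parallelism and per-level synchronization cost that the abstract identifies as the motivation for ALB.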

        • Published in

          HPCAsia '20: Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region
          January 2020
          247 pages
          ISBN:9781450372367
          DOI:10.1145/3368474

          Copyright © 2020 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Qualifiers

          • research-article
          • Research
          • Refereed limited

          Acceptance Rates

          Overall Acceptance Rate: 69 of 143 submissions, 48%
