research-article

Adaptive Level Binning: A New Algorithm for Solving Sparse Triangular Systems

Authors:
Buse Yılmaz, Koç University, Istanbul, Turkey
Buğra Sipahioğlu, Koç University, Istanbul, Turkey
Najeeb Ahmad, Koç University, Istanbul, Turkey
Didem Unat, Koç University, Istanbul, Turkey

HPCAsia '20: Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, January 2020, Pages 188–198. https://doi.org/10.1145/3368474.3368486

Published: 15 January 2020

ABSTRACT

Sparse triangular solve (SpTRSV) is an important scientific kernel used in several applications, such as preconditioners for Krylov methods. Parallelizing SpTRSV on multi-core systems is challenging: computational dependencies limit the available parallelism, and the fine-grained, unbalanced nature of the workloads introduces high parallelization overhead. We propose a novel method, named Adaptive Level Binning (ALB), that addresses these challenges by eliminating redundant synchronization points and adapting the work granularity with an efficient load balancing strategy. Like the commonly used level-set methods for solving SpTRSV, ALB constructs level-sets of rows, where each level can be computed in parallel. Unlike those methods, ALB bins rows to levels adaptively and reduces redundant dependencies between rows. On an Intel® Xeon® Gold 6148 processor and an NVIDIA® Tesla V100 GPU, ALB obtains a 1.83x average and up to 5.28x speedup over Intel MKL, and a 2.80x average and up to 39.40x speedup over NVIDIA cuSPARSE, for 29 matrices selected from the SuiteSparse Matrix Collection.
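
To make the level-set idea concrete, the following is a minimal sketch of the baseline level-set approach that the abstract compares against. It is not ALB itself: the adaptive binning and dependency reduction are not shown, and the loops run serially. It assumes a lower triangular matrix in CSR form with an explicitly stored nonzero diagonal. The function names `build_levels` and `solve_lower` and the argument names `indptr`, `indices`, `data` are illustrative choices, not taken from the paper.

```python
# Baseline level-set SpTRSV sketch (illustrative only, not the ALB algorithm).
# Assumption: L is lower triangular, stored in CSR, with a nonzero diagonal
# entry present in every row.

def build_levels(indptr, indices):
    """Group rows into levels so that rows in one level do not depend on each other."""
    n = len(indptr) - 1
    level = [0] * n
    for i in range(n):
        for k in range(indptr[i], indptr[i + 1]):
            j = indices[k]
            if j < i:
                # Row i needs x[j], so it must sit at least one level after row j.
                level[i] = max(level[i], level[j] + 1)
    levels = [[] for _ in range(max(level, default=-1) + 1)]
    for i, lv in enumerate(level):
        levels[lv].append(i)
    return levels


def solve_lower(indptr, indices, data, b):
    """Solve L x = b by processing the levels in order."""
    x = [0.0] * len(b)
    for rows in build_levels(indptr, indices):
        # Rows inside one level are independent, so a parallel version could
        # distribute them over threads and synchronize between levels.
        for i in rows:
            s = b[i]
            diag = None
            for k in range(indptr[i], indptr[i + 1]):
                j = indices[k]
                if j == i:
                    diag = data[k]
                else:
                    s -= data[k] * x[j]
            x[i] = s / diag
    return x
```

For the 4x4 matrix with rows `[2,0,0,0]`, `[0,4,0,0]`, `[1,0,1,0]`, `[0,2,3,5]`, rows 0 and 1 form the first level, row 2 the second, and row 3 the third. Every boundary between levels is a synchronization point in a parallel implementation, which is the overhead the abstract says ALB aims to reduce.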



Published in

HPCAsia '20: Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region
January 2020
247 pages
ISBN: 9781450372367
DOI: 10.1145/3368474

Copyright © 2020 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
CPU
fine-grained parallelism
level-set
sparse triangular solvers
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
Overall Acceptance Rate: 69 of 143 submissions, 48%
