Abstract
Sparse matrix-matrix multiplication (SpMM) is a fundamental kernel used by many algorithms. Much research has focused on optimizing the parallel execution of SpMM. However, how a task is divided for parallelization has not yet been well considered. Generally, a matrix is divided into equal-sized blocks across processes even when the sparsities of the input matrices differ, because the parameter that divides a task among processes is fixed. As a result, load imbalance occurs among the processes. To balance the loads among the processes, this paper proposes a dynamic parameter tuning method that analyzes the sparsities of the input matrices. The experimental results show that the proposed method improves the performance of SpMM for the examined matrices by up to 39.5%, and by 12.3% on average.
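To illustrate the kind of load imbalance the abstract describes, the sketch below contrasts a fixed equal-rows split with a sparsity-aware split. This is a minimal illustrative assumption, not the paper's actual tuning method: the function `balanced_row_partition` and its greedy prefix-split strategy are hypothetical, and they balance the number of nonzeros per process rather than the number of rows.

```python
# Hedged sketch: sparsity-aware row partitioning for parallel SpMM.
# Assumption (not from the paper): each process receives a contiguous
# block of rows, and balance is measured by nonzeros per block.

def balanced_row_partition(row_nnz, num_procs):
    """Split rows into num_procs contiguous blocks whose total nonzero
    counts are roughly equal, using a greedy prefix split."""
    total = sum(row_nnz)
    target = total / num_procs          # ideal nonzeros per process
    bounds = [0]
    acc = 0
    for i, nnz in enumerate(row_nnz):
        acc += nnz
        # Close the current block once its cumulative share is reached.
        if acc >= target * len(bounds) and len(bounds) < num_procs:
            bounds.append(i + 1)
    # Pad with empty blocks if fewer than num_procs splits were made.
    while len(bounds) < num_procs:
        bounds.append(len(row_nnz))
    bounds.append(len(row_nnz))
    return list(zip(bounds[:-1], bounds[1:]))

# One dense row followed by nine sparse rows: an equal-rows split
# (5 rows each) would give one process 34 nonzeros and the other 5,
# while the nonzero-aware split isolates the heavy row.
row_nnz = [30, 1, 1, 1, 1, 1, 1, 1, 1, 1]
parts = balanced_row_partition(row_nnz, 2)
```

A dynamic scheme in this spirit would recompute such a partition per input matrix from its row-nonzero histogram, instead of reusing one fixed division parameter.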
Acknowledgment
This research was partially supported by MEXT Next Generation High Performance Computing Infrastructures and Applications R&D Program, entitled “R&D of A Quantum-Annealing-Assisted Next Generation HPC Infrastructure and its Applications.”
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Qi, B., Komatsu, K., Sato, M., Kobayashi, H. (2021). A Dynamic Parameter Tuning Method for High Performance SpMM. In: Zhang, Y., Xu, Y., Tian, H. (eds.) Parallel and Distributed Computing, Applications and Technologies. PDCAT 2020. Lecture Notes in Computer Science, vol. 12606. Springer, Cham. https://doi.org/10.1007/978-3-030-69244-5_28
DOI: https://doi.org/10.1007/978-3-030-69244-5_28
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-69243-8
Online ISBN: 978-3-030-69244-5
eBook Packages: Computer Science (R0)