Abstract
Sparse matrix-matrix multiplication (SpMM) is a fundamental kernel used by many algorithms. Much research has focused on optimizing the parallel execution of SpMM. However, how a task is divided for parallelization has not yet been well considered. Generally, a matrix is divided into equal-sized blocks across processes even when the sparsities of the input matrices differ, because the parameter that divides a task among processes is fixed. As a result, load imbalance occurs among the processes. To balance the loads among the processes, this paper proposes a dynamic parameter tuning method that analyzes the sparsities of the input matrices. The experimental results show that the proposed method improves the performance of SpMM for the examined matrices by up to 39.5%, and by 12.3% on average.
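To illustrate the kind of load imbalance the abstract describes, the sketch below contrasts a fixed equal-rows split with a sparsity-aware split. This is a minimal illustrative assumption, not the paper's actual tuning method: the function `balanced_row_partition` and its greedy prefix-split strategy are hypothetical, and they balance the number of nonzeros per process rather than the number of rows.

```python
# Hedged sketch: sparsity-aware row partitioning for parallel SpMM.
# Assumption (not from the paper): each process receives a contiguous
# block of rows, and balance is measured by nonzeros per block.

def balanced_row_partition(row_nnz, num_procs):
    """Split rows into num_procs contiguous blocks whose total nonzero
    counts are roughly equal, using a greedy prefix split."""
    total = sum(row_nnz)
    target = total / num_procs          # ideal nonzeros per process
    bounds = [0]
    acc = 0
    for i, nnz in enumerate(row_nnz):
        acc += nnz
        # Close the current block once its cumulative share is reached.
        if acc >= target * len(bounds) and len(bounds) < num_procs:
            bounds.append(i + 1)
    # Pad with empty blocks if fewer than num_procs splits were made.
    while len(bounds) < num_procs:
        bounds.append(len(row_nnz))
    bounds.append(len(row_nnz))
    return list(zip(bounds[:-1], bounds[1:]))

# One dense row followed by nine sparse rows: an equal-rows split
# (5 rows each) would give one process 34 nonzeros and the other 5,
# while the nonzero-aware split isolates the heavy row.
row_nnz = [30, 1, 1, 1, 1, 1, 1, 1, 1, 1]
parts = balanced_row_partition(row_nnz, 2)
```

A dynamic scheme in this spirit would recompute such a partition per input matrix from its row-nonzero histogram, instead of reusing one fixed division parameter.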
Acknowledgment
This research was partially supported by MEXT Next Generation High Performance Computing Infrastructures and Applications R&D Program, entitled “R&D of A Quantum-Annealing-Assisted Next Generation HPC Infrastructure and its Applications.”
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Qi, B., Komatsu, K., Sato, M., Kobayashi, H. (2021). A Dynamic Parameter Tuning Method for High Performance SpMM. In: Zhang, Y., Xu, Y., Tian, H. (eds.) Parallel and Distributed Computing, Applications and Technologies. PDCAT 2020. Lecture Notes in Computer Science, vol. 12606. Springer, Cham. https://doi.org/10.1007/978-3-030-69244-5_28
DOI: https://doi.org/10.1007/978-3-030-69244-5_28
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-69243-8
Online ISBN: 978-3-030-69244-5
eBook Packages: Computer Science (R0)