
Optimizing partitioned CSR-based SpGEMM on the Sunway TaihuLight

  • Advances in Parallel and Distributed Computing for Neural Computing
  • Published in Neural Computing and Applications

Abstract

General sparse matrix–matrix multiplication (SpGEMM) is a basic kernel in a great many applications, and several prior works have focused on its optimization. To fully exploit the computing capability of the Sunway TaihuLight supercomputer for SpGEMM, this paper designs a partitioning method and a parallelization of CSR-based SpGEMM that match the Sunway architecture well. In addition, the partitioning method is refined according to the distribution of the floating-point operations in the CSR-based SpGEMM, which achieves load balance and further performance improvement on the Sunway. We analyze the performance, including the memory footprint and the execution time, of both the parallel CSR-based SpGEMM and the optimized CSR-based SpGEMM on the Sunway. The experimental results show that the optimized CSR-based SpGEMM outperforms the parallel CSR-based SpGEMM and has good scalability on the Sunway.
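The approach described above rests on two ingredients: a row-wise CSR-based SpGEMM kernel and a partition of the rows guided by per-row floating-point work. As a rough illustration only (not the paper's Sunway implementation, and with all function and variable names hypothetical), a minimal sketch of the classic row-wise (Gustavson-style) CSR SpGEMM and a flop-guided contiguous row partitioner might look like this:

```python
# Hypothetical sketch: row-wise CSR SpGEMM plus flop-balanced row
# partitioning. Names are illustrative, not taken from the paper.

def spgemm_csr(a_ptr, a_idx, a_val, b_ptr, b_idx, b_val):
    """Compute C = A * B with both operands in CSR; return C in CSR."""
    c_ptr, c_idx, c_val = [0], [], []
    for i in range(len(a_ptr) - 1):
        acc = {}  # sparse accumulator for row i of C
        for k in range(a_ptr[i], a_ptr[i + 1]):
            j, v = a_idx[k], a_val[k]
            # scale row j of B by A[i, j] and merge into the accumulator
            for t in range(b_ptr[j], b_ptr[j + 1]):
                col = b_idx[t]
                acc[col] = acc.get(col, 0.0) + v * b_val[t]
        for col in sorted(acc):
            c_idx.append(col)
            c_val.append(acc[col])
        c_ptr.append(len(c_idx))
    return c_ptr, c_idx, c_val

def flops_per_row(a_ptr, a_idx, b_ptr):
    """Multiply count needed for each row of C (an upper bound on nnz)."""
    return [sum(b_ptr[j + 1] - b_ptr[j]
                for j in a_idx[a_ptr[i]:a_ptr[i + 1]])
            for i in range(len(a_ptr) - 1)]

def partition_rows(flops, n_parts):
    """Greedy contiguous split so each part gets roughly equal flops."""
    target = sum(flops) / n_parts
    parts, start, acc = [], 0, 0
    for i, f in enumerate(flops):
        acc += f
        if acc >= target and len(parts) < n_parts - 1:
            parts.append((start, i + 1))
            start, acc = i + 1, 0
    parts.append((start, len(flops)))
    return parts
```

Partitioning by flop count rather than by row count or nonzero count matters for SpGEMM because a row's cost depends on the nonzero structure of both operands, so equal-sized row blocks can still carry very unequal work.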




Acknowledgements

The research was partially funded by the National Key R&D Program of China (Grant No. 2018YFB0203800), the National Outstanding Youth Science Program of National Natural Science Foundation of China (Grant No. 61625202), the International (Regional) Cooperation and Exchange Program of National Natural Science Foundation of China (Grant Nos. 61661146006, 61860206011), the Program of National Natural Science Foundation of China (Grant Nos. 61572175, 61806077), the Program of Hunan Provincial Innovation Foundation for Postgraduate (Grant No. CX2018B230), the International Postdoctoral Exchange Fellowship Program of China Postdoctoral Council (Grant No. OCPC2017032), and the Fellowship Program of China Scholarship Council.

Author information


Corresponding author

Correspondence to Guoqing Xiao.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Chen, Y., Xiao, G. & Yang, W. Optimizing partitioned CSR-based SpGEMM on the Sunway TaihuLight. Neural Comput & Applic 32, 5571–5582 (2020). https://doi.org/10.1007/s00521-019-04121-z
