DOI: 10.1145/3404397.3404413
research-article

Efficient Block Algorithms for Parallel Sparse Triangular Solve

Published: 17 August 2020

ABSTRACT

The sparse triangular solve (SpTRSV) kernel is an important building block for a number of linear algebra routines, such as sparse direct and iterative solvers. The major challenge in accelerating SpTRSV is exposing more parallelism. Existing work mainly focuses on reducing dependencies and synchronizations in level-set methods. However, the 2D block layout of the input matrix has been largely ignored in the design of more efficient SpTRSV algorithms.
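As background for the level-set methods mentioned above, the following is a minimal serial sketch of the general idea, not the implementation evaluated in this paper. It assumes a lower-triangular matrix in CSR form with a nonzero diagonal; the names `level_sets`, `sptrsv_levels`, `row_ptr`, `col_idx`, and `vals` are illustrative choices. Rows that land in the same level do not depend on each other, so a parallel implementation could solve them concurrently and synchronize only between levels.

```python
def level_sets(row_ptr, col_idx):
    """Group the rows of a lower-triangular CSR matrix into dependency levels."""
    n = len(row_ptr) - 1
    level = [0] * n
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]
            if j < i:  # off-diagonal entry: row i needs x[j] first
                level[i] = max(level[i], level[j] + 1)
    sets = [[] for _ in range(max(level, default=-1) + 1)]
    for i, lv in enumerate(level):
        sets[lv].append(i)
    return sets


def sptrsv_levels(row_ptr, col_idx, vals, b):
    """Solve L x = b level by level (serial stand-in for a parallel kernel)."""
    x = [0.0] * len(b)
    for rows in level_sets(row_ptr, col_idx):
        # Rows within one level are mutually independent.
        for i in rows:
            acc, diag = b[i], None
            for k in range(row_ptr[i], row_ptr[i + 1]):
                j = col_idx[k]
                if j == i:
                    diag = vals[k]
                else:
                    acc -= vals[k] * x[j]
            x[i] = acc / diag
    return x


# Example: L = [[2,0,0,0],[0,1,0,0],[1,0,4,0],[0,3,1,5]]
row_ptr = [0, 1, 2, 4, 7]
col_idx = [0, 1, 0, 2, 1, 2, 3]
vals = [2.0, 1.0, 1.0, 4.0, 3.0, 1.0, 5.0]
print(level_sets(row_ptr, col_idx))                               # [[0, 1], [2], [3]]
print(sptrsv_levels(row_ptr, col_idx, vals, [2.0, 2.0, 13.0, 29.0]))  # [1.0, 2.0, 3.0, 4.0]
```

In this small example only the first level contains more than one row, which illustrates why long dependency chains limit the parallelism available to level-set approaches.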

In this paper, we implement three block algorithms for parallel SpTRSV on modern GPUs (column block, row block, and recursive block), and propose an adaptive approach that automatically selects the best kernels according to the input sparsity structure. Experiments on 159 sparse matrices and two high-end NVIDIA GPUs show that the recursive block algorithm performs best of the three, and that it is on average 4.72x (up to 72.03x) and 9.95x (up to 61.08x) faster than the cuSPARSE v2 and Sync-free methods, respectively. In addition, our method needs only a moderate cost for preprocessing the input matrix, and is therefore highly efficient for multiple right-hand sides and iterative scenarios.
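The recursive block idea can be illustrated with a short serial sketch. This is a textbook-style formulation under stated assumptions, not the GPU kernels or the adaptive kernel selection described in the abstract: the lower-triangular matrix is held in CSR form with a nonzero diagonal, and the function name `recursive_block_sptrsv` and the `leaf` cutoff (assumed to be at least 1) are hypothetical. The matrix is split as [[L11, 0], [L21, L22]]; the sketch solves with L11, subtracts L21 times the partial solution from the remaining right-hand side (a sparse matrix-vector product on the square block), and then solves with L22.

```python
def recursive_block_sptrsv(row_ptr, col_idx, vals, b, leaf=2):
    """Solve L x = b by recursively splitting L into triangular and square blocks."""
    x = [float(v) for v in b]  # working copy, overwritten with the solution

    def solve(lo, hi):
        if hi - lo <= leaf:
            # Triangular leaf block: forward substitution on columns [lo, i).
            # Contributions from columns < lo were already subtracted above.
            for i in range(lo, hi):
                diag = None
                for k in range(row_ptr[i], row_ptr[i + 1]):
                    j = col_idx[k]
                    if j == i:
                        diag = vals[k]
                    elif lo <= j < i:
                        x[i] -= vals[k] * x[j]
                x[i] /= diag
            return
        mid = (lo + hi) // 2
        solve(lo, mid)  # top-left triangular block
        # Square block update: rows [mid, hi), columns [lo, mid).
        for i in range(mid, hi):
            for k in range(row_ptr[i], row_ptr[i + 1]):
                j = col_idx[k]
                if lo <= j < mid:
                    x[i] -= vals[k] * x[j]
        solve(mid, hi)  # bottom-right triangular block

    solve(0, len(b))
    return x


# Example: L = [[2,0,0,0],[0,1,0,0],[1,0,4,0],[0,3,1,5]], b = L @ [1,2,3,4]
row_ptr = [0, 1, 2, 4, 7]
col_idx = [0, 1, 0, 2, 1, 2, 3]
vals = [2.0, 1.0, 1.0, 4.0, 3.0, 1.0, 5.0]
print(recursive_block_sptrsv(row_ptr, col_idx, vals, [2.0, 2.0, 13.0, 29.0]))
# [1.0, 2.0, 3.0, 4.0]
```

The appeal of this decomposition is that the square-block update has no dependencies between its rows, so that part of the work is fully parallel, while the dependent work is confined to the smaller triangular blocks.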

  • Published in

    ICPP '20: Proceedings of the 49th International Conference on Parallel Processing
    August 2020
    844 pages
    ISBN:9781450388160
    DOI:10.1145/3404397

    Copyright © 2020 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • research-article
    • Research
    • Refereed limited

    Acceptance Rates

    Overall Acceptance Rate: 91 of 313 submissions, 29%
