Experiences in autotuning matrix multiplication for energy minimization on GPUs
Journal Article · Concurrency and Computation. Practice and Experience
- Univ. of Tennessee, Knoxville, TN (United States). Dept. of Electrical Engineering and Computer Science
- Univ. of Tennessee, Knoxville, TN (United States). Dept. of Electrical Engineering and Computer Science; Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Univ. of Manchester (United Kingdom)
Summary: In this paper, we report extensive results and analysis of autotuning the computationally intensive graphics processing unit (GPU) kernel for dense matrix–matrix multiplication in double precision. In contrast to traditional autotuning and/or optimization for runtime performance only, we also take energy efficiency into account. For kernels achieving equal performance, we show significant differences in their energy balance. We also identify memory throughput as the most influential metric that trades off performance and energy efficiency. As a result, the performance-optimal case ends up not being the most efficient kernel in overall resource use. Copyright © 2015 John Wiley & Sons, Ltd.
- Research Organization:
- Univ. of Tennessee, Knoxville, TN (United States); Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
- Sponsoring Organization:
- USDOE; National Science Foundation (NSF); Nvidia Corporation (United States); Intel Corporation (United States); Advanced Micro Devices, Inc. (AMD) (United States); Russian Scientific Fund (Russian Federation)
- Contributing Organization:
- Univ. of Manchester (United Kingdom)
- Grant/Contract Number:
- AC05-00OR22725; SHF-1320603; N14-11-00190
- OSTI ID:
- 1361296
- Alternate ID(s):
- OSTI ID: 1401625
- Journal Information:
- Concurrency and Computation. Practice and Experience, Vol. 27, Issue 17; ISSN 1532-0626
- Publisher:
- Wiley
- Country of Publication:
- United States
- Language:
- English
Cited by: 10 works
Citation information provided by Web of Science
Novel HPC techniques to batch execution of many variable size BLAS computations on GPUs · conference · January 2017
BOAST: A metaprogramming framework to produce portable and efficient computing kernels for HPC applications · journal · August 2017
Similar Records
Overcoming element quality dependence of finite elements with adaptive extended stencil FEM (AES‐FEM)
Journal Article · March 23, 2016 · International Journal for Numerical Methods in Engineering
OSTI ID:1361296
Scenario analysis for techno-economic model development of U.S. offshore wind support structures
Journal Article · September 22, 2016 · Wind Energy
OSTI ID:1361296
Acceleration of GPU-based Krylov solvers via data transfer reduction
Journal Article · April 8, 2015 · International Journal of High Performance Computing Applications
OSTI ID:1361296