OSTI.GOV · U.S. Department of Energy
Office of Scientific and Technical Information

Title: Experiences in autotuning matrix multiplication for energy minimization on GPUs

Journal Article · Concurrency and Computation. Practice and Experience
DOI: https://doi.org/10.1002/cpe.3516 · OSTI ID: 1361296
 [1];  [1];  [1];  [1];  [2]
  1. Univ. of Tennessee, Knoxville, TN (United States). Dept. of Electrical Engineering and Computer Science
  2. Univ. of Tennessee, Knoxville, TN (United States). Dept. of Electrical Engineering and Computer Science; Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Univ. of Manchester (United Kingdom)

Summary

In this paper, we report extensive results and analysis of autotuning the computationally intensive graphics processing unit (GPU) kernel for dense matrix–matrix multiplication in double precision (DGEMM). In contrast to traditional autotuning, which optimizes for runtime performance alone, we also take energy efficiency into account. We show that kernels achieving equal performance can differ significantly in their energy balance. We also identify memory throughput as the most influential metric in the trade-off between performance and energy efficiency. As a result, the performance-optimal kernel is not the most efficient one in overall resource use. Copyright © 2015 John Wiley & Sons, Ltd.
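The performance-versus-energy selection described in the summary can be made concrete with a small routine. The sketch below is hypothetical and is not the authors' tooling. It assumes each tuned kernel configuration has already been run and profiled, yielding a runtime, an average power draw, and a dictionary of profiler counters. The names `KernelRun`, `dram_gbps`, and the helper functions are invented for illustration. It counts 2n³ floating-point operations for a square DGEMM, takes energy as average power times runtime, and picks the best configuration twice: once by GFLOP/s and once by GFLOP/s per watt. It also ranks counters by the absolute Pearson correlation with energy. This is a simple stand-in for identifying the most influential metric, and the paper's actual analysis method may differ.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class KernelRun:
    """One measured DGEMM kernel configuration (hypothetical record layout)."""
    config: str        # label for the tuning parameters, e.g. tile sizes
    n: int             # square problem size (m = n = k)
    seconds: float     # measured runtime
    avg_watts: float   # average power drawn during the run
    metrics: dict      # profiler counters, e.g. {"dram_gbps": ...} (assumed names)

    @property
    def flops(self) -> float:
        # C = A * B with n x n matrices: n^3 multiply-adds = 2 n^3 flops
        return 2.0 * self.n ** 3

    @property
    def gflops(self) -> float:
        return self.flops / self.seconds / 1e9

    @property
    def joules(self) -> float:
        # energy = average power x time
        return self.avg_watts * self.seconds

    @property
    def gflops_per_watt(self) -> float:
        # GFLOP/s per watt is the same quantity as GFLOP per joule
        return self.flops / self.joules / 1e9


def best_by_performance(runs):
    """Configuration with the highest throughput."""
    return max(runs, key=lambda r: r.gflops)


def best_by_efficiency(runs):
    """Configuration with the highest energy efficiency."""
    return max(runs, key=lambda r: r.gflops_per_watt)


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def rank_metrics_by_energy_influence(runs):
    """Sort profiler counters by |correlation| with energy, strongest first."""
    energy = [r.joules for r in runs]
    scores = {
        name: abs(pearson([r.metrics[name] for r in runs], energy))
        for name in runs[0].metrics
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Because the two `max` calls use different keys, nothing forces them to return the same configuration. This mirrors the summary's point that the fastest kernel need not be the most energy-efficient one.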

Research Organization:
Univ. of Tennessee, Knoxville, TN (United States); Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
Sponsoring Organization:
USDOE; National Science Foundation (NSF); Nvidia Corporation (United States); Intel Corporation (United States); Advanced Micro Devices, Inc. (AMD) (United States); Russian Scientific Fund (Russian Federation)
Contributing Organization:
Univ. of Manchester (United Kingdom)
Grant/Contract Number:
AC05-00OR22725; SHF-1320603; N14-11-00190
OSTI ID:
1361296
Alternate ID(s):
OSTI ID: 1401625
Journal Information:
Concurrency and Computation. Practice and Experience, Vol. 27, Issue 17; ISSN 1532-0626
Publisher:
Wiley
Country of Publication:
United States
Language:
English
Citation Metrics:
Cited by: 10 works (citation information provided by Web of Science)

Cited By (2)

Novel HPC techniques to batch execution of many variable size BLAS computations on GPUs conference January 2017
BOAST: A metaprogramming framework to produce portable and efficient computing kernels for HPC applications journal August 2017

Similar Records

Overcoming element quality dependence of finite elements with adaptive extended stencil FEM (AES‐FEM)
Journal Article · March 23, 2016 · International Journal for Numerical Methods in Engineering

Scenario analysis for techno-economic model development of U.S. offshore wind support structures
Journal Article · September 22, 2016 · Wind Energy

Acceleration of GPU-based Krylov solvers via data transfer reduction
Journal Article · April 8, 2015 · International Journal of High Performance Computing Applications