Workload and power budget partitioning for single-chip heterogeneous processors

Authors: Ripudaman Singh, Michael J. Schulte, Nam Sung Kim

PACT '12: Proceedings of the 21st international conference on Parallel architectures and compilation techniques

Pages 401 - 410

https://doi.org/10.1145/2370816.2370873

Published: 19 September 2012

Abstract

With technology scaling, manufacturers are integrating both CPU and GPU cores in a single chip to improve the throughput of emerging applications. To maximize the throughput of a single-chip heterogeneous processor (SCHP), the chip power budget shared between the CPU and GPU must be effectively utilized. At the same time, the CPU and GPU in an SCHP must each satisfy its own power constraint. Furthermore, the power budget allocated to the CPU and GPU impacts performance. In this paper, using a detailed cycle-level SCHP simulator, we first demonstrate that the joint optimization of workload and power budget partitioning between the CPU and GPU can provide 13% higher throughput than the optimization of workload partitioning alone under a fixed power budget allocation to the CPU and GPU. Second, we propose an effective runtime algorithm that can determine near-optimal or optimal combinations of workload and power budget partitioning. The algorithm exploits the runtime power efficiencies of the workload executed on the CPU and the GPU. Using the detailed cycle-level SCHP simulator, we show that within five to eight kernel invocations the algorithm can achieve 96% of the maximum throughput obtained by an exhaustive search algorithm. Finally, we demonstrate comparable throughput improvements when we apply the algorithm to a commercial computing system with an SCHP.
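The difference between joint optimization and workload partitioning alone can be illustrated with a small sketch. The code below is not the authors' simulator or runtime algorithm: it is a brute-force grid search over a hypothetical analytic model in which a device's processing rate grows with the cube root of its power. The functions `device_rate`, `throughput`, and `best_partition`, the constants `k_cpu` and `k_gpu`, and all wattages are assumptions made purely for illustration.

```python
from itertools import product


def device_rate(power_w, k, alpha=1 / 3):
    """Hypothetical model: work processed per second at a given power (watts)."""
    return k * power_w ** alpha


def throughput(w_cpu, p_cpu, p_gpu, k_cpu=1.0, k_gpu=4.0):
    """Kernel throughput when the CPU takes fraction w_cpu of the work.

    CPU and GPU run concurrently, so the slower side sets the kernel time.
    """
    t_cpu = w_cpu / device_rate(p_cpu, k_cpu) if w_cpu > 0 else 0.0
    t_gpu = (1 - w_cpu) / device_rate(p_gpu, k_gpu) if w_cpu < 1 else 0.0
    return 1.0 / max(t_cpu, t_gpu)


def best_partition(budget_w, cpu_cap_w, gpu_cap_w, fixed_p_cpu=None, steps=20):
    """Exhaustive search over workload split and (optionally) power split.

    Returns (throughput, w_cpu, p_cpu), or None if no point is feasible.
    With fixed_p_cpu set, only the workload split is searched.
    """
    w_grid = [i / steps for i in range(steps + 1)]
    if fixed_p_cpu is None:
        p_grid = [budget_w * i / steps for i in range(1, steps)]
    else:
        p_grid = [fixed_p_cpu]

    best = None
    for w_cpu, p_cpu in product(w_grid, p_grid):
        p_gpu = budget_w - p_cpu  # the whole chip budget is shared
        if p_cpu > cpu_cap_w or p_gpu > gpu_cap_w:
            continue  # each device must also respect its own power cap
        candidate = (throughput(w_cpu, p_cpu, p_gpu), w_cpu, p_cpu)
        if best is None or candidate > best:
            best = candidate
    return best


if __name__ == "__main__":
    joint = best_partition(40, 30, 30)
    fixed = best_partition(40, 30, 30, fixed_p_cpu=20)
    print("joint search:", joint)
    print("fixed power split:", fixed)
```

Because the fixed allocation is one point of the joint search grid, the joint result can never be worse in this model. How much better it is depends entirely on the assumed rate model, so the numbers it prints say nothing about the 13% figure reported in the abstract.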

Index Terms

Workload and power budget partitioning for single-chip heterogeneous processors
1. Computer systems organization
  1. Architectures
    1. Parallel architectures
      1. Single instruction, multiple data

Published In

PACT '12: Proceedings of the 21st international conference on Parallel architectures and compilation techniques

September 2012

512 pages

ISBN: 9781450311823

DOI: 10.1145/2370816

General Chairs: Pen-Chung Yew (University of Minnesota), Sangyeun Cho (University of Pittsburgh)

Program Chairs: Luiz DeRose (Cray, Inc.), David J. Lilja (University of Minnesota)

Copyright © 2012 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

IFIP WG 10.3: IFIP WG 10.3
SIGARCH: ACM Special Interest Group on Computer Architecture
IEEE CS TCPP: IEEE Computer Society Technical Committee on Parallel Processing
IEEE CS TCAA: IEEE CS technical committee on architectural acoustics

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

PACT '12

PACT '12: International Conference on Parallel Architectures and Compilation Techniques

September 19 - 23, 2012

Minneapolis, Minnesota, USA

Acceptance Rates

Overall Acceptance Rate 121 of 471 submissions, 26%

Article Metrics

Total Citations: 39
Total Downloads: 1,500
Downloads (Last 12 months): 28
Downloads (Last 6 weeks): 3

Reflects downloads up to 05 Mar 2025

Cited By

Ortega C, Alvarez L, Buyuktosunoglu A, Bertran Monfort R, Rosedahl T, Bose P, Moreto Planas M (2022). Adaptive Power Shifting for Power-Constrained Heterogeneous Systems. IEEE Transactions on Computers, pp. 1-1. Online publication date: 2022.
https://doi.org/10.1109/TC.2022.3174545
Kumar R, Ghoshal B (2022). Machine learning guided thermal management of Open Computing Language applications on CPU-GPU based embedded platforms. IET Computers & Digital Techniques 17:1, pp. 20-28. Online publication date: 28-Dec-2022.
https://doi.org/10.1049/cdt2.12050
Singh A, Basireddy K, Prakash A, Merrett G, Al-Hashimi B (2019). Collaborative Adaptation for Energy-Efficient Heterogeneous Mobile SoCs. IEEE Transactions on Computers, pp. 1-1. Online publication date: 2019.
https://doi.org/10.1109/TC.2019.2943855
Gupta U, Ayoub R, Kishinevsky M, Kadjo D, Soundararajan N, Tursun U, Ogras U (2018). Dynamic Power Budgeting for Mobile Systems Running Graphics Workloads. IEEE Transactions on Multi-Scale Computing Systems 4:1, pp. 30-40. Online publication date: 1-Jan-2018.
https://doi.org/10.1109/TMSCS.2017.2683487
Ahmed K, Liu J, Badawy A, Eidenbenz S (2017). A brief history of HPC simulation and future challenges. Proceedings of the 2017 Winter Simulation Conference, pp. 1-12. Online publication date: 3-Dec-2017.
https://dl.acm.org/doi/10.5555/3242181.3242210
Wachter E, Merrett G, Al-Hashimi B, Singh A (2017). Reliable mapping and partitioning of performance-constrained openCL applications on CPU-GPU MPSoCs. Proceedings of the 15th IEEE/ACM Symposium on Embedded Systems for Real-Time Multimedia, pp. 78-83. Online publication date: 15-Oct-2017.
https://dl.acm.org/doi/10.1145/3139315.3157088
Singh A, Prakash A, Basireddy K, Merrett G, Al-Hashimi B (2017). Energy-Efficient Run-Time Mapping and Thread Partitioning of Concurrent OpenCL Applications on CPU-GPU MPSoCs. ACM Transactions on Embedded Computing Systems 16:5s, pp. 1-22. Online publication date: 27-Sep-2017.
https://dl.acm.org/doi/10.1145/3126548
Ahmed K, Liu J, Badawy A, Eidenbenz S (2017). A brief history of HPC simulation and future challenges. 2017 Winter Simulation Conference (WSC), pp. 419-430. Online publication date: Dec-2017.
https://doi.org/10.1109/WSC.2017.8247804
Garzón E, Moreno J, Martínez J (2017). An approach to optimise the energy efficiency of iterative computation on integrated GPU-CPU systems. The Journal of Supercomputing 73:1, pp. 114-125. Online publication date: 1-Jan-2017.
https://dl.acm.org/doi/10.1007/s11227-016-1643-9
Barik R, Farooqui N, Lewis B, Hu C, Shpeisman T, Franke B, Wu Y, Rastello F (2016). A black-box approach to energy-aware scheduling on integrated CPU-GPU systems. Proceedings of the 2016 International Symposium on Code Generation and Optimization, pp. 70-81. Online publication date: 29-Feb-2016.
https://dl.acm.org/doi/10.1145/2854038.2854052
