DOI: 10.1145/2370816.2370873
research-article

Workload and power budget partitioning for single-chip heterogeneous processors

Published: 19 September 2012

Abstract

With technology scaling, manufacturers are integrating both CPU and GPU cores in a single chip to improve the throughput of emerging applications. To maximize the throughput of a single-chip heterogeneous processor (SCHP), the chip power budget shared between the CPU and GPU must be effectively utilized. At the same time, the CPU and GPU in an SCHP must each satisfy its own power constraint. Furthermore, the power budget allocated to the CPU and GPU impacts performance. In this paper, using a detailed cycle-level SCHP simulator, we first demonstrate that the joint optimization of workload and power budget partitioning between the CPU and GPU can provide 13% higher throughput than the optimization of workload partitioning alone under a fixed power budget allocation to the CPU and GPU. Second, we propose an effective runtime algorithm that can determine near-optimal or optimal combinations of workload and power budget partitioning. The algorithm exploits the runtime power efficiencies of the workload executed on the CPU and the GPU. Using the detailed cycle-level SCHP simulator, we show that within five to eight kernel invocations the algorithm can achieve 96% of the maximum throughput obtained by an exhaustive search algorithm. Finally, we demonstrate comparable throughput improvements when we apply the algorithm to a commercial computing system with an SCHP.
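The joint optimization described in the abstract can be illustrated with a toy exhaustive search. This is only a sketch under stated assumptions, not the paper's runtime algorithm or its simulator: the chip parameters in `CFG`, the cube-root relation between power and work rate in `device_rate`, and all function names are hypothetical and chosen for illustration. It compares the best workload split under a fixed, even power allocation with the best split when the power allocation is also searched, while respecting the shared chip budget and the per-device power limits.

```python
from itertools import product

# Hypothetical chip parameters (not from the paper): peak work rates are in
# arbitrary units, power values in watts.
CFG = {
    "cpu_rate": 1.0, "cpu_max_power": 30,
    "gpu_rate": 3.0, "gpu_max_power": 40,
    "chip_budget": 50,
}

def device_rate(peak_rate, power, max_power):
    """Assumed model: work rate grows with the cube root of the power budget."""
    return peak_rate * (power / max_power) ** (1.0 / 3.0)

def throughput(w_cpu, p_cpu, p_gpu, cfg=CFG):
    """Work per unit time when a fraction w_cpu of the work runs on the CPU."""
    r_cpu = device_rate(cfg["cpu_rate"], p_cpu, cfg["cpu_max_power"])
    r_gpu = device_rate(cfg["gpu_rate"], p_gpu, cfg["gpu_max_power"])
    # Both devices run concurrently, so the slower one sets the finish time.
    t_cpu = w_cpu / r_cpu
    t_gpu = (1.0 - w_cpu) / r_gpu
    return 1.0 / max(t_cpu, t_gpu)

def feasible_splits(cfg=CFG):
    """Power pairs that respect the chip budget and both per-device limits.

    The rate model is monotonic in power, so the GPU always receives
    whatever budget is left, capped at its own limit.
    """
    splits = []
    for p_cpu in range(1, cfg["cpu_max_power"] + 1):
        p_gpu = min(cfg["chip_budget"] - p_cpu, cfg["gpu_max_power"])
        if p_gpu >= 1:
            splits.append((p_cpu, p_gpu))
    return splits

def best_partition(power_splits, cfg=CFG):
    """Exhaustive search over power pairs and workload fractions (1% steps)."""
    best = None
    for (p_cpu, p_gpu), k in product(power_splits, range(101)):
        w_cpu = k / 100
        t = throughput(w_cpu, p_cpu, p_gpu, cfg)
        if best is None or t > best[0]:
            best = (t, w_cpu, p_cpu, p_gpu)
    return best

# Workload partitioning alone, under a fixed even power allocation.
fixed = best_partition([(25, 25)])
# Joint workload and power budget partitioning.
joint = best_partition(feasible_splits())

print("fixed power:", fixed)
print("joint      :", joint)
print("relative gain: %.1f%%" % (100.0 * (joint[0] / fixed[0] - 1.0)))
```

Because the fixed allocation is one of the feasible power pairs, the joint search can never do worse than the workload-only search in this model. The size of the gain depends entirely on the assumed parameters and says nothing about the 13% figure reported in the abstract.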

    Published In

    PACT '12: Proceedings of the 21st international conference on Parallel architectures and compilation techniques
    September 2012
    512 pages
    ISBN: 9781450311823
    DOI: 10.1145/2370816
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. power constraint
    2. single-chip heterogeneous processor
    3. voltage/frequency/core scaling

    Qualifiers

    • Research-article

    Conference

    PACT '12
    Sponsor:
    • IFIP WG 10.3
    • SIGARCH
    • IEEE CS TCPP
    • IEEE CS TCAA

    Acceptance Rates

    Overall Acceptance Rate 121 of 471 submissions, 26%

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months): 28
    • Downloads (Last 6 weeks): 3
    Reflects downloads up to 05 Mar 2025

    Citations

    Cited By

    • (2022) Adaptive Power Shifting for Power-Constrained Heterogeneous Systems. IEEE Transactions on Computers, pp. 1-1. DOI: 10.1109/TC.2022.3174545. Online publication date: 2022
    • (2022) Machine learning guided thermal management of Open Computing Language applications on CPU‐GPU based embedded platforms. IET Computers & Digital Techniques 17:1, pp. 20-28. DOI: 10.1049/cdt2.12050. Online publication date: 28-Dec-2022
    • (2019) Collaborative Adaptation for Energy-Efficient Heterogeneous Mobile SoCs. IEEE Transactions on Computers, pp. 1-1. DOI: 10.1109/TC.2019.2943855. Online publication date: 2019
    • (2018) Dynamic Power Budgeting for Mobile Systems Running Graphics Workloads. IEEE Transactions on Multi-Scale Computing Systems 4:1, pp. 30-40. DOI: 10.1109/TMSCS.2017.2683487. Online publication date: 1-Jan-2018
    • (2017) A brief history of HPC simulation and future challenges. Proceedings of the 2017 Winter Simulation Conference, pp. 1-12. DOI: 10.5555/3242181.3242210. Online publication date: 3-Dec-2017
    • (2017) Reliable mapping and partitioning of performance-constrained openCL applications on CPU-GPU MPSoCs. Proceedings of the 15th IEEE/ACM Symposium on Embedded Systems for Real-Time Multimedia, pp. 78-83. DOI: 10.1145/3139315.3157088. Online publication date: 15-Oct-2017
    • (2017) Energy-Efficient Run-Time Mapping and Thread Partitioning of Concurrent OpenCL Applications on CPU-GPU MPSoCs. ACM Transactions on Embedded Computing Systems 16:5s, pp. 1-22. DOI: 10.1145/3126548. Online publication date: 27-Sep-2017
    • (2017) A brief history of HPC simulation and future challenges. 2017 Winter Simulation Conference (WSC), pp. 419-430. DOI: 10.1109/WSC.2017.8247804. Online publication date: Dec-2017
    • (2017) An approach to optimise the energy efficiency of iterative computation on integrated GPU-CPU systems. The Journal of Supercomputing 73:1, pp. 114-125. DOI: 10.1007/s11227-016-1643-9. Online publication date: 1-Jan-2017
    • (2016) A black-box approach to energy-aware scheduling on integrated CPU-GPU systems. Proceedings of the 2016 International Symposium on Code Generation and Optimization, pp. 70-81. DOI: 10.1145/2854038.2854052. Online publication date: 29-Feb-2016
