Elsevier

Integration

Volume 65, March 2019, Pages 322-330

Invited paper
Run-time demand estimation and modulation of on-chip decaps at system level for leakage power reduction in multicore chips

https://doi.org/10.1016/j.vlsi.2018.01.009

Highlights

  • We propose an approximate approach to estimate the demand capacitance of each decap at runtime.

  • We achieve runtime decap modulation using Gated decap, which turns off the unused part of each decap to save leakage power.

  • Experiments demonstrate the decap leakage saving and the efficiency of the improvement techniques.

Abstract

Decap leakage accounts for a large portion of total chip leakage power. In this paper we propose an approximate approach to estimate the required "on" capacitance of each decap at runtime, enabling runtime decap modulation in multicore chips, and further develop two techniques (incremental calculation and sparsification) to improve the approximate approach. Results on a set of benchmarks show that our approach achieves about 45% saving in decap leakage on average, and that the approximate approach can further reduce the computation cost by up to 22× with an accuracy loss of less than 1%.

Introduction

Power has been a primary challenge in the design of high-performance processor chips [1], and leakage power constitutes a significant part of total chip power. Although leakage power can be reduced effectively by technologies such as high-k metal gates, SOI and dual-gate transistors in sub-45 nm technologies, it remains a major challenge in sub-22 nm technology [2,3].

Power is delivered from the voltage regulator on the board to each transistor on the chip through a power grid, as shown in Fig. 1(a). The parasitics in the power grid, together with the temporal variations in the current drawn by switching circuit blocks, cause transient voltage noise in the power grid, which can degrade the performance and reliability of a chip. The on-chip decoupling capacitors (decaps) in a multicore chip (see Fig. 2) serve as temporary charge reservoirs that supply the current needed by their nearby large switching circuit blocks, suppressing supply noise and thus increasing the reliability of the power grid [4]. Unfortunately, the deliberately added decap can occupy more than 20% of the total chip area in high-end processors, and its leakage can contribute more than 20% of the total power consumption [5].
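To make the "reservoir" role concrete: a decap that alone supplies a current step I for a time Δt loses charge Q = I·Δt, so keeping its voltage drop below ΔV requires roughly C ≥ I·Δt/ΔV. The sketch below is only this textbook first-order charge-balance estimate, ignoring grid resistance and inductance; it is not the estimation method of this paper, and the function name and example numbers are illustrative assumptions.

```python
def required_decap(i_step_a: float, dt_s: float, dv_max_v: float) -> float:
    """First-order charge-balance estimate of decap size (illustrative only).

    A decap that alone supplies a current step i_step_a (A) for dt_s (s)
    loses charge Q = I * dt; keeping its voltage drop below dv_max_v (V)
    requires C >= Q / dV. Grid resistance and inductance are ignored.
    """
    if dv_max_v <= 0:
        raise ValueError("allowed droop dv_max_v must be positive")
    return i_step_a * dt_s / dv_max_v


# Illustrative numbers only: a 2 A step lasting 1 ns with at most 50 mV droop.
c = required_decap(2.0, 1e-9, 0.05)
print(f"required decap: {c * 1e9:.1f} nF")  # prints "required decap: 40.0 nF"
```

Because the required capacitance scales linearly with the current step, a block whose current demand varies over time also has a time-varying capacitance demand, which is the observation the following paragraphs build on.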

There has been very limited prior work on reducing decap leakage. In Ref. [6], the authors apply power gating to turn off part of the power grid, together with its associated decaps, to reduce decap leakage. This technique can only be used when all the circuit blocks under the local power grid are inactive. The authors of Ref. [5] develop an active decap circuit that boosts the effectiveness of conventional decaps, so that the decap resource allocated to a given chip can be reduced.

Our work is motivated by a characteristic of the power profiles of circuit blocks in a processor chip. Fig. 1(b) shows the power profile of one circuit block in a processor, generated by simulating one PARSEC 2.1 benchmark [7] with the full-system functional simulator GEM5 [8] and the power simulator McPAT [2]. The figure clearly shows that the current demand of the circuit block has large temporal variation, which implies that its demand for decap capacitance also varies widely; there is therefore substantial room to save decap leakage power.

In this work, we consider the system-level runtime decap optimization problem in a multicore chip, in which each core is composed of a group of large circuit blocks and a small number (at most tens) of decaps. It should be emphasized that at the circuit level each core may have thousands of tiny decaps distributed over the whole chip area; our work, however, addresses decap modulation at the system level, so each decap in our work can be thought of as a lumped decap consisting of tens to hundreds of tiny circuit-level decaps. We first propose an approximate approach to estimate the demand for "on" capacitance of each decap at runtime, and then present two techniques to further improve its runtime efficiency. Once the required "on" capacitance of each decap in each time interval is known, runtime decap modulation can be achieved with Gated decap [9] or digital capacitance modulation [10], which turn off the unused part of each decap to save leakage power. Unlike the technique of Ref. [6], our approach does not require all circuit blocks under a local power grid to be inactive, so it is more flexible, and it can also work in tandem with the active decap idea of Ref. [5]. Results on a set of benchmarks show that our approach can achieve about 30% saving in decap leakage, and the approximate approach can further reduce the computation cost by up to 22× with an accuracy loss of less than 1%. Regarding the overhead of our decap estimation and modulation methods, our analysis shows that the area, energy and estimation-time overheads are negligible.
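The link between the estimated demand and the leakage saving can be illustrated with a toy calculation. The sketch assumes (1) each lumped decap is split into a hypothetical number of equal, individually gated segments, (2) decap leakage is proportional to the capacitance left on, and (3) all intervals have equal length. The function names and the demand values are made up for illustration and are not results from the paper.

```python
import math
from typing import Sequence


def segments_on(c_required: float, c_total: float, n_segments: int) -> int:
    """Number of equal gated segments that must stay on to cover c_required.

    Rounds up so the demand is met whenever it fits in the decap;
    the result is clamped to [0, n_segments].
    """
    seg = c_total / n_segments
    k = math.ceil(c_required / seg)
    return min(max(k, 0), n_segments)


def leakage_saving(demand: Sequence[float], c_total: float, n_segments: int) -> float:
    """Fraction of decap leakage saved versus keeping the whole decap on.

    Assumes leakage is proportional to the "on" capacitance and that every
    time interval has the same length.
    """
    on = sum(segments_on(c, c_total, n_segments) for c in demand)
    return 1.0 - on / (n_segments * len(demand))


# Made-up per-interval demands (nF) for one 100 nF decap split into 10 segments.
demand_nf = [20.0, 60.0, 90.0, 30.0]
print(f"saving: {leakage_saving(demand_nf, 100.0, 10):.0%}")  # prints "saving: 50%"
```

Finer segmentation tracks the demand more closely (less rounding up) at the cost of more gating switches; the choice of granularity here is purely illustrative.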

The rest of the paper is organized as follows. Section 2 presents our problem formulation. Section 3 describes our approximate approach, and Section 4 explains the two techniques that improve it. Section 5 analyzes and estimates the overhead of implementing the proposed method. Section 6 reports the performance of the proposed approximate approach and the two speedup techniques on a set of benchmarks, and Section 7 concludes the paper.

Section snippets

Problem formulation

The run-time decap demand estimation and modulation problem has the following inputs:

  1. A power grid (see Fig. 1(a)) with $M$ VDD sources, $K$ decaps, and $L$ circuit-block loads modeled as current sources. The locations of the VDD sources, decaps and current sources are known.

  2. The temporal power profile $(t_j, I_j^l)$ of each current source $l$ ($l = 1, \ldots, L$), where $t_j$ ($j = 1, \ldots, J$) is the starting time of the $j$-th time interval, $J$ is the total number of intervals, and $I_j^l$ is the average current of load $l$ in interval
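The two inputs above can be captured in a small data structure. This is a hypothetical sketch: the class and field names, the (row, column) node indexing, and the choice to attach each row of load currents to the interval starting at $t_j$ are assumptions made for illustration, not the paper's representation. Lists are 0-based here, whereas the formulation uses 1-based indices.

```python
from dataclasses import dataclass
from typing import List, Tuple

Node = Tuple[int, int]  # hypothetical (row, column) index of a power-grid node


@dataclass
class DecapProblem:
    """Inputs of the run-time decap demand estimation problem (hypothetical layout)."""
    vdd_nodes: List[Node]        # locations of the M VDD sources
    decap_nodes: List[Node]      # locations of the K decaps
    load_nodes: List[Node]       # locations of the L current-source loads
    t_start: List[float]         # start times of the J intervals (s), 0-based
    current: List[List[float]]   # current[j][l]: average current of load l (A) in interval j

    def __post_init__(self) -> None:
        n_int, n_load = len(self.t_start), len(self.load_nodes)
        if len(self.current) != n_int or any(len(row) != n_load for row in self.current):
            raise ValueError("current must be a J x L table")
        if any(b <= a for a, b in zip(self.t_start, self.t_start[1:])):
            raise ValueError("interval start times must be strictly increasing")

    @property
    def sizes(self) -> Tuple[int, int, int, int]:
        """Return (M, K, L, J)."""
        return (len(self.vdd_nodes), len(self.decap_nodes),
                len(self.load_nodes), len(self.t_start))


p = DecapProblem(
    vdd_nodes=[(0, 0), (0, 9)],
    decap_nodes=[(5, 5)],
    load_nodes=[(3, 4), (7, 2), (8, 8)],
    t_start=[0.0, 1e-6],
    current=[[0.5, 0.2, 0.1], [0.9, 0.2, 0.0]],
)
print(p.sizes)  # prints (2, 1, 3, 2)
```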

Our methodology

In this section, we present our approximate approach to estimate the current demands on each decap and show how it can be used in run-time decap modulation.
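The snippet does not spell out the approximate approach, so the following is only a hedged illustration of one common way such an estimate can be organized. It assumes the grid is linear, so that in each interval the current supplied by decap $k$ is approximated as a weighted sum of the load currents, $i_k = \sum_l A_{kl} I^l$, with a hypothetical $K \times L$ sensitivity matrix $A$ precomputed offline. The current is then converted to an "on" capacitance with the first-order rule $C = I\,\Delta t/\Delta V$. The matrix values, function names and conversion rule are all assumptions, not the authors' formulation.

```python
from typing import List, Sequence


def decap_currents(A: Sequence[Sequence[float]], load_i: Sequence[float]) -> List[float]:
    """Estimate the current each decap supplies in one interval.

    Assumes a linear grid model: i_k = sum_l A[k][l] * I_l, which costs
    K * L multiply-accumulates for a dense K x L matrix A.
    """
    return [sum(a * i for a, i in zip(row, load_i)) for row in A]


def decap_capacitance(i_dec: Sequence[float], dt_s: float, dv_max_v: float) -> List[float]:
    """Convert decap currents to required "on" capacitance via C = I * dt / dV.

    Negative currents (decap being recharged) are treated as zero demand.
    """
    return [max(i, 0.0) * dt_s / dv_max_v for i in i_dec]


# Hypothetical 2-decap x 2-load sensitivity matrix and one interval's load currents (A).
A = [[0.5, 0.1],
     [0.2, 0.4]]
i_dec = decap_currents(A, [1.0, 2.0])
print([round(i, 3) for i in i_dec])  # prints [0.7, 1.0]
print([round(c * 1e9, 3) for c in decap_capacitance(i_dec, 1e-9, 0.05)])  # prints [14.0, 20.0]
```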

Improvement techniques

In this section, we present two techniques to improve the performance of the approximate approach presented in Section 3.
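The two techniques are only named here (incremental calculation and sparsification), so the sketch below is one plausible reading under the same hypothetical linear model $i_k = \sum_l A_{kl} I^l$, not the paper's exact method. Sparsification is assumed to drop sensitivities with magnitude below a threshold (trading a small accuracy loss for fewer operations), and incremental calculation is assumed to update the previous interval's decap currents using only loads whose current changed. All names, thresholds and numbers are illustrative.

```python
from typing import Dict, List, Sequence, Tuple

SparseCols = List[Dict[int, float]]  # SparseCols[l] maps decap k -> A[k][l]


def sparsify(A: Sequence[Sequence[float]], tol: float) -> SparseCols:
    """Drop sensitivities with |A[k][l]| < tol and store the rest column by column."""
    n_dec = len(A)
    n_load = len(A[0]) if A else 0
    return [{k: A[k][l] for k in range(n_dec) if abs(A[k][l]) >= tol}
            for l in range(n_load)]


def full_estimate(cols: SparseCols, loads: Sequence[float], n_dec: int) -> List[float]:
    """i_k = sum_l A[k][l] * I_l using only the kept entries."""
    i_dec = [0.0] * n_dec
    for l, col in enumerate(cols):
        for k, a in col.items():
            i_dec[k] += a * loads[l]
    return i_dec


def incremental_update(cols: SparseCols, i_prev: Sequence[float],
                       loads_prev: Sequence[float], loads_new: Sequence[float],
                       change_tol: float = 0.0) -> Tuple[List[float], int]:
    """Update decap currents from the previous interval using only changed loads.

    Returns the new currents and the number of multiply-accumulates performed.
    """
    i_new = list(i_prev)
    macs = 0
    for l, (old, new) in enumerate(zip(loads_prev, loads_new)):
        delta = new - old
        if abs(delta) <= change_tol:
            continue  # unchanged load: its contribution is already in i_prev
        for k, a in cols[l].items():
            i_new[k] += a * delta
            macs += 1
    return i_new, macs


A = [[0.5, 0.1, 0.01],
     [0.2, 0.4, 0.02]]        # hypothetical sensitivities
cols = sparsify(A, tol=0.05)  # both small entries of the third load are dropped
i0 = full_estimate(cols, [1.0, 2.0, 3.0], n_dec=2)
i1, macs = incremental_update(cols, i0, [1.0, 2.0, 3.0], [1.0, 2.5, 3.0])
print([round(x, 3) for x in i1], macs)  # prints [0.75, 1.2] 2
```

In this toy case a dense recomputation would need 6 multiply-accumulates, while the sparsified incremental update needs 2; the actual savings depend on how many loads change between intervals and how many sensitivities survive the threshold.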

Overhead of decap estimation and modulation

In this section we analyze the implementation overhead of our online decap estimation and modulation methodology, which includes 1) the time and energy overhead of estimating the required decap capacitance with the method presented in Section 4.2, and 2) the area, time and energy overhead of online decap modulation.

The time and energy overhead of decap estimation can be evaluated as follows:

  • Time overhead: The total computation time is the sum of the total multiply-accumulate (MAC) time, which is the product of
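Since the rest of this analysis is truncated, the sketch below only illustrates the MAC-count style of reasoning under stated assumptions: one estimation step is dominated by a hypothetical number of multiply-accumulates executed sequentially, each with an assumed per-MAC latency and energy, and the step is repeated once per time interval. The per-MAC figures, matrix size and interval length are placeholders, not values from the paper.

```python
from typing import Tuple


def estimation_overhead(n_macs: int, t_mac_s: float, e_mac_j: float,
                        interval_s: float) -> Tuple[float, float, float]:
    """Time, energy and time fraction of one estimation step.

    Assumes the step is dominated by n_macs multiply-accumulates executed
    one after another, each taking t_mac_s seconds and e_mac_j joules.
    """
    time_s = n_macs * t_mac_s
    energy_j = n_macs * e_mac_j
    return time_s, energy_j, time_s / interval_s


# Hypothetical: a dense 32-decap x 64-load update, 1 ns and 1 pJ per MAC,
# re-estimated every 100 us.
t, e, frac = estimation_overhead(32 * 64, 1e-9, 1e-12, 100e-6)
print(f"{t * 1e6:.3f} us, {e * 1e9:.3f} nJ, {frac:.2%} of the interval")
# prints "2.048 us, 2.048 nJ, 2.05% of the interval"
```

Reducing the MAC count, for example through the incremental and sparsified updates of Section 4, lowers both the time and energy terms proportionally under this model.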

Experimental results

In our experiments, we generated four multicore benchmarks with 2 to 16 cores. We first used GEM5 [8] to simulate the PARSEC 2.1 benchmarks [7] to obtain their runtime statistics, and then obtained the functional blocks and their area and power numbers from McPAT [2]. To build the power grid, we took the metal layer information, together with the power pad pitch information, from the IBM power grid benchmarks [11] (at the 180 nm technology node), and then scaled them

Conclusion

In this paper, we present an approximate approach to estimate the required "on" capacitance of each decap at runtime in multicore chips. Based on this estimate, we can dynamically modulate the decaps to save leakage power by turning off their unused capacitance. Results on a set of benchmarks show that our approach achieves on average 45% saving in decap leakage. We further develop two techniques (incremental calculation and sparsification) to reduce the number of operations of the

Acknowledgment

This work was supported by the NSFC grant 61401276.

References (19)

  • [1] H. Esmaeilzadeh, Power challenges may end the multicore era, Commun. ACM (Feb. 2013).
  • [2] S. Li, McPAT: an integrated power, area, and timing modeling framework for multicore and manycore architectures.
  • [3] H. Xiao, Leakage power characterization and minimization in 3D stacked multi-core chips with microfluidic cooling.
  • [4] P. Zhou, Congestion-aware power grid optimization for 3D circuits using MIM and CMOS decoupling capacitors.
  • [5] J. Gu, Design and implementation of active decoupling capacitor circuits for power supply regulation in digital ICs, IEEE Trans. VLSI Syst. (Feb. 2009).
  • [6] T. Xu, Decoupling for power gating: sources of power noise and design strategies.
  • [7] C. Bienia, Benchmarking Modern Multiprocessors (2011).
  • [8] N. Binkert et al., The gem5 simulator, Comput. Architect. News (2011).
  • [9] Y. Chen, Gated decap: gate leakage control of on-chip decoupling capacitors in scaled technologies, IEEE Trans. VLSI Syst. (Dec. 2009).
There are more references available in the full text version of this article.
