Invited paper
Run-time demand estimation and modulation of on-chip decaps at system level for leakage power reduction in multicore chips
Introduction
Power has been a primary challenge in the design of high-performance processor chips [1], and leakage power constitutes a significant part of total chip power. Although leakage power can be reduced effectively by technologies such as Hi-k metal, SOI and Dual-Gate in sub-45 nm technologies, it is still a big challenge in sub-22 nm technology [2,3].
Power is delivered from the voltage regulator on the board to each transistor on chip by a power grid, as shown in Fig. 1(a). The parasitics in the power grid, together with the temporal variations in the current drawn by the switching circuit blocks, result in transient voltage noise in the power grid, which can adversely impact the performance and reliability of a chip. The on-chip decoupling capacitors (decaps) in a multicore chip (see Fig. 2) can serve as temporary power reservoirs and provide the needed power to their nearby large switching circuit blocks to suppress the supply noise, and thus increase the reliability of the power grid [4]. Unfortunately, the deliberately added decap can occupy more than 20% of the total chip area in high-end processors, and its leakage can contribute more than 20% of the total power consumption [5].
There has been very limited prior work on reducing decap leakage. In Ref. [6], the authors apply a power gating technique to turn off part of the power grid, together with its associated decaps, to reduce decap leakage. This technique can only be used when all the circuit blocks under the local power grid are inactive. The authors of Ref. [5] develop an active decap circuit to boost the performance of conventional decaps so that the decap resource allocated to a given chip can be reduced.
Our work is motivated by the power profile characteristics of circuit blocks in a processor chip. Fig. 1(b) shows the power profile of one circuit block in a processor, generated by simulating one PARSEC 2.1 benchmark [7] with the full-system function simulator GEM5 [8] and the power simulator McPAT [2]. The current demand of the circuit block clearly has large temporal variation, implying that its demand for decap capacitance also varies widely, which leaves considerable room to save decap leakage power.
In this work, we consider the system-level runtime decap optimization problem in a multicore chip, in which each core is composed of a group of large circuit blocks and a small number (at most tens) of decaps. It should be emphasized that at the circuit level, each core may have thousands of tiny decaps distributed over the whole chip area, but our work looks at the decap modulation problem at the system level, and thus each decap in our work can be thought of as a lumped decap that consists of tens to hundreds of tiny decaps at the circuit level. We first propose an approximate approach to estimate the demands on the “on” capacitance of each decap at runtime, then present two techniques to further improve the runtime efficiency of the approximate approach. Once we know the required “on” capacitance of each decap in each time interval, we can achieve runtime decap modulation using the idea of Gated decap [9] or digital capacitance modulation [10], which can turn off the unused part of each decap to save leakage power. Therefore, our approach is more flexible than the work in Ref. [6], and can also work in tandem with the active decap idea in Ref. [5]. Results on a set of benchmarks show that our approach can achieve about 30% saving in decap leakage, and the approximate approach can further reduce the computation cost by up to 22× with an accuracy loss of less than 1%. Regarding the overhead of our decap estimation and modulation methods, our analysis shows that the area, energy and estimation time overheads are negligible.
The rest of the paper is organized as follows. In Section 2 we present our problem formulation. In Section 3, we describe our approximate approach. We explain the two techniques to improve the approximate approach in Section 4. Then we analyze and estimate the overhead to implement our proposed method in Section 5. Section 6 reports the performance of the proposed approximate approach and two techniques for speedup on a set of benchmarks, followed by the conclusions in Section 7.
Section snippets
Problem formulation
Our run-time decap demand estimation and modulation problem has the following inputs:
1. A power grid (see Fig. 1(a)) with VDD sources, decaps, and circuit block loads modeled as current sources. The locations of the VDD sources, decaps and current sources are known.
2. The temporal power profile of each current source, given by the starting time of each time interval, the total number of intervals, and the average current of each load in each interval.
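As an illustration of the second input, the sketch below stores one load's profile as a piecewise-constant current waveform. The class name `LoadProfile`, its field names, and the sample values are assumptions made for this example; they are not taken from the paper.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List


@dataclass
class LoadProfile:
    """Piecewise-constant current profile of one load (hypothetical container)."""
    start_times: List[float]   # start time of each interval, in seconds
    avg_currents: List[float]  # average load current in each interval, in amperes

    def __post_init__(self) -> None:
        if len(self.start_times) != len(self.avg_currents):
            raise ValueError("one average current is needed per interval")
        pairs = zip(self.start_times, self.start_times[1:])
        if any(later <= earlier for earlier, later in pairs):
            raise ValueError("start times must be strictly increasing")

    def current_at(self, t: float) -> float:
        """Return the average current of the interval that contains time t."""
        k = bisect_right(self.start_times, t) - 1
        if k < 0:
            raise ValueError("t lies before the first interval")
        return self.avg_currents[k]


# Made-up example: three intervals of 1 microsecond each.
profile = LoadProfile(start_times=[0.0, 1e-6, 2e-6],
                      avg_currents=[0.5, 1.2, 0.3])
```

The last interval is treated as open-ended here, so any time after its start returns its average current.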
Our methodology
In this section, we present our approximate approach to estimate the current demands on each decap and show how it can be used in run-time decap modulation.
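To make the modulation step concrete, the following minimal sketch shows how a per-interval capacitance demand could be turned into an on/off decision. It is not the paper's estimation method. It assumes that each lumped decap is split into equal-sized banks that can be gated individually, that leakage is proportional to the "on" capacitance, and that all time intervals have equal length. The function names and the numbers are hypothetical.

```python
import math
from typing import List


def banks_on(required_cap: float, bank_cap: float, n_banks: int) -> int:
    """Smallest number of banks whose total capacitance covers the demand,
    clamped to the number of banks that exist."""
    needed = math.ceil(required_cap / bank_cap)
    return min(max(needed, 0), n_banks)


def leakage_saving(required_caps: List[float], bank_cap: float, n_banks: int) -> float:
    """Fraction of decap leakage saved versus keeping every bank on.

    Assumes leakage proportional to 'on' capacitance and equal-length intervals.
    """
    if not required_caps:
        raise ValueError("at least one interval is required")
    on_counts = [banks_on(c, bank_cap, n_banks) for c in required_caps]
    return 1.0 - sum(on_counts) / (n_banks * len(on_counts))


# Made-up demands (arbitrary capacitance units) for one decap with 4 banks.
demands = [1.0, 3.5, 0.2, 2.0]
saving = leakage_saving(demands, bank_cap=1.0, n_banks=4)
```

With these made-up demands, the decap keeps 1, 4, 1 and 2 banks on in the four intervals, so half of the bank-intervals are gated off.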
Improvement techniques
In this section, we present two techniques to improve the performance of the approximate approach presented in Section 3.
Overhead of decap estimation and modulation
In this section we analyze the implementation overhead of our online decap estimation and modulation methodology, which includes 1) the time and energy overhead to estimate the decap amount using the method presented in Section 4.2, and 2) the area, time and energy overhead for online decap modulation.
The time and energy overhead for decap estimation can be estimated as follows:
- Time overhead: The total computation time is the sum of the total multiply-accumulate (MAC) time, which is the product of
Experimental results
In our experiments, we generated four multicore benchmarks in which the number of cores ranges between 2 and 16. We first used GEM5 [8] to simulate the PARSEC 2.1 benchmarks [7] to get their runtime statistics and then obtained the function blocks and their area and power numbers from McPAT [2]. To build the power grid, we took the metal layer information, together with the power pad pitch information, from the IBM power grid benchmarks [11] (at 180 nm technology node), and then scaled them
Conclusion
In this paper, we present an approximate approach to estimate the amount of required “on” capacitance of each decap at runtime in multicore chips. Based on the estimation, we can dynamically modulate the decaps to save leakage power by turning off their unused capacitance. Results on a set of benchmarks show that our approach can achieve on average 45% saving in decap leakage. We further develop two techniques (incremental calculation and sparsification) to reduce the operation amount of the
Acknowledgment
This work was supported by the NSFC grant 61401276.
References (19)
[1] Power challenges may end the multicore era. Commun. ACM (Feb. 2013)
[2] McPAT: an integrated power, area, and timing modeling framework for multicore and manycore architectures
[3] Leakage power characterization and minimization in 3D stacked multi-core chips with microfluidic cooling
[4] Congestion-aware power grid optimization for 3D circuits using MIM and CMOS decoupling capacitors
[5] Design and implementation of active decoupling capacitor circuits for power supply regulation in digital ICs. IEEE Trans. VLSI Syst. (Feb. 2009)
[6] Decoupling for power gating: sources of power noise and design strategies
[7] Benchmarking Modern Multiprocessors (2011)
[8] et al. The gem5 simulator. Comput. Architect. News (2011)
[9] Gated decap: gate leakage control of on-chip decoupling capacitors in scaled technologies. IEEE Trans. VLSI Syst. (Dec. 2009)