

# Power modeling for digital circuits with clock gating

## Joonhwan Yi and Jonggyu Kim<sup>a)</sup>

*Dept. of Computer Engineering, Kwangwoon University, Seoul, South Korea*

a) *jonggyu.q@kw.ac.kr*

**Abstract:** A power model for digital circuits with clock gating is proposed. The power states are defined by the values of clock gating enable signals. The power consumption of each power state is characterized from low-level power analysis results. Experimental results show that the proposed power model achieves about 400 times faster analysis with less than 1% error on average compared to gate-level power models.

**Keywords:** power model, clock gating, high-level, power analysis

**Classification:** Integrated circuits

## References

- [1] B. Fischer, C. Cech and H. Muhr: Proc. of Design, Automation and Test in Europe Conf. (2014) 1. DOI:10.7873/DATE.2014.210
- [2] L. Ikhwan, H. Kim, P. Yang, S. Yoo, E.-Y. Chung, K.-M. Choi, J.-T. Kong and S.-K. Eo: Proc. of Asia South Pacific Design Automation Conf. (2006) 551. DOI:10.1109/ASPDAC.2006.1594743
- [3] Y. Park, S. Pasricha, F. J. Kurdahi and N. Dutt: IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 19 (2010) 668. DOI:10.1109/TVLSI.2009.2039153
- [4] L. Zhong, S. Ravi, A. Raghunathan and N. K. Jha: IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 25 (2006) 2103. DOI:10.1109/TCAD.2005.859504
- [5] N. Bansal, K. Lahiri and A. Raghunathan: Proc. of the Int. Conf. on VLSI Design (2007) 513. DOI:10.1109/VLSID.2007.46
- [6] R. Fraer, G. Kamhi and M. K. Mhameed: Proc. of Design Automation Conf. (2008) 658. DOI:10.1145/1391469.1391638
- [7] G. Kim, Y. H. Je and S. Kim: IEEE Trans. Consum. Electron. 55 (2009) 1847. DOI:10.1109/TCE.2009.5373741
- [8] J. Oh and M. Pedram: IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 20 (2001) 715. DOI:10.1109/43.924825
- [9] Carbon Design Systems: http://www.carbondesignsystems.com/.
- [10] Opencores.org: IP circuits, http://opencores.org/.
- [11] Synopsys Inc.: http://www.synopsys.com.
- [12] Arizona State University: YUV Video Sequences, http://trace.eas.asu.edu/yuv/.

## 1 Introduction

System-level power analysis is essential for designing a low-power system-on-a-chip (SoC) [1]. The rapidly increasing design complexity of SoCs hinders many system-level design activities at low levels of design abstraction such as register transfer level (RTL) or gate level. Because low-level power models carry details that are unnecessary at the system level, low-level power analyses are extremely slow. Thus, power analysis at RTL or gate level is usually performed for a block in an SoC rather than for the whole SoC. In this case, the total power consumption of an SoC is estimated by simply summing the power values of all blocks in the SoC. This overestimates the power consumption because not all blocks in an SoC operate all the time. So, the current low-level power analysis of an SoC is neither accurate nor fast.

The power consumption of a circuit depends not only on its functional complexity but also on various low-level implementation details. For example, low-power techniques such as clock gating, data gating, and power gating significantly affect the power consumption of a circuit. Note that these low-power techniques are not visible at levels higher than RTL or gate level. So, the power-related implementation details at RTL or gate level need to be back-annotated to the high-level power models for accurate power analysis.

Various high-level power modeling approaches that back-annotate RTL or lower-level design information have been proposed [2, 3, 4]. All of these approaches need to know the behavior of the target circuit in detail for power modeling. A *power state* is a set of circuit states whose power consumption differs significantly from that of other states, and it is very hard to extract the power states automatically with these approaches. An automatic high-level power modeling approach for IP circuits is proposed in [5], but the user's discretion is still needed to reduce the number of power states for practical usage.

We propose a high-level power model based on clock gating enable signals. Since the clock gating enable signals can be structurally extracted from a gate-level description, automatic power modeling is possible without knowing design details. Note that current SoC design practice reuses many existing legacy IP circuits whose RTL or gate-level designs are available. If an SoC contains an IP circuit whose RTL or gate-level description is not available, other power modeling approaches based on behavioral synthesis, such as [4], can be used.

The remainder of the paper is organized as follows. The proposed power model is discussed in Section 2. The power estimation based on a linear equation derived by multiple linear regression is discussed in Section 3. The experimental results are presented in Section 4 and a conclusion is drawn in Section 5.

## 2 Power modeling

*Clock gating* [6, 7] is a well-known low-power design technique that reduces the dynamic power consumption of a circuit by holding clock signals stable whenever possible. Clock gating reduces power consumption not only in flip-flops but also in clock tree networks [6, 8]. It is reported that clock gating reduces the dynamic power of a circuit by up to 70% [6, 7]. This implies that the dynamic power consumption of a circuit varies considerably depending on the number of gated clocks. Thus, enabling or disabling clock gating enable signals notably influences the dynamic power consumption of a circuit. From this observation, we propose a higher-level power model based on the clock gating enable signals.







| Power state  | $en_1$ | $en_2$ | $en_3$ | Power value |
|--------------|--------|--------|--------|-------------|
| $ps_0 = 000$ | 0      | 0      | 0      | $p_0$       |
| $ps_1 = 001$ | 0      | 0      | 1      | $p_1$       |
| $ps_2 = 010$ | 0      | 1      | 0      | $p_2$       |
| $ps_3 = 011$ | 0      | 1      | 1      | $p_3$       |
| $ps_4 = 100$ | 1      | 0      | 0      | $p_4$       |
| $ps_5 = 101$ | 1      | 0      | 1      | $p_5$       |
| $ps_6 = 110$ | 1      | 1      | 0      | $p_6$       |
| $ps_7 = 111$ | 1      | 1      | 1      | $p_7$       |

(a) Circuit C1 with three clock gating enable signals.

(b) Power state table.

Fig. 1. Example circuit and its power states

Consider a circuit *C* that has *n* clock gating enable signals $EN_C = \{en_1, \ldots, en_i, \ldots, en_n\}$, where $en_i$ for $1 \le i \le n$ controls $n_i$ clock gating cells $CG_i = \{cg_{i1}, cg_{i2}, \ldots, cg_{in_i}\}$. Then, there are *n* clock gating domains $CGD_C = \{cgd_1, \ldots, cgd_i, \ldots, cgd_n\}$, where $cgd_i$ is the part of *C* whose dynamic power consumption is significantly affected by the value of enable signal $en_i$. Consequently, a power state $ps_k$ of *C* can be defined by a combination of the values $V_k = \{v_1, \ldots, v_n\}$ of the enable signals $EN_C$, where $v_i$ is the value of enable signal $en_i$.

**Definition 2.1.** (A power state) A power state of a circuit C is a combination of the values of the clock gating enable signals in C.

For example, consider a circuit C1 in Fig. 1a that has three clock gating enable signals  $EN_{C1} = \{en_1, en_2, en_3\}$ . So, there are eight power states  $PS_{C1} = \{ps_0, ps_1, \dots, ps_7\}$  as shown in Fig. 1b.
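As a minimal sketch of Definition 2.1, the snippet below encodes a vector of enable values as the index *k* of power state $ps_k$ and enumerates the eight states of C1. The function names are illustrative, and the bit order (with $en_1$ as the most significant bit) is an assumption chosen to match the table in Fig. 1b.

```python
from itertools import product


def power_state_index(enable_values):
    """Encode enable values (en_1 first, as the most significant bit) as the index k of ps_k."""
    k = 0
    for v in enable_values:
        if v not in (0, 1):
            raise ValueError("enable values must be 0 or 1")
        k = (k << 1) | v
    return k


def enumerate_power_states(n):
    """Return all 2**n power states as (k, bits) pairs in ascending order of k."""
    return [(power_state_index(bits), bits) for bits in product((0, 1), repeat=n)]


# Circuit C1 has three enable signals, hence eight power states.
states_c1 = enumerate_power_states(3)
for k, bits in states_c1:
    print(f"ps_{k} = {''.join(map(str, bits))}")
```

The exhaustive enumeration is only practical for small *n*; a real model would store only the states that are actually observed.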

A power state  $ps_i$  in a circuit *C* is *sensitized* by an input sequence *S* if *S* makes *C* experience  $ps_i$  while *S* is applied. Power state  $ps_i$  is *redundant* if no input sequence sensitizes  $ps_i$ . Identifying a redundant power state is an important problem and further research is needed.

For a non-redundant power state $ps_i$, an average power (consumption) value $p_i$ is computed, or *characterized*, by using a representative set of input sequences at RTL or lower levels of design abstraction. A set *S* of input sequences is called a *characterization sequence* if *S* is used to characterize the power values of a power model. It is challenging to define a characterization sequence that sensitizes all non-redundant power states. In practice, most IP circuit designers have a comprehensive set of input sequences for functional verification purposes. We use the verification sequences as characterization sequences for our experiments. The uncharacterized non-redundant power states are handled by the multiple linear regression discussed in Section 3.

Fig. 2 shows a snippet of cycle-by-cycle power estimation results of C1 at RTL or lower level. For each clock cycle, the values of enable signals in C1 can be translated to a power state, and the power value is obtained by a low level power analysis tool. In Fig. 2, for power state  $ps_5$ , low-level power values  $v_1$  and  $v_3$  may be different. The average of such low-level power values corresponding to  $ps_5$  becomes power value  $p_5$ .
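The averaging step described above can be sketched as follows. The trace format (one pair of enable values and a power value per clock cycle) and the numbers in it are hypothetical; in practice the per-cycle power values would come from a low-level power analysis tool.

```python
from collections import defaultdict


def characterize(trace):
    """Average the per-cycle power values observed for each power state.

    trace: iterable of (enable_values, power) pairs, one per clock cycle.
    Returns a dict mapping each observed power state (a tuple of bits)
    to its average power value.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for enables, power in trace:
        state = tuple(enables)
        sums[state] += power
        counts[state] += 1
    return {state: sums[state] / counts[state] for state in sums}


# Hypothetical cycle-by-cycle results for C1 (values are made up).
trace = [
    ((1, 0, 1), 2.0),  # ps_5
    ((0, 0, 1), 1.0),  # ps_1
    ((1, 0, 1), 3.0),  # ps_5 again, with a different low-level power value
    ((1, 1, 1), 4.0),  # ps_7
]
power_values = characterize(trace)
print(power_values[(1, 0, 1)])  # average of the two ps_5 samples
```

States that never appear in the trace simply have no entry, which is the situation Section 3 addresses.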

Fig. 2. Characterization of power values.

The proposed power model $PM_C$ of a circuit *C* takes a scenario of *C* as input and outputs cycle-by-cycle power values that are eventually used to compute the average power of the given scenario. For this purpose, $PM_C$ needs to have a set of circuits $Cen_C = \{Cen_1, \ldots, Cen_i, \ldots, Cen_n\}$ for $EN_C$, where $Cen_i$ is the circuit that computes the value of $en_i$ from the inputs of C. Once the values of $EN_C$ for every clock cycle are computed, the corresponding power state and power value can be determined. Consequently, the proposed power model of circuit C is composed of power states $PS_C$, power values $P_C$, and $Cen_C$.
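The composition of the power model can be illustrated with a small sketch. It assumes, for simplicity, that each enable circuit can be modeled as a pure function of the current input vector; real enable logic may also depend on internal state. The input names `valid` and `busy`, the two enable functions, and the power values are all hypothetical.

```python
def make_power_model(enable_functions, power_values):
    """Build an estimator from enable functions (stand-ins for Cen_C) and a power table.

    enable_functions: list of callables, one per enable signal, each mapping
        an input vector to 0 or 1.
    power_values: dict mapping a power state (tuple of bits) to its power value.
    """
    def estimate(scenario):
        if not scenario:
            raise ValueError("scenario must contain at least one cycle")
        per_cycle = []
        for inputs in scenario:
            state = tuple(f(inputs) for f in enable_functions)
            # A KeyError here means the state was not characterized.
            per_cycle.append(power_values[state])
        return per_cycle, sum(per_cycle) / len(per_cycle)

    return estimate


# Hypothetical two-enable circuit.
enable_functions = [
    lambda x: x["valid"],              # stand-in for Cen_1
    lambda x: x["valid"] & x["busy"],  # stand-in for Cen_2
]
power_values = {(0, 0): 1.0, (1, 0): 2.0, (1, 1): 4.0}
estimate = make_power_model(enable_functions, power_values)

scenario = [
    {"valid": 0, "busy": 0},
    {"valid": 1, "busy": 0},
    {"valid": 1, "busy": 1},
    {"valid": 0, "busy": 1},
]
per_cycle, average = estimate(scenario)
print(per_cycle, average)
```

Only simulation of the enable logic and a table lookup are needed per cycle, which is where the speed advantage over low-level analysis comes from.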

The size of $Cen_C$ can be comparable to that of the original circuit *C*. Nevertheless, the proposed power model can estimate power consumption faster than RTL or gate-level power models for two reasons. First, the proposed power model can have $Cen_C$ at a higher level than RTL for power estimation. The conversion from RTL to C-level (SystemC/C/C++) can be done manually or by using a commercial tool such as Carbon Model Studio [9]. Second, the proposed power model computes the power consumption only by performing simulation. The gate-level or RTL power analysis time is composed of two significant components: the simulation time $t_s$ and the analysis time $t_a$. Note that $t_a$ is usually much longer than $t_s$. By using the proposed power model, we can eliminate the analysis time $t_a$ that a low-level power model requires; see Section 4.

## 3 Power estimation for uncharacterized power states

The characterized power states and their corresponding power values are stored in the form of a binary tree called a *power tree* in the proposed power model. A characterization sequence may not sensitize all non-redundant power states. If a user scenario sensitizes such uncharacterized power states, the power model with the incomplete power tree cannot estimate the power consumption of the scenario properly. To sensitize those power states, additional input sequences may be generated and added.
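The paper does not spell out the layout of the power tree, so the following is only one plausible sketch: a binary tree in which level *i* branches on the value of the *i*-th enable signal and each leaf stores the characterized power value. The class and method names are illustrative.

```python
class PowerTree:
    """Binary tree keyed by enable bits; leaves hold characterized power values."""

    def __init__(self):
        self.root = {}

    def insert(self, enables, power):
        """Store the power value for the power state given by the enable bits."""
        node = self.root
        for bit in enables:
            node = node.setdefault(bit, {})
        node["power"] = power

    def lookup(self, enables):
        """Return the stored power value, or None if the state is uncharacterized."""
        node = self.root
        for bit in enables:
            node = node.get(bit)
            if node is None:
                return None
        return node.get("power")


tree = PowerTree()
tree.insert((1, 0, 1), 2.5)
tree.insert((0, 0, 1), 1.0)
print(tree.lookup((1, 0, 1)))  # characterized state
print(tree.lookup((1, 1, 1)))  # uncharacterized state
```

A `None` result marks exactly the case in which the model has to fall back on another estimate.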

As an alternative solution, we perform multiple linear regression on the characterized power states $PS = \{ps_1, ps_2, \ldots, ps_y\}$ and their corresponding power values $P = \{p_1, p_2, \ldots, p_y\}$. Multiple linear regression computes the coefficients $\{c_0, c_1, \ldots, c_i, \ldots, c_n\}$ for the clock gating enable signals $\{en_1, \ldots, en_i, \ldots, en_n\}$ that define the power states in *PS*, where $c_i$ is the coefficient for $en_i$ and $c_0$ represents the power consumption when all clock gating enable signals are disabled. The resultant equation of the multiple linear regression predicts the power consumption of the uncharacterized power states.

$$p_C = c_0 + c_1 en_1 + \dots + c_n en_n \tag{1}$$

A power model uses equation (1) to compute the power value of an uncharacterized power state in addition to the power tree introduced earlier in this section.
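A minimal sketch of this regression step is given below. It fits the coefficients of equation (1) by ordinary least squares, solving the normal equations with Gauss-Jordan elimination so that no external library is needed; the paper does not say which solver was used. The characterized states and power values are made up for illustration.

```python
def fit_linear_model(states, powers):
    """Least-squares fit of p = c0 + sum(c_i * en_i); returns [c0, c1, ...]."""
    rows = [[1.0] + [float(v) for v in s] for s in states]
    m = len(rows[0])
    # Augmented normal equations: (X^T X | X^T p).
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(m)]
        + [sum(r[i] * p for r, p in zip(rows, powers))]
        for i in range(m)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("characterized states do not determine all coefficients")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(m):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return [a[i][m] / a[i][i] for i in range(m)]


def predict(coeffs, enables):
    """Evaluate equation (1) for a power state given as a tuple of enable bits."""
    return coeffs[0] + sum(c * v for c, v in zip(coeffs[1:], enables))


# Hypothetical characterized states of a three-enable circuit.
states = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0)]
powers = [1.0, 3.0, 1.5, 2.5, 3.5]
coeffs = fit_linear_model(states, powers)
print(predict(coeffs, (1, 1, 1)))  # estimate for an uncharacterized state
```

The fit fails with a `ValueError` when some enable signal never changes across the characterized states, since its coefficient is then not identifiable from the data.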

## 4 Experimental results

We measured the power analysis accuracy and speed gain of the proposed power model using various IP circuits from Opencores.org [10]: a universal asynchronous receiver/transmitter (UART), a double-precision floating point unit (FPU), and the H.264/AVC baseline decoder *nova*. The circuits are synthesized with a TSMC 130 nm process library using Synopsys Design Compiler [11]. Clock gating cells are automatically inserted by Synopsys Power Compiler.

| Circuits | Area<br>(gate<br>count) | Gated<br>flipflop<br>ratio | Number of<br>clock gating<br>enable signals | Number of<br>characterized<br>power states |
|----------|-------------------------|----------------------------|---------------------------------------------|--------------------------------------------|
| UART     | 2 K                     | 82%                        | 16                                          | 22                                         |
| FPU      | 63 K                    | 90%                        | 13                                          | 26                                         |
| novaN    | 306 K                   | 93%                        | 37                                          | 386                                        |
| nova8    | 305 K                   | 99%                        | 148                                         | 2,086                                      |

Table I. Information of the circuits under experiments.

Table I shows the gate-level information of the circuits. In the case of *nova*, two versions of the gate-level netlist are synthesized: *novaN* and *nova8*. For example, the area of *nova8* is 305 K in gate count and 99% of the flip-flops in *nova8* are gated. Although there are 148 clock gating enable signals, only 2,086 power states are characterized, which is a vanishingly small fraction of the $2^{148}$ possible power states. Nevertheless, the accuracy is quite high, as can be seen in Table II.

One characterization sequence is used to characterize the power model for each circuit. Then, three test sequences {TS1, TS2, TS3} are used to measure the accuracies and the speed gains. *UART* uses a simple characterization sequence composed of a single write and a single read operation. Its test sequences are composed of twenty writes, twenty reads, and randomly mixed writes and reads with a random number of operations. *FPU* uses a total of fifty operations composed of addition, subtraction, multiplication, and division. The order of operations and the number of executions of each operation are random. Its test sequences are composed of different numbers of executions, orders, and operands of the four operations. The first three frames of the well-known video sequence *Carphone* [12] are used for characterization of *novaN* and *nova8*. The first three frames of the video sequences *Bridge*, *Hall*, and *Foreman* are the test sequences.





**Table II.** Absolute error rates compared to gate-level (GL) power model. The high-level (HL) power values obtained by the proposed power models and GL power values are in mW and the error rates (Err) are in %.

| Circuit | TS1 GL | TS1 HL | TS1 Err | TS2 GL | TS2 HL | TS2 Err | TS3 GL | TS3 HL | TS3 Err | Avg. Err |
|---------|--------|--------|---------|--------|--------|---------|--------|--------|---------|----------|
| UART    | 0.127  | 0.128  | 0.53    | 0.121  | 0.120  | 0.70    | 0.126  | 0.125  | 0.12    | 0.45     |
| FPU     | 8.96   | 8.49   | 5.24    | 13.2   | 13.2   | 0.08    | 12.5   | 12.4   | 0.47    | 1.93     |
| novaN   | 0.812  | 0.822  | 1.14    | 0.815  | 0.824  | 1.03    | 0.841  | 0.841  | 0.10    | 0.76     |
| nova8   | 0.753  | 0.759  | 0.74    | 0.756  | 0.762  | 0.72    | 0.783  | 0.783  | 0.01    | 0.49     |
| Average |        |        |         |        |        |         |        |        |         | 0.91     |

The accuracy of the proposed power model is presented in terms of the error rates compared to gate-level power models in Table II. The gate-level power analysis has been performed by using Synopsys PrimeTime-PX. All circuits achieve less than 2% error rates on average. The worst-case error rate of 5.24% occurs for test sequence TS1 of *FPU*, which is mainly due to incomplete characterization. More than 99% of cycles in TS1 hit uncharacterized power states whose power values are computed via equation (1). As an experiment, a sensitized uncharacterized power state that is hit in more than 40% of the cycles during the application of TS1 was added to the set of characterized power states. This results in an error rate of 0.33%, a drop of 4.91 percentage points from the 5.24% shown in Table II. How to measure the quality of a characterization sequence needs to be studied further.
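As a quick arithmetic check, the averages reported in Table II and the improvement quoted above can be recomputed from the per-sequence error rates. The snippet only re-derives numbers already given in the text; the variable names are illustrative.

```python
# Error rates (%) for TS1, TS2, and TS3, copied from Table II.
err = {
    "UART": [0.53, 0.70, 0.12],
    "FPU": [5.24, 0.08, 0.47],
    "novaN": [1.14, 1.03, 0.10],
    "nova8": [0.74, 0.72, 0.01],
}

# Average error per circuit, then the average over the four circuits.
per_circuit = {name: sum(v) / len(v) for name, v in err.items()}
overall = sum(per_circuit.values()) / len(per_circuit)

# Improvement for FPU TS1 after adding one frequently hit power state.
drop = 5.24 - 0.33

for name, value in per_circuit.items():
    print(f"{name}: {value:.2f}%")
print(f"overall: {overall:.2f}%, FPU TS1 drop: {drop:.2f} percentage points")
```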

In order to measure the speed gain of the proposed power models, C-level functional models of the test circuits are developed by using Carbon Model Studio. The C-level functional models and the power models are integrated in the high-level simulation tool Carbon SoC Designer [9]. Then, the power analysis time of the proposed power model is compared to that of the gate-level power analysis. The speed gains of the proposed power models are summarized in Table III. The average speed gain is about 400 times and the best speed gain reaches more than 800 times. The time for power model generation of *UART*, *FPU*, and *nova* is 218, 210, and 3857 seconds, respectively.

**Table III.** Speed gains compared to gate-level power models. Power analysis times at gate-level (GL) and high-level (HL) are in seconds. The speed gain (Gain) is the ratio of GL to HL.

| Circuit | TS1 GL | TS1 HL | TS1 Gain | TS2 GL | TS2 HL | TS2 Gain | TS3 GL | TS3 HL | TS3 Gain | Avg. Gain |
|---------|--------|--------|----------|--------|--------|----------|--------|--------|----------|-----------|
| UART    | 340    | 11.7   | 29       | 262    | 9.1    | 29       | 302    | 10.6   | 28       | 29        |
| FPU     | 49     | 0.1    | 487      | 54     | 0.1    | 540      | 66     | 0.2    | 331      | 452       |
| novaN   | 3291   | 4.6    | 712      | 3327   | 4.5    | 735      | 3722   | 4.6    | 818      | 755       |
| nova8   | 2980   | 8.9    | 337      | 3061   | 8.8    | 349      | 3569   | 8.9    | 401      | 362       |
| Average |        |        |          |        |        |          |        |        |          | 399       |





## 5 Conclusion

A high-level power model based on clock gating enable signals is proposed, whose power states are defined by the combination of the values of the clock gating enable signals. Multiple linear regression on the observed power values and the corresponding power states is performed to derive a linear equation for estimating the power values of uncharacterized power states. Experimental results show that the proposed power-tree-based power model yields less than 1% power estimation error on average compared to gate-level power analysis. The average analysis speed of the proposed power models is about 400 times higher than that of the gate-level power models. Most of all, the proposed power modeling technique can be easily automated without knowing the detailed behavior of a target circuit. It is straightforward to extend this power modeling approach to dynamic and static power modeling based on data gating, power gating, and so on.

The quality of a characterization sequence is very important and affects the accuracy of the proposed power models. Although the synthesis of high-quality characterization sequences may be hard, the quality of a characterization sequence should be quantitatively measurable. This is one of the research areas on which we plan to work.

## Acknowledgments

The work reported in this paper was conducted during the sabbatical year of Kwangwoon University in 2015. This work was supported by the Industrial Strategic Technology Development Program (10047664, Automatic power model generation software development for low power designs with more than 300 times faster power analysis speed and less than 20% error rate on average with respect to the gate-level power models) funded by the Ministry of Trade, Industry & Energy (MI, Korea).

