# T2R2 東京工業大学リサーチリポジトリ Tokyo Tech Research Repository

## Article / Book Information

| Title     | A Fast Clock Scheduling for Peak Power Reduction in LSI                    |
|-----------|----------------------------------------------------------------------------|
| Authors   | Yosuke Takahashi, Yukihide Kohira, Atsushi Takahashi                       |
| Citation  | IEICE Trans. Fundamentals, Vol. E91-A, No. 12, pp. 3803-3811               |
| Pub. date | 2008, 12                                                                   |
| URL       | http://search.ieice.org/                                                   |
| Copyright | (c) 2008 Institute of Electronics, Information and Communication Engineers |

#### PAPER

### A Fast Clock Scheduling for Peak Power Reduction in LSI\*

Yosuke TAKAHASHI<sup>†</sup>, Nonmember, Yukihide KOHIRA<sup>†a)</sup>, and Atsushi TAKAHASHI<sup>†</sup>, Members

SUMMARY The peak power consumption of an LSI must be reduced in order to limit the instability of gate operation, delay increase, noise, and so on. Clock scheduling can reduce the peak power consumption because it controls the switching timings of registers and combinational logic elements. In this paper, we propose a fast peak power wave estimation method for clock scheduling and fast clock scheduling methods for peak power reduction. Experiments show that the peak power wave estimated by the proposed method in a few seconds is highly correlated with the peak power wave obtained by HSPICE simulation in several days. Using the proposed peak power wave estimation method, the proposed clock scheduling methods find, in a few minutes, clock schedules that greatly reduce the peak power consumption.

**key words:** clock scheduling, general-synchronous framework, peak power reduction, peak power wave estimation

#### 1. Introduction

Semiconductor manufacturing process technology has improved the scale, speed, and power consumption of LSI circuits. However, the increasing share of routing delay in the propagation delay bounds the amount of improvement in the conventional synchronous framework, in which simultaneous clock distribution to every register is assumed. We call it the complete-synchronous framework (*c-frame*). The increases in the size and power consumption of the clock distribution circuit have become serious issues in c-frame. In contrast, the general-synchronous framework (*g-frame*) [2]-[8], in which the clock is assumed to be distributed periodically to each individual register, though not necessarily to all registers simultaneously, is expected to give an essential solution. By using g-frame, circuit qualities such as the clock frequency, clock distribution circuit size, and power consumption are expected to be improved.

In LSI design, the peak power consumption must be reduced in order to limit the instability of gate operation, delay increase, noise, and so on. In c-frame, the difference between the maximum and the minimum power consumption in the power wave is usually very large, since all the registers switch simultaneously and the switchings of gates follow. Therefore, the peak power consumption is very large compared with the average power consumption. In g-frame, in contrast, the peak power consumption is expected to be reduced, since the switching timings of registers and gates can be controlled by clock scheduling.

Manuscript received February 22, 2008.

Manuscript revised July 20, 2008.

\*The preliminary version was presented at [1].

a) E-mail: kohira@lab.ss.titech.ac.jp

DOI: 10.1093/ietfec/e91-a.12.3803

Peak power reduction methods based on clock scheduling are investigated in [7], [8]. In [7], the power wave is estimated under the assumption that the switching timings of gates in a combinational circuit are fixed regardless of clock scheduling. In fact, however, the switching timings of gates in the combinational circuit depend on the clock schedule, so the clock schedule obtained using this estimation is not accurate. In [8], the method of [7] is extended. The power wave is estimated under the assumption that the switching timings of gates in a combinational circuit depend on the minimum delay from a register and the clock timing of the register. However, since switchings of gates are also caused by non-minimum delays from a register, the clock schedule obtained using this estimation is still not accurate. Moreover, the computation time of these methods is large, because both methods adopt a genetic algorithm (GA) to obtain a feasible clock schedule that reduces the peak power consumption.

In this paper, we propose a fast peak power wave estimation method for clock scheduling. The peak power consumption of a circuit that we consider is the maximum instantaneous power consumption during the whole circuit operation. In our estimation method, a peak power wave of one clock period length is generated for a given clock schedule. A peak power wave is expressed as the sum of register-originated peak power waves. A register-originated peak power wave is estimated in advance, based on the estimated switching probabilities of elements. By assuming that a register-originated peak power wave is invariant under clock scheduling, the peak power wave for a clock schedule is quickly generated and is used to evaluate the clock schedule in clock scheduling.

In this paper, clock scheduling methods using the proposed peak power wave estimation method are also proposed. The proposed scheduling methods consist of two stages. In the first stage, a feasible clock schedule is generated by determining the clock timings of registers one by one, each within its feasible clock timing range, so that the objective is as small as possible. In the second stage, the clock schedule obtained by the first stage is greedily modified by changing the clock timings of registers one by one to reduce the objective.

In experiments, it is shown that the peak power wave obtained by the proposed peak power wave estimation method in a few seconds is highly correlated with the peak power wave obtained by HSPICE simulation in several days. By using the proposed peak power wave estimation method, the proposed clock scheduling methods find clock schedules for benchmark circuits that greatly reduce the peak power consumption in a few minutes.

<sup>†</sup>The authors are with the Department of Communications and Integrated Systems, Tokyo Institute of Technology, Tokyo, 152-8552 Japan.

#### 2. General-Synchronous Framework

In the conventional synchronous framework, the clock arrival timing of a register is the same as those of the other registers. We call this framework c-frame (complete-synchronous framework). C-frame, in which the clock arrival timings of all registers are assumed to be equal, is a special case of g-frame (general-synchronous framework). In g-frame [2]–[8], the clock arrival timing of a register may differ from those of other registers. The clock timing S(r) of a register r is defined as the difference in clock arrival time between r and an arbitrarily chosen reference register.

A circuit works correctly with a clock period T if the following two types of constraints are satisfied for every register pair with signal propagations [2].

#### **Setup (No-Zero-Clocking) Constraints**

$$S(a) - S(b) \le T - D_{\max}(a, b)$$

#### **Hold (No-Double-Clocking) Constraints**

$$S(b) - S(a) \le D_{\min}(a, b),$$

where  $D_{\text{max}}(a, b)$  ( $D_{\text{min}}(a, b)$ ) is the maximum (minimum) delay from a register a to a register b.
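The two constraint types can be checked mechanically for a given clock schedule. The following is a minimal sketch, not taken from the paper: the function name `is_feasible`, the dictionary layout, and the two-register example with its delay values are all hypothetical.

```python
from typing import Dict, Tuple

Pair = Tuple[str, str]  # (a, b): a signal propagates from register a to register b


def is_feasible(S: Dict[str, float], T: float,
                d_max: Dict[Pair, float], d_min: Dict[Pair, float]) -> bool:
    """Return True if clock schedule S satisfies setup and hold for every pair."""
    for (a, b), dmax in d_max.items():
        # Setup (no-zero-clocking): S(a) - S(b) <= T - D_max(a, b)
        if S[a] - S[b] > T - dmax:
            return False
        # Hold (no-double-clocking): S(b) - S(a) <= D_min(a, b)
        if S[b] - S[a] > d_min[(a, b)]:
            return False
    return True


# Hypothetical two-register loop with paths a -> b and b -> a.
d_max = {("a", "b"): 6.0, ("b", "a"): 2.0}
d_min = {("a", "b"): 3.0, ("b", "a"): 1.0}
zero_skew = {"a": 0.0, "b": 0.0}   # c-frame: all clock timings equal
skewed = {"a": 0.0, "b": 2.0}      # g-frame: b receives the clock 2 units later

print(is_feasible(zero_skew, 5.0, d_max, d_min))  # setup on a -> b is violated
print(is_feasible(skewed, 5.0, d_max, d_min))     # all four constraints hold
```

In this made-up instance the zero-skew schedule needs a clock period of at least 6, while the skewed schedule already works at a clock period of 5.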

We call the set of clock timings of all registers a clock schedule. A clock schedule is called feasible if all constraints are satisfied. A feasible clock schedule is not necessarily unique. We can define a set of feasible clock schedules by using the range of clock timing of a register [5]. This range is referred to as the timing range of the register. Let $[S_{\min}(r), S_{\max}(r)]$ be the timing range of a register r, where $S_{\min}(r)$ is the lower bound and $S_{\max}(r)$ is the upper bound of the timing range of r. If the clock timing of a register r is unique, we regard r as having a timing range such that $S_{\min}(r) = S_{\max}(r)$. We call the set of the timing ranges of all registers a clock range schedule. If the clock schedule is feasible whenever the clock timing of each register is selected within its timing range, then the clock range schedule is also called feasible. Since the timing ranges of registers depend on each other, a feasible clock range schedule is also not unique.

The evaluation of a feasible clock schedule depends on the optimization targets such as clock frequency, clock distribution circuit size, and power consumption. Although it is difficult to find a feasible clock schedule and a clock range schedule that balance multiple objectives, a feasible clock schedule that maximizes the clock frequency is obtained in a very short time [3], [4]. In addition, if a target clock timing is given for each register, a feasible clock range schedule that minimizes the sum of differences from the targets is also obtained in a short time [6].

#### 3. Estimation of Power Wave

The peak power consumption of a circuit that we consider is the maximum instantaneous power consumption during the whole circuit operation. The peak power consumption of a circuit within each clock period depends on the sequence of input vectors and on the clock schedule. In contrast, the peak power consumption of a circuit during the whole circuit operation is independent of the sequence of input vectors but depends on the clock schedule. In our peak power estimation method, the concept of the peak power wave of one clock period length is used. The peak power wave gives, for each time within one clock period, the maximum instantaneous power consumption at that time over the whole circuit operation. The peak power wave depends on the clock schedule. The peak power consumption of a circuit with a clock schedule is the maximum instantaneous power consumption in the corresponding peak power wave.
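To make the definition of a peak power wave concrete, the sketch below folds a sampled power trace covering several clock periods into waves of one clock period length. It is only an illustration: the function name `fold_power_trace`, the sampled-list representation, and the sample values are assumptions, not part of the paper.

```python
from typing import List, Tuple


def fold_power_trace(trace: List[float], period: int) -> Tuple[List[float], List[float]]:
    """Fold a power trace into per-time peak and average waves of one period.

    trace holds one sample per time unit; period is the clock period in the
    same time units. Samples beyond the last complete period are ignored.
    """
    n_periods = len(trace) // period
    peak = [0.0] * period
    avg = [0.0] * period
    for t in range(period):
        samples = [trace[k * period + t] for k in range(n_periods)]
        peak[t] = max(samples)             # maximum at time t over all periods
        avg[t] = sum(samples) / n_periods  # average at time t over all periods
    return peak, avg


# Hypothetical trace: two clock periods of three samples each.
trace = [1.0, 4.0, 2.0, 3.0, 2.0, 2.0]
peak_wave, avg_wave = fold_power_trace(trace, 3)
peak_power = max(peak_wave)  # peak power consumption = maximum of the peak power wave
```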

In our peak power wave estimation, a power wave of one clock period length is generated based on the switching probabilities of elements. Switching probabilities of elements are set so that the obtained power wave approaches the peak power wave. Then, the power wave obtained is used as the estimated peak power wave.

#### 3.1 Overview of Our Power Estimation

In our model, the power consumption of a circuit is the sum of power consumptions of elements by switching. A switching event emerges at a register when the clock is inputted to the register, and propagates into the combinational circuit. The emerging time of a switching event at a register is determined by a clock schedule. A switching event is blocked by an element if some conditions hold, thus the switching event is assumed to propagate with some probability along a path. The propagation time of a switching event from a register to an element along a path is estimated by the sum of gate delays and routing delays of the path.

The power consumption of an element by switching continues during certain time units. The power wave of an element caused by a switching event depends on the environment such as the slew rate of switching and the number of fanouts of the element. The power wave of an element by a switching evaluated empirically is used for the estimation. The details are explained in Sect. 5.

Since each switching event is originated by a register, the power wave of a circuit is divided into the *register-originated power waves* in our model. Associated with a register, the register-originated power wave is defined. The power consumption caused by a switching event emerged at a register belongs to the register-originated power wave of the register.



**Fig. 1** A circuit C and a clock schedule S.



**Fig. 2** Register-originated power waves in schedule *S*.



**Fig. 3** The power wave of C in schedule S.

For example, Fig. 2 shows the register-originated power waves of registers in circuit C with clock schedule S shown in Fig. 1. Let the clock period of C be 5. For simplicity, in this example the power of each gate is assumed to be consumed within the time unit at which the switching event occurs. Since the switching event of register u propagates to gates g1, g2, g3, and g4, the register-originated power wave of register u is modeled as shown in Fig. 2. The power wave of circuit C is modeled as shown in Fig. 3, which is obtained by summing the register-originated power waves of all registers.

If the input vector to the circuit and the register outputs change, then an actual register-originated power wave changes. It also depends on a clock schedule. The propagation of a switching event which is not blocked in a clock schedule may be blocked in another clock schedule. In the proposed estimation, however, the register-originated power wave is estimated based on the probability, and it is assumed to be independent of a clock schedule. According to this assumption, the computation time in obtaining the estimated power wave in clock scheduling is very small. The estimated power wave is not exact but the accuracy is enough for clock scheduling.

#### 3.2 Power Wave Model

Since each switching event is originated by a register, the power wave of a circuit is divided into the register-originated power waves in our model.

Let R be the set of registers. Let W(r, t) be the register-originated power consumption of a register $r \in R$ at time t which is derived by the clock inputted to r at time 0. The register-originated power consumption of r at time t in clock schedule S is denoted by $W^S(r, t)$. The register-originated power wave of r is shifted according to S by S(r). That is,

$$W^{S}(r,t) = W(r,t-S(r)),$$

where S(r) is the clock timing of r in S. Let  $W^S(t)$  be the power consumption of the circuit at time t in S. We have,

$$W^{S}(t) = \sum_{r \in R} W^{S}(r, t).$$
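The shift-and-sum model above is what makes the estimation cheap during clock scheduling: the register-originated waves are fixed, and only their offsets change. A minimal sketch follows, assuming that waves are sampled at integer time units, that clock timings are integers, and that time is taken modulo the clock period T (the wrap-around is our assumption, motivated by the wave having one clock period length). The function name and the wave values are hypothetical.

```python
from typing import Dict, List


def circuit_wave(W: Dict[str, List[float]], S: Dict[str, int], T: int) -> List[float]:
    """Sum register-originated waves shifted by the clock timings.

    Implements W^S(t) = sum_r W(r, t - S(r)) with t - S(r) taken modulo T.
    """
    total = [0.0] * T
    for r, wave in W.items():
        for t in range(T):
            total[t] += wave[(t - S[r]) % T]
    return total


# Hypothetical register-originated waves for a clock period of 5 time units.
W = {"u": [3.0, 2.0, 1.0, 0.0, 0.0], "v": [2.0, 1.0, 0.0, 0.0, 0.0]}

c_frame = circuit_wave(W, {"u": 0, "v": 0}, 5)  # both registers clocked together
g_frame = circuit_wave(W, {"u": 0, "v": 3}, 5)  # register v clocked 3 units later
```

In this toy instance the peak drops from 5 to 3 when the wave of v is moved away from the wave of u.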

W(r,t) consists of the instantaneous static power consumption of a register r and the instantaneous dynamic power consumption of r and gates. The instantaneous static power consumption of r is consumed whenever the clock is inputted, independently of whether a switching event emerges at r. Let $W_c(r,t)$ and $W_g(r,t)$ be the static power consumption and the dynamic power consumption at time t, respectively, which are derived by the clock inputted to r at time 0. That is,

$$W(r,t) = W_c(r,t) + W_g(r,t).$$

Let $w_g(v,t)$ be the switching power consumption of an element v at time t where the switching of v occurs at time 0. Let p(r,v,t) be the switching probability of an element v at time t which is caused by switching events emerged at a register r at time 0. The dynamic power consumption at time t is estimated as follows.

$$W_g(r,t) = \sum_{v \in O(r)} \sum_{0 \le i < T} p(r,v,t-i) \cdot w_g(v,i),$$

where O(r) is the set of gates in the output cone of r.
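The expression for $W_g(r,t)$ is a convolution of each element's switching probability with its switching power wave. The sketch below evaluates it for one register under two assumptions of ours: time is sampled in integer units, and indices are taken modulo T. The names `dynamic_wave`, `p`, and `w_g` as dictionaries, and all numbers, are illustrative only.

```python
from typing import Dict, List


def dynamic_wave(p: Dict[str, List[float]], w_g: Dict[str, List[float]], T: int) -> List[float]:
    """Estimate W_g(r, t) for one register r.

    p[v][t]   : switching probability of element v at time t (0 <= t < T)
    w_g[v][i] : power of element v, i time units after it switches
    """
    out = [0.0] * T
    for v, prob in p.items():
        pulse = w_g[v]
        for t in range(T):
            for i in range(len(pulse)):
                # Term p(r, v, t - i) * w_g(v, i); index wrapped modulo T (assumption).
                out[t] += prob[(t - i) % T] * pulse[i]
    return out


# Hypothetical output cone with two gates and a clock period of 4 time units.
p = {"g1": [0.0, 0.5, 0.0, 0.0], "g2": [0.0, 0.0, 1.0, 0.0]}
w_g = {"g1": [2.0, 1.0], "g2": [1.0]}
wave = dynamic_wave(p, w_g, 4)
```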

#### 3.3 Switching Probability

Here, we discuss the switching probabilities of gates and registers. First, we discuss the switching probability of a gate. The switching of a gate occurs when several of its fanin gates switch and the other fanin gates do not prevent the switching. For example, the switching of a NAND gate occurs when several fanin gates switch in the same direction and the outputs of the other fanin gates are 1. Therefore, the switching probability of each gate depends on the condition probabilities and the switching probabilities of its fanin gates.

The condition probability of each gate depends on the condition probabilities of its fanin gates. We define the condition probability of a gate as a function of the condition probabilities of its fanin gates. Let the condition probability $c_a(v)$ be the probability that the output of a gate v is a. In this paper, $c_a(v)$ as shown in Table 1 is used in the switching probability analysis. The initial condition probability of each register is set to 0.5, and the initial condition probability of each gate is set once the condition probabilities of all its fanin gates are set. When the condition probabilities of all gates are set, the condition probability of each register is updated by the condition probability of its fanin gate, and the computation of the condition probabilities of all gates is repeated. The computation is repeated until the change in the condition probability of every register in one iteration is less than 0.0001. By using the condition probability of each gate, which is independent of input vectors, the switching probability of each gate can be defined quickly.

|          | NOT       | NAND                              | NOR                               | AND                               | OR                                | FF        |
|----------|-----------|-----------------------------------|-----------------------------------|-----------------------------------|-----------------------------------|-----------|
| $c_0(v)$ | $c_1(v')$ | $\prod_{v'\in I(v)}c_1(v')$       | $1 - \prod_{v' \in I(v)} c_0(v')$ | $1 - \prod_{v' \in I(v)} c_1(v')$ | $\prod_{v'\in I(v)}c_0(v')$       | $c_0(v')$ |
| $c_1(v)$ | $c_0(v')$ | $1 - \prod_{v' \in I(v)} c_1(v')$ | $\prod_{v' \in I(v)} c_0(v')$     | $\prod_{v' \in I(v)} c_1(v')$     | $1 - \prod_{v' \in I(v)} c_0(v')$ | $c_1(v')$ |

**Table 1** The condition probabilities of gates and registers to be $a \in \{0, 1\}$ (Let I(v) be the set of fanin gates of v and $v' \in I(v)$).

**Fig. 4** The switching probability of gate and the condition probability (Each gate delay is assumed to be 1).
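The rules of Table 1 can be written as a small function. The sketch below is illustrative only: it assumes $c_0(v) = 1 - c_1(v)$ for every element, so that only $c_1$ needs to be passed around, and the function name and string gate labels are hypothetical.

```python
from math import prod
from typing import List, Tuple


def condition_probability(kind: str, fanin_c1: List[float]) -> Tuple[float, float]:
    """Return (c_0(v), c_1(v)) for an element of the given kind, following Table 1.

    fanin_c1 lists c_1(v') for each fanin v'; c_0(v') is taken as 1 - c_1(v').
    """
    c1_in = fanin_c1
    c0_in = [1.0 - c for c in fanin_c1]
    if kind == "NOT":
        c1 = c0_in[0]
    elif kind == "NAND":
        c1 = 1.0 - prod(c1_in)
    elif kind == "NOR":
        c1 = prod(c0_in)
    elif kind == "AND":
        c1 = prod(c1_in)
    elif kind == "OR":
        c1 = 1.0 - prod(c0_in)
    elif kind == "FF":
        c1 = c1_in[0]
    else:
        raise ValueError(f"unknown element kind: {kind}")
    return 1.0 - c1, c1


# Two registers at the initial value 0.5 feeding a NAND gate, followed by an inverter.
nand_c0, nand_c1 = condition_probability("NAND", [0.5, 0.5])
not_c0, not_c1 = condition_probability("NOT", [nand_c1])
```

In a full implementation this function would be applied to all gates in topological order and repeated until the register values change by less than 0.0001, as described above.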

The switching probability of each gate depends on condition probabilities and switching probabilities of its fanin gates (see Fig. 4). Let v be a NAND gate, I(v) be a set of fanin gates of v, X be a subset of fanin gates of v which are switching, and d(v) be the delay of v. As mentioned above, the switching of NAND gate occurs when all the outputs of fanin gates which are not switching are 1 and the outputs of the other gates are switching to the same direction. Therefore, we have,

$$p(r, v, t) = \sum_{X \subseteq I(v)} \left\{ \prod_{v' \in I(v) \setminus X} c_1(v') \cdot (1 - p(r, v', t - d(v'))) \cdot \left( \prod_{v' \in X} c_1(v') + \prod_{v' \in X} c_0(v') \right) \cdot \prod_{v' \in X} p(r, v', t - d(v')) \right\}.$$

Switching probabilities of NOR, AND, and OR are defined similarly. If v is a NOT gate and the fanin gate of v is v',

$$p(r, v, t) = p(r, v', t - d(v')).$$
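The NAND expression can be evaluated by enumerating the subsets X of switching fanins. The sketch below rests on three assumptions of ours: X ranges over non-empty subsets only, the last factor is read as the switching probability of each fanin $v' \in X$, and $c_0(v') = 1 - c_1(v')$. The function name and argument layout are hypothetical.

```python
from itertools import combinations
from math import prod
from typing import List


def nand_switching_probability(c1: List[float], p_in: List[float]) -> float:
    """Switching probability of a NAND gate at one time instant.

    c1[k]   : condition probability c_1 of fanin k
    p_in[k] : p(r, v'_k, t - d(v'_k)), the delayed switching probability of fanin k
    """
    n = len(c1)
    total = 0.0
    for size in range(1, n + 1):              # assumption: X is non-empty
        for X in combinations(range(n), size):
            others = [k for k in range(n) if k not in X]
            # Fanins outside X must be 1 and must not switch.
            quiet = prod(c1[k] * (1.0 - p_in[k]) for k in others)
            # Fanins in X must all switch in the same direction.
            same_dir = prod(c1[k] for k in X) + prod(1.0 - c1[k] for k in X)
            # Fanins in X must all switch.
            switching = prod(p_in[k] for k in X)
            total += quiet * same_dir * switching
    return total


single = nand_switching_probability([0.5], [0.3])          # one fanin: acts as NOT
pair = nand_switching_probability([0.5, 0.5], [1.0, 0.0])  # only fanin 0 switches
```

With a single fanin the result equals the fanin's switching probability, which agrees with the NOT rule. In the two-fanin case the gate switches only when the quiet fanin is 1.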

Next, we discuss the switching probability of a register. A register switches when its fanin gate switches an odd number of times within one clock period, but does not switch when its fanin gate switches an even number of times. Therefore, the switching probability of a register is the probability that the fanin gate switches an odd number of times within one clock period. Let $p_1, p_2, \ldots, p_n$ be the non-zero switching probabilities of the fanin gate of a register r in one clock period. Then the switching probability p(r, r, 0) of r is

$$\begin{aligned}
p(r, r, 0) ={} & p_1(1 - p_2)(1 - p_3) \cdots (1 - p_n) \\
& + (1 - p_1)p_2(1 - p_3) \cdots (1 - p_n) \\
& + \cdots \\
& + (1 - p_1)(1 - p_2) \cdots (1 - p_{n-1})p_n \\
& + p_1p_2p_3(1 - p_4)(1 - p_5) \cdots (1 - p_n) \\
& + p_1p_2(1 - p_3)p_4(1 - p_5) \cdots (1 - p_n) \\
& + \cdots \\
={} & (1 - (1 - 2p_1)(1 - 2p_2) \cdots (1 - 2p_n))/2.
\end{aligned}$$
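The closed form can be checked numerically against a direct enumeration of all switching patterns with an odd number of switchings. This is a small sanity check with made-up probabilities and hypothetical function names.

```python
from itertools import product
from math import prod
from typing import List


def odd_switch_probability(ps: List[float]) -> float:
    """Closed form: probability that an odd number of independent events occur."""
    return (1.0 - prod(1.0 - 2.0 * p for p in ps)) / 2.0


def odd_switch_probability_bruteforce(ps: List[float]) -> float:
    """Sum the probabilities of all patterns with an odd number of switchings."""
    total = 0.0
    for pattern in product([0, 1], repeat=len(ps)):
        if sum(pattern) % 2 == 1:
            total += prod(p if bit else 1.0 - p for p, bit in zip(ps, pattern))
    return total


ps = [0.2, 0.5, 0.9]  # hypothetical switching probabilities of the fanin gate
closed = odd_switch_probability(ps)
brute = odd_switch_probability_bruteforce(ps)
```

The closed form needs n multiplications, whereas the enumeration grows as $2^n$.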

The switching probability of a gate depends on the switching probabilities of registers, and the switching probability of a register depends on the switching probability of the fanin gate of the register. So, the computation of switching probability is repeated until it converges.

In our peak power wave estimation method, the initial switching probability of each register is set to 1. The computation is repeated until the change in the switching probability of every register in one iteration is less than 0.0001. By setting the initial switching probability large, it is expected that the switching probabilities converge to larger values, and that the obtained register-originated power wave approaches the register-originated peak power wave. Then, the obtained register-originated power wave is used as the estimated register-originated peak power wave. The estimated peak power wave is obtained by summing the estimated register-originated peak power waves according to a clock schedule.

#### 4. Clock Scheduling

We propose fast clock scheduling methods for the reduction of the peak power consumption using the proposed peak power wave estimation.

The goal of the clock scheduling is the minimization of the peak power consumption of a circuit. Since the error of the proposed peak power wave estimation is not small, it is probable that the actual peak power consumption occurs at a time when the estimated instantaneous power consumption is near the estimated peak power consumption. Therefore, when the difference between the peak power consumption and the average power consumption is large, the time span in which the estimated instantaneous power consumptions are near the estimated peak power consumption should be short. In order to shorten such time spans, we adopt not only the minimization of the peak power consumption but also the minimization of the variance of the instantaneous power consumptions as an objective. If the variance of the instantaneous power consumptions becomes small, the difference between the peak power consumption and the average power consumption is expected to be reduced, and the time span in which an estimated instantaneous power consumption is near the estimated peak power consumption would become long. However, in such cases, it is expected that the actual peak power consumption has been reduced enough.

```
Input: register set R, clock period T, the delay between
     registers, and the register-originated peak power
     waves
Output: clock schedule S
Objective: minimization of the peak power consumption
     or the variance of the power consumptions in the
     estimated power wave.
Step 1: Obtain a feasible clock schedule by setting the
     clock timing of each register to the middle of its clock
     timing range determined by the scheduling engine in
Step 2: R_u := R, W^S(t) := 0 (0 \le t < T)
Step 3: Choose the register r_t in R_u such that
     W_{peak}(r_t) = \max_{r \in R_u} W_{peak}(r).
Step 4: Obtain the feasible clock range schedule by
     enlarging the clock timing range of r_t so that the clock
     timing range of r_t is maximized while fixing the clock
     timings of the other registers.
Step 5: Set the clock timing of r_t within its clock timing
     range so that the objective of W^S(t) + W^S(r_t, t) is
     minimized.
Step 6: W^S(t) := W^S(t) + W^S(r_t, t), R_u := R_u \setminus \{r_t\}.
Step 7: If R_u \neq \emptyset, then go to Step 3. Otherwise
     output S and terminate.
```

**Fig. 5** The first stage of clock scheduling.
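The two objectives can be compared on small waves. The sketch below uses the population variance over the samples of one clock period, which is an assumption on our part since the exact variance formula is not given in the text. Function names and wave values are illustrative.

```python
from typing import List


def peak(wave: List[float]) -> float:
    """Peak power objective: maximum instantaneous power in the wave."""
    return max(wave)


def variance(wave: List[float]) -> float:
    """Variance objective: population variance of the instantaneous powers."""
    mean = sum(wave) / len(wave)
    return sum((w - mean) ** 2 for w in wave) / len(wave)


# Two hypothetical waves with the same average power (2.0).
spiky = [8.0, 0.0, 0.0, 0.0]
flat = [2.0, 2.0, 2.0, 2.0]
```

Both waves have the same average, but the flat wave has zero variance and a peak equal to the average, which is the situation the variance objective drives toward.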

The proposed clock scheduling consists of two stages. The first stage of clock scheduling is shown in Fig. 5. Let $W_{peak}(r) = \max_{0 \le t < T} W(r,t)$ be the maximum instantaneous power consumption of the register-originated peak power wave of a register r. In the first stage, a feasible clock schedule is generated by determining the clock timings of registers one by one, each within its feasible clock timing range, from the register with the maximum $W_{peak}(r)$ to the register with the minimum $W_{peak}(r)$, so that the objective is as small as possible.
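Steps 2 to 7 of the first stage can be sketched as follows. This is a simplified illustration, not the authors' implementation: it takes a feasible initial schedule `S0` as input instead of calling a scheduling engine (Step 1), assumes integer delays and clock timings, wraps time modulo T, and assumes every register is connected to at least one other register so that its timing range is finite. All names and numbers are hypothetical.

```python
import math
from typing import Callable, Dict, List, Tuple

Pair = Tuple[str, str]


def timing_range(r: str, S: Dict[str, int], T: int,
                 d_max: Dict[Pair, int], d_min: Dict[Pair, int]) -> Tuple[float, float]:
    """Range of S(r) that keeps the schedule feasible while other timings stay fixed."""
    lo, hi = -math.inf, math.inf
    for (a, b), dmax in d_max.items():
        dmin = d_min[(a, b)]
        if a == r and b != r:
            hi = min(hi, S[b] + T - dmax)   # setup on path r -> b
            lo = max(lo, S[b] - dmin)       # hold on path r -> b
        elif b == r and a != r:
            lo = max(lo, S[a] - T + dmax)   # setup on path a -> r
            hi = min(hi, S[a] + dmin)       # hold on path a -> r
    return lo, hi


def first_stage(W: Dict[str, List[float]], S0: Dict[str, int], T: int,
                d_max: Dict[Pair, int], d_min: Dict[Pair, int],
                objective: Callable[[List[float]], float] = max):
    """Greedy placement of registers in decreasing order of W_peak(r)."""
    S = dict(S0)
    total = [0.0] * T                                           # Step 2
    for r in sorted(W, key=lambda reg: max(W[reg]), reverse=True):  # Steps 3, 7
        lo, hi = timing_range(r, S, T, d_max, d_min)            # Step 4

        def cost(s: int) -> float:
            return objective([total[t] + W[r][(t - s) % T] for t in range(T)])

        S[r] = min(range(math.ceil(lo), math.floor(hi) + 1), key=cost)  # Step 5
        for t in range(T):                                      # Step 6
            total[t] += W[r][(t - S[r]) % T]
    return S, total


# Hypothetical two-register loop with a clock period of 5 time units.
T = 5
d_max = {("a", "b"): 6, ("b", "a"): 2}
d_min = {("a", "b"): 3, ("b", "a"): 1}
W = {"a": [4.0, 1.0, 0.0, 0.0, 0.0], "b": [3.0, 1.0, 0.0, 0.0, 0.0]}
S0 = {"a": 0, "b": 2}  # assumed feasible initial schedule
S, total = first_stage(W, S0, T, d_max, d_min)
```

Passing a variance function as `objective` gives the variance-minimizing variant. Ties between candidate timings are broken toward the smallest timing here, which is an arbitrary choice.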

The result of the first stage of clock scheduling shown in Fig. 5 depends on the order in which the clock timings of registers are fixed. Therefore, we also propose the clock schedule modification method in Fig. 6 as the second stage of clock scheduling.

In the second stage, the clock timings of registers are modified one by one so that the objective is improved. Let  $t_{\text{max}}$  ( $t_{\text{min}}$ ) be a time at which an estimated instantaneous power consumption is maximum (minimum), that is,  $W^S(t_{\text{max}}) = \max_{0 \le t < T} W^S(t)$  ( $W^S(t_{\text{min}}) = \min_{0 \le t < T} W^S(t)$ ).

```
Input: register set R, clock period T, the delay between
     registers, the register-originated peak power waves,
     initial clock schedule S, and the peak power
     consumption of the circuit W^S(t)
Output: modified clock schedule S
Objective: minimization of the peak power consumption
     or the variance of the power consumption.
Step 1: R_u := R
Step 2: Choose the times t_{\max} and t_{\min} between 0
     and T such that W^S(t_{\max}) = \max_{0 \le t < T} W^S(t) and
     W^S(t_{\min}) = \min_{0 \le t < T} W^S(t).
Step 3: If the objective is the peak power, then choose
     the register r_t in R_u such that W^S(r_t, t_{\max}) is
     maximum. Otherwise, choose the register r_t in R_u such
     that W^S(r_t, t_{\max}) - W^S(r_t, t_{\min}) is maximum.
Step 4: Obtain the feasible clock range schedule by
     enlarging the clock timing range of r_t so that the clock
     timing range of r_t is maximized while fixing the clock
     timings of the other registers.
Step 5: Set the clock timing of r_t within its clock timing
     range so that the objective of W^S(t) is minimized.
Step 6: If the clock timing of r_t is changed, then go to
     Step 1. Otherwise R_u := R_u \setminus \{r_t\}.
Step 7: If R_u \neq \emptyset, then go to Step 2. Otherwise
     output S and terminate.
```

**Fig. 6** The second stage of clock scheduling.

Let $r_t$ be a register whose register-originated peak power wave has the maximum instantaneous power consumption at $t_{\text{max}}$ among registers. Changing the clock timing of $r_t$ has a large potential to reduce the peak power consumption, since the peak power consumption is decreased by $W^{S}(r_{t}, t_{\text{max}})$ if the instantaneous power consumption of the register-originated peak power wave of $r_t$ at $t_{\text{max}}$ becomes 0. Therefore, if the objective is the minimization of the peak power consumption, the algorithm chooses $r_t$ as the candidate for changing the clock timing in the second stage. If, instead, the objective is the minimization of the variance of the instantaneous power consumptions, the algorithm chooses $r_t$ such that $W^{S}(r_{t}, t_{\text{max}}) - W^{S}(r_{t}, t_{\text{min}})$ is maximum among registers. If the instantaneous power consumption at $t_{\text{max}}$ is decreased and that at $t_{\text{min}}$ is increased, the variance is expected to be reduced. In both cases, the algorithm chooses a register that has a large potential to reduce the objective. If the objective is not improved by changing the clock timing of the chosen register, the algorithm chooses another register. If no register yields an improvement, the algorithm terminates.
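The candidate selection of the second stage (Steps 2 and 3 of Fig. 6) can be sketched as follows. The function name, the string-valued `objective` switch, and the wave values are hypothetical; the waves are assumed to be already shifted by the current clock schedule.

```python
from typing import Dict, List, Tuple


def choose_register(WS: Dict[str, List[float]], candidates: List[str],
                    objective: str = "peak") -> Tuple[str, int, int]:
    """Pick the register r_t to reschedule next.

    WS[r][t] is W^S(r, t); candidates plays the role of R_u.
    Returns (r_t, t_max, t_min).
    """
    T = len(next(iter(WS.values())))
    total = [sum(WS[r][t] for r in WS) for t in range(T)]  # W^S(t)
    t_max = max(range(T), key=lambda t: total[t])
    t_min = min(range(T), key=lambda t: total[t])
    if objective == "peak":
        # Largest contribution at the time of the estimated peak.
        score = lambda r: WS[r][t_max]
    else:
        # Largest difference between contributions at t_max and t_min.
        score = lambda r: WS[r][t_max] - WS[r][t_min]
    return max(candidates, key=score), t_max, t_min


# Hypothetical shifted waves for three registers and a clock period of 3 units.
WS = {"u": [3.0, 0.0, 3.0], "v": [2.0, 0.0, 0.0], "w": [0.0, 4.0, 0.0]}
by_peak = choose_register(WS, ["u", "v", "w"], "peak")
by_var = choose_register(WS, ["u", "v", "w"], "variance")
```

Here the two objectives select different registers: u contributes most at $t_{\text{max}}$, but it contributes equally at $t_{\text{min}}$, so moving v is the more promising change for the variance.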

#### 5. Experiments

In experiments, the proposed methods are applied to four circuits in the ISCAS89 benchmark suite shown in Table 2.

The power waves of each gate and of the register are set to the results obtained by HSPICE using the $0.35 \,\mu\text{m}$ library of Rohm Corporation. The power wave of an element with n fanouts is obtained as follows: n NAND gates are used as fanouts of the element, and wire effects are ignored. The output of 20 series buffers with a step input is used as the input wave of the element. The power wave of an element is set to the average of the rise and fall waves. The time units of the power estimation, of the clock timing, and of the calculation of variance are set to 5 ps, 100 ps, and 25 ps, respectively. The sampling interval from HSPICE output is set to 5 ps. The clock period is set to the minimum clock period in c-frame rounded up to the time unit of the power estimation.

By using HSPICE simulation, we generate two kinds of power waves of one clock period length: a peak power wave and an average power wave. Randomly generated input vectors are used. The length of the input vectors is set to the number of registers in each circuit. The resolution of power waves is 5 ps. The instantaneous power consumption at a time in a peak power wave is the maximum instantaneous power consumption at that time during the whole circuit operation, whereas the instantaneous power consumption at a time in an average power wave is the average instantaneous power consumption at that time over the whole circuit operation. The peak power consumption by HSPICE is the maximum instantaneous power consumption obtained by HSPICE, which is independent of our sampling interval. In contrast, the peak power consumption estimated by the proposed method is the maximum instantaneous power consumption of the estimated peak power wave, which depends on our sampling interval.

**Table 2** Benchmark circuits.

| circuit | #FFs | #gates | clock period, c-frame [ps] | clock period, g-frame [ps] |
|---------|------|--------|----------------------------|----------------------------|
| s1238   | 18   | 508    | 8135                       | 6214                       |
| s1423   | 74   | 657    | 27350                      | 23531                      |
| s5378   | 179  | 2779   | 7641                       | 6553                       |
| s9234   | 211  | 5597   | 17991                      | 12351                      |

**Table 3** The results of peak power consumption, simulation time, and correlation between the estimated peak power wave and the peak power wave by HSPICE simulation in c-frame.

| circuit | estimation peak [mW] | estimation time [s] | HSPICE peak [mW] | HSPICE time [s] | corr. |
|---------|----------------------|---------------------|------------------|-----------------|-------|
| s1238   | 78.06                | 0.17                | 57.61            | 383.12          | 0.958 |
| s1423   | 67.27                | 8.34                | 66.54            | 4654.34         | 0.854 |
| s5378   | 167.85               | 4.44                | 147.06           | 21402.50        | 0.981 |
| s9234   | 216.50               | 1441.91             | 233.71           | 65915.71        | 0.851 |
| ave.[%] | 110.84               | 0.61                | (100.00)         | (100.00)        | _     |

peak: peak power consumption

corr.: correlation between the peak power waves of estimation and HSPICE

ave.: average of the ratios of estimation to HSPICE

First, the accuracy of our peak power estimation is evaluated in c-frame by comparing with HSPICE simulation. The estimated peak power consumption obtained by our estimation method, the peak power consumption by HSPICE simulation, and the correlation between the estimated peak power wave and the peak power wave by HSPICE simulation are shown in Table 3. Table 3 shows that the peak power consumptions estimated by the proposed method are larger than those by HSPICE in some circuits. However, the peak power waves estimated by the proposed method are highly correlated with those by HSPICE. The computation of the proposed power estimation method is much faster than that of HSPICE simulation.
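The "ave." row of Table 3 can be reproduced from the per-circuit values as the mean of the estimation-to-HSPICE ratios. The sketch below does this and also shows one way to compute a correlation between two sampled waves; we assume the Pearson correlation coefficient, which the text does not state explicitly. Function names are hypothetical.

```python
from math import sqrt
from typing import List

# Values copied from Table 3 (s1238, s1423, s5378, s9234).
est_peak = [78.06, 67.27, 167.85, 216.50]
spice_peak = [57.61, 66.54, 147.06, 233.71]
est_time = [0.17, 8.34, 4.44, 1441.91]
spice_time = [383.12, 4654.34, 21402.50, 65915.71]


def mean_ratio_percent(est: List[float], ref: List[float]) -> float:
    """Average of the per-circuit ratios est/ref, in percent."""
    return 100.0 * sum(e / r for e, r in zip(est, ref)) / len(est)


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equally sampled waves (assumed metric)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


peak_ratio = round(mean_ratio_percent(est_peak, spice_peak), 2)
time_ratio = round(mean_ratio_percent(est_time, spice_time), 2)
```

The time average is dominated by s9234, whose estimation takes about 2.2% of its HSPICE time, while the other three circuits are below 0.2%.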

Next, the property of our estimated peak power wave is evaluated in g-frame by comparing with a peak power wave and an average power wave by HSPICE simulation. In comparisons, clock schedules obtained by our clock scheduling with the peak power objective (P&P) are used. The correlations among estimated peak power wave, peak power wave by HSPICE and average power wave by HSPICE are shown in Table 4. Table 4 shows that if the correlation between a peak power wave by HSPICE simulation and an average power wave by HSPICE simulation is high, then the correlation between an estimated peak power wave and a peak power wave by HSPICE simulation is high. However, the

**Table 4** Correlations among the estimated peak power wave, the peak power wave by HSPICE simulation, and the average power wave by HSPICE simulation (clock schedules obtained by P&P).

| circuit | peak vs. ave. | esti. vs. peak | esti. vs. ave. |
|---------|---------------|----------------|----------------|
| s1238   | 0.950         | 0.941          | 0.923          |
| s1423   | 0.916         | 0.676          | 0.788          |
| s5378   | 0.234         | 0.058          | 0.634          |
| s9234   | 0.849         | 0.877          | 0.843          |

peak: peak power wave by HSPICE simulation

ave.: average power wave by HSPICE simulation

esti.: estimated peak power wave

**Table 5** The estimation results of the clock schedulings.

|         |          | peak power minimization |          |      |                    |          |      | variance minimization |          |      |                    |          |       |
|---------|----------|-------------------------|----------|------|--------------------|----------|------|-----------------------|----------|------|--------------------|----------|-------|
|         | c-frame  | first stage (P)         |          |      | second stage (P&P) |          |      | first stage (V)       |          |      | second stage (V&V) |          |       |
| circuit | peak     | peak                    | var.     | time | peak               | var.     | time | peak                  | var.     | time | peak               | var.     | time  |
|         | [mW]     | [mW]                    | [mW$^2$] | [s]  | [mW]               | [mW$^2$] | [s]  | [mW]                  | [mW$^2$] | [s]  | [mW]               | [mW$^2$] | [s]   |
| s1238   | 78.06    | 14.72                   | 27.75    | 0.03 | 14.55              | 26.38    | 0.04 | 18.03                 | 15.79    | 0.04 | 14.72              | 14.76    | 0.22  |
| s1423   | 67.27    | 29.71                   | 20.18    | 0.51 | 29.69              | 20.13    | 0.54 | 30.54                 | 8.67     | 0.71 | 29.71              | 4.51     | 13.43 |
| s5378   | 167.85   | 24.42                   | 53.43    | 0.26 | 20.22              | 9.71     | 0.43 | 24.26                 | 7.00     | 0.31 | 21.10              | 0.53     | 2.88  |
| s9234   | 216.50   | 43.35                   | 235.65   | 0.95 | 39.31              | 152.87   | 1.63 | 39.78                 | 24.21    | 1.48 | 39.45              | 9.20     | 90.86 |
| ave.[%] | (100.00) | 24.40                   | -        | -    | 23.24              | -        | -    | 25.33                 | -        | -    | 23.45              | -        | -     |

peak: peak power consumption

var.: the variance of the peak power consumption

time: computation time

ave.: average of ratios to the peak power consumption in c-frame
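The ave.[%] row can be recomputed from the peak columns of Table 5. The short check below takes, for each scheduling, the ratio of the estimated peak to the c-frame peak per circuit and averages the four ratios. The dictionary layout and the function name `average_ratio_percent` are illustrative choices, not part of the original work.

```python
# Estimated peak power consumption in mW, copied from Table 5.
c_frame = {"s1238": 78.06, "s1423": 67.27, "s5378": 167.85, "s9234": 216.50}
estimated = {
    "P":   {"s1238": 14.72, "s1423": 29.71, "s5378": 24.42, "s9234": 43.35},
    "P&P": {"s1238": 14.55, "s1423": 29.69, "s5378": 20.22, "s9234": 39.31},
    "V":   {"s1238": 18.03, "s1423": 30.54, "s5378": 24.26, "s9234": 39.78},
    "V&V": {"s1238": 14.72, "s1423": 29.71, "s5378": 21.10, "s9234": 39.45},
}


def average_ratio_percent(peaks, baseline):
    """Mean over circuits of 100 * peak / baseline peak."""
    ratios = [100.0 * peaks[circuit] / baseline[circuit] for circuit in baseline]
    return sum(ratios) / len(ratios)


averages = {name: average_ratio_percent(peaks, c_frame)
            for name, peaks in estimated.items()}

for name, value in averages.items():
    print(f"{name}: {value:.2f} %")
```

The printed averages agree with the ave.[%] row of Table 5 to two decimal places. Note that this is an average of per-circuit ratios, so the small circuits weigh as much as the large ones.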

accuracy of our estimated peak power wave is not good in some cases. Improving this accuracy is left for future work.

Next, the clock scheduling methods for the peak power minimization and the variance minimization are evaluated. Clock schedules obtained by the first stage and the second stage are compared by using our peak power wave estimation. The estimation results of the obtained clock schedules are shown in Table 5. In Table 5, P and P&P are the results of the first and second stages, respectively, where the objective is minimization of the peak power consumption.



**Fig. 7** Peak power waves of s1238 obtained by HSPICE (clock schedules obtained by P&P and V&V and c-frame).



**Fig. 8** Peak power waves of s9234 obtained by HSPICE (clock schedules obtained by P&P and V&V and c-frame).

Similarly, V and V&V are the results of the first and second stages, respectively, where the objective is minimization of the variance of power consumption. The schedules of P, whose estimated peak power consumptions are on average about a quarter of those in c-frame, are obtained within about a second. They are slightly improved by the second stage at a small computation cost. Although the estimated peak power consumptions of V and V&V are almost the same as those of P and P&P, the variances become small. The computation times of V&V are large compared with those of the other schedulings, but they are less than a few minutes.
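The difference between the two objectives can be illustrated with two invented power waves that have the same peak and the same total energy but different variances. The sketch assumes that "variance" means the population variance of the samples of the power wave over one clock period. The values are not taken from the experiments.

```python
import statistics

# Two hypothetical power waves in mW over one clock period (invented values).
bursty = [4.0, 4.0, 2.0, 2.0, 0.0, 0.0]
flat = [4.0, 2.0, 2.0, 2.0, 1.0, 1.0]

for name, wave in (("bursty", bursty), ("flat", flat)):
    print(name,
          "peak:", max(wave),
          "energy:", sum(wave),
          "variance:", statistics.pvariance(wave))
```

Both waves have a peak of 4.0 mW, so a peak objective cannot distinguish them, whereas a variance objective prefers the flatter wave.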

Next, the obtained clock schedules are evaluated by HSPICE simulation. The peak power waves of s1238 and s9234 obtained by HSPICE for the clock schedules obtained by P&P and V&V are shown in Figs. 7 and 8, respectively. The peak power waves in c-frame are also shown in these figures.

In Table 6, the peak power consumptions obtained by the proposed estimation method and by HSPICE simulation are compared. The correlation between the peak power wave by HSPICE simulation and that by our estimation is high on most circuits, which supports the validity of the proposed peak power wave estimation method.

In c-frame, the power consumption is very high at the time when the clock is input to registers, and it is near 0 at other times. Therefore, the peak power consumption is also very high in c-frame. In contrast, since the switching timings of registers and gates are controlled by clock schedules, the peak power is reduced in g-frame. The peak power of the circuits with clock schedules obtained by P&P is improved by 63.9% on average compared with c-frame.
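The effect described above can be reproduced with a toy model. In the sketch below, each register contributes the same short power pulse, shifted cyclically by its clock timing, and the total wave is the sum of the shifted pulses. All numbers are invented, and the timing constraints that a valid clock schedule must satisfy in g-frame are ignored, so this shows only why spreading the switching times lowers the peak.

```python
def total_wave(pulse, shifts, period):
    """Sum identical power pulses, each shifted cyclically by a clock timing."""
    wave = [0.0] * period
    for shift in shifts:
        for offset, power in enumerate(pulse):
            wave[(shift + offset) % period] += power
    return wave


PERIOD = 8           # unit times per clock period (hypothetical)
PULSE = [3.0, 1.0]   # power of one register after its clock edge, in mW

# c-frame: every register is clocked at the same time.
c_wave = total_wave(PULSE, [0, 0, 0, 0], PERIOD)
# g-frame: clock timings are spread over the period (arbitrary schedule).
g_wave = total_wave(PULSE, [0, 2, 4, 6], PERIOD)

print("c-frame peak:", max(c_wave))
print("g-frame peak:", max(g_wave))
print("same energy:", sum(c_wave) == sum(g_wave))
```

The energy per period is identical in both cases. Only its distribution over time changes, which is what a clock schedule controls.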

Peak power waves of s9234 obtained by our estimation and by HSPICE with clock schedules obtained by P&P and V&V are shown in Figs. 9 and 10, respectively. The peak power consumption cannot be accurately estimated by our method in a few cases. For example, the estimation and the HSPICE simulation shown in Fig. 10 are less correlated than those shown in Fig. 9. Figure 10 shows that the proposed clock scheduling works well with respect to the proposed estimation, since the estimated peak power wave is flat. However, the peak power wave by HSPICE simulation is not flat. This shows that the error of our proposed estimation becomes large in some cases. Errors arise because our estimation is probability-based and because special cases are not captured sufficiently. The accuracy improvement without increase of the computation

**Table 6** Comparisons of peak power consumptions between estimation and HSPICE.

|         | P     |        |       | P&P   |        |       | V     |        |       | V&V   |        |        |
|---------|-------|--------|-------|-------|--------|-------|-------|--------|-------|-------|--------|--------|
| circuit | esti. | HSPICE | corr. | esti. | HSPICE | corr. | esti. | HSPICE | corr. | esti. | HSPICE | corr.  |
|         | [mW]  | [mW]   |       | [mW]  | [mW]   |       | [mW]  | [mW]   |       | [mW]  | [mW]   |        |
| s1238   | 14.72 | 19.95  | 0.939 | 14.55 | 25.51  | 0.941 | 18.03 | 18.81  | 0.874 | 14.72 | 18.44  | 0.896  |
| s1423   | 29.71 | 27.58  | 0.678 | 29.69 | 27.58  | 0.678 | 30.54 | 26.35  | 0.718 | 29.71 | 38.88  | 0.750  |
| s5378   | 24.42 | 39.68  | 0.872 | 20.22 | 36.84  | 0.058 | 24.26 | 44.30  | 0.451 | 21.10 | 44.79  | -0.012 |
| s9234   | 43.35 | 82.78  | 0.927 | 39.31 | 79.81  | 0.887 | 39.78 | 71.24  | 0.203 | 39.45 | 74.78  | 0.212  |

esti.: peak power consumption obtained by the estimation

HSPICE: peak power consumption obtained by HSPICE

corr.: correlation between the peak power waves of estimation and HSPICE

time in estimation is desired, but it is left for future work.

Finally, we show the necessity of considering the power consumption of the combinatorial circuit in clock scheduling. In our proposed method, the power wave of an element in the combinatorial circuit is shifted according to a clock schedule. In this experiment, in contrast, the power waves of registers are shifted according to a clock schedule, but the



**Fig. 9** Peak power waves of s9234 by our estimation and HSPICE (clock schedule obtained by P&P).



**Fig. 10** Peak power waves of s9234 by our estimation and HSPICE (clock schedule obtained by V&V).

power waves of elements in combinatorial circuit are fixed. Table 7 shows the peak power consumptions of the clock schedules obtained when the power consumption of the combinatorial circuit is fixed. The peak power consumptions are not reduced enough, which shows the necessity of considering the power consumption of the combinatorial circuit in clock scheduling.

#### 6. Conclusions

In this paper, we propose a fast peak power wave estimation method for clock scheduling and clock scheduling methods for peak power reduction. In experiments, it is shown that the peak power wave estimated by the proposed method in a few seconds is highly correlated with the peak power wave obtained by HSPICE simulation in several days. By using the proposed peak power wave estimation method, the proposed clock scheduling methods find clock schedules for benchmark circuits that greatly reduce the peak power consumption in a few minutes.

As future work, improving the accuracy of the peak power wave estimation without increasing the computation time is desired. In this paper, the power consumed by a clock tree is not taken into account in the proposed power wave estimation. Although any clock schedule can be realized by a clock tree, the power consumption of the clock tree would become unacceptably large if the clock schedule were determined without considering layout information. A clock scheduling method that reduces both the peak power consumption and the total power consumption should be pursued.

#### Acknowledgments

This work is supported by VLSI Design and Education Center (VDEC), the University of Tokyo in collaboration with Synopsys, Inc., Cadence Design Systems, Inc., Rohm Corporation and Toppan Printing Corporation. This research was partially supported by Japan Society for the Promotion of Science (JSPS), Grant-in-Aid for JSPS Fellows 19·6015, 2007.

**Table 7** The peak power consumption by the estimation in which the power wave of each element in combinatorial circuit is fixed.

|         | P&P    |        |       | V&V    |        |        |
|---------|--------|--------|-------|--------|--------|--------|
| circuit | esti.  | HSPICE | corr. | esti.  | HSPICE | corr.  |
|         | [mW]   | [mW]   |       | [mW]   | [mW]   |        |
| s1238   | 49.98  | 42.59  | 0.756 | 50.51  | 29.79  | 0.471  |
|         |        | (2.13) |       |        | (1.62) |        |
| s1423   | 41.76  | 37.08  | 0.804 | 41.76  | 33.37  | -0.022 |
|         |        | (1.34) |       |        | (0.86) |        |
| s5378   | 141.28 | 44.35  | 0.506 | 141.28 | 45.04  | -0.530 |
|         |        | (1.12) |       |        | (1.01) |        |
| s9234   | 188.28 | 108.98 | 0.296 | 188.28 | 94.11  | 0.345  |
|         |        | (1.32) |       |        | (1.26) |        |

esti.: peak power consumption obtained by the estimation

HSPICE: peak power consumption obtained by HSPICE

corr.: correlation between the peak power waves of estimation and HSPICE

(): the ratio of the peak power consumption by this estimation to that by our estimation
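As a worked check of the parenthesized entries, the sketch below recomputes the V&V ratios under the assumption that each one is the HSPICE peak in Table 7 divided by the HSPICE peak of the V&V schedule in Table 6. This reading is inferred from the numbers rather than stated in the legend, and only the V&V columns are checked here. The variable names are illustrative.

```python
# HSPICE peak power in mW for the V&V schedules.
fixed_comb = {"s1238": 29.79, "s1423": 33.37, "s5378": 45.04, "s9234": 94.11}  # Table 7
proposed = {"s1238": 18.44, "s1423": 38.88, "s5378": 44.79, "s9234": 74.78}    # Table 6

# Ratio of the peak with fixed combinatorial power waves to the peak
# with the proposed scheduling (assumed definition, see the text above).
ratios = {circuit: fixed_comb[circuit] / proposed[circuit]
          for circuit in fixed_comb}

for circuit, ratio in ratios.items():
    print(f"{circuit}: {ratio:.2f}")
```

Under this assumption, a ratio above 1 means that the schedule obtained with fixed combinatorial power waves has a higher HSPICE peak than the schedule obtained by the proposed method. This holds for three of the four circuits with V&V, and s1423 is the exception.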

#### References

- [1] Y. Takahashi, Y. Kohira, and A. Takahashi, "A fast clock scheduling for peak power reduction in LSI," GLSVLSI, pp.582–587, 2007.
- [2] J. Fishburn, "Clock skew optimization," IEEE Trans. Comput., vol.39, no.7, pp.945–951, 1990.
- [3] R.B. Deokar and S.S. Sapatnekar, "A graph-theoretic approach to clock skew optimization," ISCAS, pp.407–410, 1994.
- [4] A. Takahashi and Y. Kajitani, "Performance and reliability driven clock scheduling of sequential logic circuits," ASP-DAC'97, pp.37–43, 1997.
- [5] K. Kurokawa, T. Yasui, Y. Matsumura, M. Toyonaga, and A. Takahashi, "A high-speed and low-power clock tree synthesis by dynamic clock scheduling," IEICE Trans. Fundamentals, vol.E85-A, no.12, pp.2746–2755, Dec. 2002.
- [6] A. Takahashi, "Practical fast clock-schedule design algorithms," IEICE Trans. Fundamentals, vol.E89-A, no.4, pp.1005–1011, April 2006.
- [7] P. Vuillod, L. Benini, A. Bogliolo, and G. De Micheli, "Clock-skew optimization for peak current reduction," ISLPED, pp.265–270, 1996.
- [8] W.C.D. Lam, C.K. Koh, and C.W.A. Tsao, "Clock scheduling for power supply noise suppression using genetic algorithm with selective gene therapy," ISQED, pp.327–332, 2003.



Atsushi Takahashi received his B.E., M.E., and D.E. degrees in electrical and electronic engineering from Tokyo Institute of Technology, Tokyo, Japan, in 1989, 1991, and 1996, respectively. He had been with the Tokyo Institute of Technology as a research associate from 1991 to 1997 and has been an associate professor since 1997. He visited University of California, Los Angeles, U.S.A., as a visiting scholar from 2001 to 2002. He is currently with Department of Communications and Integrated Systems, Graduate School of Science and Engineering, Tokyo Institute of Technology. His research interests are in VLSI layout design and combinational algorithms. He is a member of IEEE and IPSJ.



Yosuke Takahashi received his B.E. degree from University of Tsukuba, Tsukuba, Japan, in 2005. He received his M.E. degree from Tokyo Institute of Technology, Tokyo, Japan, in 2007. He is currently with Simplex Technology, Inc. His research interests are in VLSI design automation and combinational algorithms.



Yukihide Kohira received his B.E., M.E., and D.E. degrees from Tokyo Institute of Technology, Tokyo, Japan, in 2003, 2005, and 2007, respectively. He is currently a researcher of Department of Communications and Integrated Systems in Tokyo Institute of Technology. His research interests are in VLSI design automation and combinational algorithms. He is a member of IEEE and IPSJ.