# Voltage and Level-Shifter Assignment Driven Floorplanning 

Bei YU ${ }^{\dagger \mathrm{a})}$, Nonmember, Sheqin DONG ${ }^{\dagger \mathrm{b})}$, Song CHEN ${ }^{\dagger \dagger}$, Members, and Satoshi GOTO ${ }^{\dagger \dagger}$, Fellow


#### Abstract

SUMMARY Low-power design has become a significant requirement as CMOS technology has entered the nanometer era. Multiple-Supply Voltage (MSV) is a popular and effective method for both dynamic and static power reduction while maintaining performance. Level shifters may cause area overhead and Interconnect Length Overhead (ILO), and should be considered at both the floorplanning and post-floorplanning stages. In this paper, we propose a two-phase algorithm framework, called VLSAF, to solve the voltage and level-shifter assignment problem. In the floorplanning phase, we use a convex cost network flow algorithm to assign voltages and a minimum cost flow algorithm to handle level-shifter assignment. In the post-floorplanning phase, a heuristic method is adopted to redistribute white space and calculate the positions and shapes of level shifters. The experimental results show that VLSAF is effective.

key words: Voltage-Island, Voltage Assignment, Convex Network Flow, Level Shifter Assignment, White Space Redistribution


## 1. Introduction

Low-power design has become a significant requirement as CMOS technology has entered the nanometer era. On the one hand, hundreds of millions of transistors can be integrated on the same chip by using system-on-chip (SoC) design methodologies. On the other hand, shrinking feature sizes and increasing circuit speed cause higher power consumption, which not only shortens the battery life of handheld devices but also leads to thermal and reliability problems.

Many techniques have been introduced to deal with power optimization. Among the existing techniques, MSV is a popular and effective method for both dynamic and static power reduction while maintaining performance. In MSV design, one of the most important problems is voltage assignment: timing-critical modules are assigned a higher voltage while noncritical modules are assigned a lower voltage, so that power can be saved without degrading the overall circuit performance.

A level shifter [1] has to be inserted into an interconnect when a low-voltage module drives a high-voltage module; otherwise the circuit may suffer from excessive short-circuit current and leakage energy. From [5] we can observe that the number of level shifters increases rapidly as the number of modules increases, and the area that level shifters consume cannot be ignored. As a result, level shifters may cause area and performance overhead, and should be considered during the floorplanning and post-floorplanning stages.

A number of works address island generation and voltage assignment in floorplanning and placement. In these works, voltage assignment is considered at various stages, including pre-floorplanning [4], [5]; during floorplanning [6]-[8]; and post-floorplanning / post-placement [9]-[12].

Lee et al. [5] handle voltage assignment by dynamic programming, and level shifters are inserted as soft blocks according to the voltage assignment result at the pre-floorplanning stage. Power network resources are then considered during floorplanning. However, this work has some deficiencies: first, voltage assignment is handled before floorplanning, so physical information such as the distances among modules cannot be taken into account; second, the search space is large if level shifters are treated as modules.

An approach based on ILP is used in [10] for voltage assignment at the post-floorplanning stage. Level-shifter planning and power-network resources are considered. However, their approach does not consider the area consumed by level shifters and relies on the floorplanning result.

To make use of physical information such as the length of interconnects among modules, the voltage assignment problem should be addressed during floorplanning. Ma et al. [8] transform the voltage assignment problem into a convex cost network flow problem and integrate it into the floorplanning stage. However, their approach considers neither the area overhead nor the positions of level shifters.

The remainder of this paper is organized as follows. Section 2 defines the voltage-island driven floorplanning problem. Section 3 presents our algorithm flow. Section 4 reports our experimental results. Finally, Section 5 concludes this paper.

## 2. PROBLEM FORMULATION

In this paper, we use CBL [3] to represent every floorplan generated. CBL is a topological representation that dissects the chip into rectangular rooms, and each room is assigned at most one module. Besides, all the nets are two-pin nets; multi-pin nets can be decomposed into a set of source-sink two-pin nets. The wire length of every net is calculated by the half-perimeter model.

Definition 1 (Interconnect Length Overhead): Each level shifter belongs to a net, and we assume that a level shifter can always be inserted within the net's bounding box. However, if a level shifter is placed outside its net's bounding box, the net's interconnect length increases. The increase in length is denoted as the Interconnect Length Overhead (ILO).

Definition 2 (Power Network Resource): The power network resource of a voltage island is evaluated by the half-perimeter wirelength of the minimal bounding box enclosing the island.

For every candidate floorplan, to meet the performance constraint, timing-critical modules are assigned a high voltage, and the other non-timing-critical modules are assigned a lower voltage to maximize power saving. Besides, each level-shifter is assigned to a rough position to minimize interconnect length overhead. We refer to the problem as the Voltage and Level-Shifter Assignment driven Floorplanning (VLSAF).

Problem 1: (VLSAF) We are given

1) A set of $m$ modules: $N=\left\{n_{1}, n_{2}, \ldots, n_{m}\right\}$. Each module $n_{i}$ is a hard block (fixed size and aspect ratio) and is given $k$ legal working voltages; its power-delay tradeoff is represented as a delay-power curve (DP-curve, as shown in Fig. 3).
2) A netlist, which can be denoted as a directed acyclic graph (DAG) $\hat{G}=(\hat{V}, \hat{E})$, where $\hat{V}=\left\{n_{1}, n_{2}, \ldots, n_{m}\right\}$ and $e(i, j) \in \hat{E}$ denotes an interconnect from $n_{i}$ to $n_{j}$.
3) A timing constraint $T_{\text {cycle }}$.
4) Level-shifter's area, power and delay.

After VLSAF, a chip floorplan is generated to meet several objectives: first, minimize the area and power cost; second, satisfy the timing constraint; third, insert all the required level shifters while minimizing the wire length and the interconnect length overhead.

## 3. VLSAF Algorithm

### 3.1 Overview of VLSAF

As shown in Fig. 1, algorithm VLSAF consists of two phases: (I) voltage and level-shifter assignment during floorplanning, and (II) White Space Redistribution (WSR) at post-floorplanning.

Fig. 1 Overall flow of VLSAF

In Phase I, we modify the model in [8] to handle voltage assignment and present a min-cost max-flow based method to solve the level-shifter assignment problem. Whenever a new packing is generated, we carry out voltage and level-shifter assignment. After voltage assignment (VA), each module is assigned a voltage that reduces power consumption as much as possible while still satisfying the performance constraint. After level-shifter assignment (LSA), as many level shifters as possible are assigned to a room. Level shifters that cannot be assigned belong to the set $ELS$ (details in Section 3.4) and will cause some Interconnect Length Overhead (ILO).

In Phase II, a heuristic method is adopted to calculate every module's relative position in its room. Besides, every room's white space is divided into grids, and the aspect ratio of each level shifter is decided before it is inserted into a grid. Finally, a level shifter that could not be assigned to a room in LSA is inserted into a room chosen to reduce the interconnect length overhead (ILO).

### 3.2 Voltage Assignment of Two Voltages

During floorplanning, when a new floorplan is generated, we can estimate the interconnect length between module $i$ and module $j$, denoted as $\text{len}_{ij}$. Similar to [8], $\text{len}_{ij}$ can be scaled to a delay $\text{delay}_{ij}$ according to $\text{delay}_{ij}=\delta \times \text{len}_{ij}$, where $\delta$ is a constant scaling factor. We check every $\text{delay}_{ij}$: if $\text{delay}_{ij} \geq T_{\text{cycle}}$, the timing constraint cannot be satisfied, so another new floorplan is generated. Otherwise we carry out voltage assignment.
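The pre-check described above can be sketched as follows. This is a minimal illustration under stated assumptions: module centers stand in for pin positions, and the names `wire_delays` and `passes_timing_precheck` are hypothetical, not taken from the paper's implementation.

```python
def half_perimeter(p, q):
    """Half-perimeter wire length of a two-pin net (Manhattan distance)."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def wire_delays(centers, nets, delta):
    """Scale each net length to a delay: delay_ij = delta * len_ij.

    centers: dict module -> (x, y); using module centers as pins is an assumption.
    nets:    iterable of (i, j) source-sink pairs.
    """
    return {(i, j): delta * half_perimeter(centers[i], centers[j]) for i, j in nets}


def passes_timing_precheck(delays, t_cycle):
    """A floorplan is rejected as soon as any delay_ij >= T_cycle."""
    return all(d < t_cycle for d in delays.values())
```

A floorplan that fails this check is discarded before any voltage assignment is attempted, which keeps the more expensive flow computation off clearly infeasible candidates.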

Given netlist $\hat{G}=(\hat{V}, \hat{E})$, voltage assignment problem can be formulated as (1):

$$
\begin{align*}
& \text { Minimize } \sum_{i \in \hat{V}} P_{i}\left(d_{i}\right)  \tag{1}\\
& \text { s.t. } \begin{cases}\mu_{j}-\mu_{i} \geq \text { delay }_{i j}+d_{i} & \forall e(i, j) \in \hat{E} \\
d_{i} \in\left\{d_{i}^{1}, d_{i}^{2}, \ldots d_{i}^{k}\right\} & \forall i \in \hat{V} \\
0 \leq \mu_{i} \leq T_{\text {cycle }} & \forall i \in \hat{V}\end{cases} \tag{1a}
\end{align*}
$$

where $\mu_{i}$ is the arrival time of vertex $i$ in DAG, and $d_{i}$ is the delay of vertex $i$.
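For very small instances, formulation (1) can be checked by exhaustive enumeration. The sketch below is only a reference check, not the network-flow method used in this paper; the function name and the data layout (`options`, `wire_delay`) are assumptions made for illustration. It follows (1) as written, so only arrival times are constrained.

```python
from graphlib import TopologicalSorter
from itertools import product


def brute_force_voltage_assignment(options, wire_delay, t_cycle):
    """Enumerate all delay choices and keep the cheapest feasible one.

    options[i]        : list of (delay, power) pairs, one per legal voltage
    wire_delay[(i,j)] : delay_ij of the interconnect e(i, j)
    Returns (total power, {module: chosen option index}) or None if infeasible.
    """
    modules = list(options)
    preds = {i: set() for i in modules}
    for i, j in wire_delay:
        preds[j].add(i)
    order = list(TopologicalSorter(preds).static_order())  # predecessors first

    best = None
    for choice in product(*(range(len(options[i])) for i in modules)):
        pick = dict(zip(modules, choice))
        mu = {}
        for j in order:
            # Smallest arrival time satisfying mu_j - mu_i >= delay_ij + d_i.
            mu[j] = max(
                (mu[i] + wire_delay[(i, j)] + options[i][pick[i]][0] for i in preds[j]),
                default=0,
            )
        if max(mu.values()) > t_cycle:
            continue  # violates 0 <= mu_i <= T_cycle
        power = sum(options[i][pick[i]][1] for i in modules)
        if best is None or power < best[0]:
            best = (power, pick)
    return best
```

The enumeration is exponential in the number of modules, which is why a flow-based formulation is needed inside a floorplanning loop.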


Fig. 2 (a) $\bar{G}=(\bar{V}, \bar{E})$ after adding nodes $s, t$ and dividing each node $n_{i}$ into $I_{i}$ and $O_{i}$. (b) Transformed $\bar{G}=(\bar{V}, \bar{E})$ after adding edge $e(s, t)$ to remove the constraint $\mu_{t}-\mu_{s} \leq T_{\text{cycle}}$ in equation (2).

### 3.2.1 Two Legal Working Voltages Assignment

When there are only two legal working voltages, we transform $\hat{G}$ into $\bar{G}=(\bar{V}, \bar{E})$. First, a start node $s$ and an end node $t$ are added to $\hat{V}$; $s$ is connected to the nodes whose in-degree is zero, and the nodes with zero out-degree are connected to $t$. We set $\bar{V}=\{s, t\} \cup \hat{V}=\left\{s, t, n_{1}, n_{2}, \ldots, n_{m}\right\}$. Besides, each $n_{i}$ $(i=1, \ldots, m)$ is divided into two nodes, $I_{i}$ and $O_{i}$, so $\bar{V}=\left\{s, t, I_{1}, O_{1}, I_{2}, O_{2}, \ldots, I_{m}, O_{m}\right\}$, and $I_{i}$ is connected to $O_{i}$ by a directed edge. We denote these newly created edges $\left\{e\left(I_{i}, O_{i}\right) \mid I_{i}, O_{i} \in \bar{V}\right\}$ as $\bar{E}_{1}$, the edges $\left\{e\left(s, I_{k}\right) \mid I_{k} \in \bar{V}\right\}$ as $\bar{E}_{3}$, and the other edges as $\bar{E}_{2}$, so that $\bar{E}=\bar{E}_{1} \cup \bar{E}_{2} \cup \bar{E}_{3}$. The DAG $\bar{G}=(\bar{V}, \bar{E})$ is shown in Fig. 2(a).
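The node-splitting step can be illustrated with a short sketch. The node labels (`"I_1"`, `"O_1"`, `"s"`, `"t"`) and the function name `split_nodes` are hypothetical; following the text, edges into $t$ are grouped with the "other edges" $\bar{E}_{2}$.

```python
def split_nodes(modules, edges):
    """Build the vertex set and the edge sets E1, E2, E3 of the transformed DAG.

    modules: list of module ids; edges: list of (i, j) interconnects.
    """
    has_pred = {j for _, j in edges}
    has_succ = {i for i, _ in edges}

    vertices = ["s", "t"]
    for n in modules:
        vertices += [f"I_{n}", f"O_{n}"]

    e1 = [(f"I_{n}", f"O_{n}") for n in modules]                # module edges
    e2 = [(f"O_{i}", f"I_{j}") for i, j in edges]               # interconnects
    e2 += [(f"O_{n}", "t") for n in modules if n not in has_succ]
    e3 = [("s", f"I_{n}") for n in modules if n not in has_pred]
    return vertices, e1, e2, e3
```

After the split, the module delay $d_{i}$ lives on the edge $e(I_{i}, O_{i})$, so every decision variable of the program is attached to an edge.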

The mathematical program is given in (2), where $d_{i j}$ is the delay from node $i$ to node $j$.

$$
\begin{align*}
& \text{Minimize } \sum_{e(i, j) \in \bar{E}} P_{i j}\left(d_{i j}\right) \tag{2}\\
& \text{s.t. } \mu_{j}-\mu_{i} \geq d_{i j} \quad \forall e(i, j) \in \bar{E} \tag{2a}\\
& \phantom{\text{s.t. }} \mu_{t}-\mu_{s} \leq T_{\text{cycle}} \tag{2b}\\
& \phantom{\text{s.t. }} d_{i j} \in\left\{d_{i j}^{1}, d_{i j}^{2}\right\} \quad \forall e(i, j) \in \bar{E}_{1} \\
& \phantom{\text{s.t. }} d_{i j}=\text{delay}_{i j} \quad \forall e(i, j) \in \bar{E}_{2} \\
& \phantom{\text{s.t. }} d_{i j}=0 \quad \forall e(i, j) \in \bar{E}_{3}
\end{align*}
$$

Compared with [8], which has the following additional constraints:

$$
\begin{cases}0 \leq \mu_{i} \leq T_{\text {cycle }} & \forall i \in \bar{V} \\ l_{i j} \leq d_{i j} \leq u_{i j} & \forall e(i, j) \in \bar{E}\end{cases}
$$

we introduce some modifications. First, the timing constraint used to be estimated as $T_{\text{cycle}}-L \times d_{l s}$, where $L$ is the longest path in the DAG. To avoid this pessimistic margin, we add $d_{l s}$ to the lower voltage's delay and $p_{l s}$ to the lower voltage's power in each module's DP-curve (as shown in Fig. 3), so that the timing constraint can be set to $T_{\text{cycle}}$. Since there are only two possible supply voltages, the power function $P_{i j}\left(d_{i j}\right)$ is still convex. Second, we add the start node $s$ and the end node $t$ to remove the constraint $0 \leq \mu_{i} \leq T_{\text{cycle}}$. Third, since the DP-curve is a linear function, for $e(i, j) \in E_{1}$, $d_{i j}$ has only two choices, $d_{i j}^{1}$ and $d_{i j}^{2}$; we can prove later that the program can be solved optimally even if we remove the constraint $l_{i j} \leq d_{i j} \leq u_{i j}$.

Fig. 3 For a module with two legal working voltages: (a) original DP-curve; (b) modified DP-curve, after adding the power and delay of the level shifter.

We can incorporate constraint (2b) into (2a) by transforming (2b) into $\mu_{s}-\mu_{t} \geq-T_{\text{cycle}}$ and defining $d_{s t}$ such that $\mu_{t}-\mu_{s}=d_{s t}$ and $d_{s t} \leq T_{\text{cycle}}$. Accordingly, $\bar{E}_{3}=\bar{E}_{3} \cup\{e(s, t)\}$, and the transformed DAG $\bar{G}$ is shown in Fig. 2(b). Besides, we dualize the constraints (2a) using a nonnegative Lagrangian multiplier vector $\vec{x}$, obtaining the following Lagrangian subproblem:

$$
\begin{equation*}
L(\vec{x})=\min \sum_{e(i, j) \in \bar{E}}\left[P_{i j}\left(d_{i j}\right)+x_{i j} d_{i j}\right]+\sum_{i \in \bar{V}} x_{s i} \mu_{i} \tag{3}
\end{equation*}
$$

We set $V=\bar{V}$, remove every $e(i, j) \in \bar{E}_{3}$, and add an edge $e(s, i)$ for each node $i \in V$. The newly added edges are denoted as $E_{3}$, and $E_{1}=\bar{E}_{1}, E_{2}=\bar{E}_{2}$. Now $E=E_{1} \cup E_{2} \cup E_{3}$, and the transformed DAG is denoted as $G=(V, E)$.

For every $e(s, i) \in E_{3}$, we set $d_{s i}=\mu_{i}$, $P_{s i}\left(d_{s i}\right)=0$, $l_{s i}=0$, and $u_{s i}=\begin{cases}K, & \text{if } i \neq t \\ T_{\text{cycle}}, & \text{if } i=t\end{cases}$, where $K$ is a huge coefficient.

We define the function $H_{i j}\left(x_{i j}\right)$ for each $e(i, j) \in E$ as follows: $H_{i j}\left(x_{i j}\right)=\min _{d_{i j}}\left\{P_{i j}\left(d_{i j}\right)+x_{i j} d_{i j}\right\}$.

For $e(i, j) \in E_{1}$, because $P_{i j}\left(d_{i j}\right)$ is a linear function,

$$
\begin{equation*}
P_{i j}\left(d_{i j}\right)=-k \times d_{i j}, \quad d_{i j} \in\left[d_{i j}^{1}, d_{i j}^{2}\right] \tag{4}
\end{equation*}
$$

where $k \geq 0$ and $-k$ denotes the slope of the function, $k=\frac{P_{i j}\left(d_{i j}^{1}\right)-P_{i j}\left(d_{i j}^{2}\right)}{d_{i j}^{2}-d_{i j}^{1}}$.

$$
\begin{align*}
H_{i j}\left(x_{i j}\right) & =\min \left\{\left(x_{i j}-k\right) \times d_{i j}\right\} \\
& =\left\{\begin{array}{ll}
\left(x_{i j}-k\right) \times d_{i j}^{2}, & 0 \leq x_{i j} \leq k \\
\left(x_{i j}-k\right) \times d_{i j}^{1}, & k \leq x_{i j}
\end{array}\right. \\
& =\left\{\begin{array}{ll}
P_{i j}\left(d_{i j}^{2}\right)+d_{i j}^{2} x_{i j}, & 0 \leq x_{i j} \leq k \\
P_{i j}\left(d_{i j}^{1}\right)+d_{i j}^{1} x_{i j}, & k \leq x_{i j}
\end{array}\right. \tag{5}
\end{align*}
$$

For the $e(i, j) \in E_{2}, H_{i j}\left(x_{i j}\right)=d_{i j} x_{i j}, x_{i j} \geq 0$.
For the $e(i, j) \in E_{3}, H_{i j}\left(x_{i j}\right)=\left\{\begin{array}{ll}K_{j} \times x_{i j} & x_{i j} \leq 0 \\ 0 & x_{i j} \geq 0\end{array}\right.$, where $K_{j}=T_{\text {cycle }}$ if $j=t$; and if $j \neq t, K_{j}$ equals $K$.
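As a numeric sanity check of (5), the sketch below evaluates $H_{i j}$ for an $E_{1}$ edge using the linear form (4). The function name `h_e1` and the sample values are illustrative assumptions.

```python
def h_e1(x, d1, d2, p1, p2):
    """Closed form (5) for an E_1 edge.

    d1 < d2 are the two legal delays, p1 >= p2 the matching powers.
    Returns (H_ij(x), delay that attains the minimum).
    """
    k = (p1 - p2) / (d2 - d1)   # -k is the slope of the linear DP-curve
    d = d2 if x <= k else d1    # larger delay while x <= k, smaller delay afterwards
    return (x - k) * d, d
```

Because $\left(x_{i j}-k\right) \times d_{i j}$ is linear in $d_{i j}$, its minimum over $\left[d_{i j}^{1}, d_{i j}^{2}\right]$ is always attained at an endpoint, which is why only the two legal delays need to be considered.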

To transform the problem into a minimum cost flow problem, we construct an expanded network $G^{\prime}=$ $\left(V^{\prime}, E^{\prime}\right)$. There are three kinds of edges to consider:

- For $e(i, j) \in E_{1}$: we introduce two edges in $G^{\prime}$ with costs $-d_{i j}^{2}$ and $-d_{i j}^{1}$, upper capacities $k$ and $M-k$, and lower capacities both $0$.
- For $e(i, j) \in E_{2}$: the cost, lower capacity, and upper capacity are $-d_{i j}$, $0$, and $M$.
- For an edge in $E_{3}$: two edges are introduced in $G^{\prime}$, one with (cost, lower capacity, upper capacity) $=\left(-K_{j},-M, 0\right)$ and the other with $(0,0, M)$.

Using the cost-scaling algorithm, we can solve the minimum cost flow problem in $G^{\prime}$. For the resulting optimal flow $x^{*}$, we construct the residual network $G\left(x^{*}\right)$ and solve a shortest path problem to determine the shortest path distance $d(i)$ from node $s$ to every other node. By setting $\mu(i)=d(i)$ and $d_{i j}=\mu(i)-\mu(j)$ for each $e(i, j) \in E_{1}$, we finally obtain the solution of the voltage assignment problem.

### 3.2.2 Multi-Voltage Assignment

When the number of legal working voltages is more than two, we can solve voltage assignment in a similar way.

Definition 3 (LS-DP-Curve): The power-delay tradeoff of a level shifter is represented by an LS-DP-Curve $\left\{\left(d_{l s}^{1}, p_{l s}^{1}\right),\left(d_{l s}^{2}, p_{l s}^{2}\right),\left(d_{l s}^{3}, p_{l s}^{3}\right)\right\}$, where each pair $\left(d_{l s}^{i}, p_{l s}^{i}\right)$ is the delay and power consumption of the level shifter when it is driven by a module at voltage $i$.

When a module is at voltage 1 (the highest voltage), it does not need a level shifter to drive other modules, so $d_{l s}^{1}=p_{l s}^{1}=0$. A lower-voltage module needs a bigger level shifter to drive other modules. Since dynamic energy consumption is proportional to the square of the supply voltage, power increases more rapidly than delay. We assume the LS-DP-Curve is convex.

For each module, we modify its DP-Curve by replacing each pair $\left(d_{i}, p_{i}\right)$ with $\left(d_{i}+d_{l s}^{i}, p_{i}+p_{l s}^{i}\right)$, where $\left(d_{l s}^{i}, p_{l s}^{i}\right)$ is the level shifter's delay and power consumption.
LEMMA 1: $f(x)$ is convex $\Longleftrightarrow f\left(\frac{x_{1}+x_{2}}{2}\right) \leq \frac{f\left(x_{1}\right)+f\left(x_{2}\right)}{2}, \forall x_{1}, x_{2} \in Z$.
LEMMA 2: If $f(x)$ and $g(x)$ are convex, then $P(x)=f(x)+g(x)$ is also convex.

Using Lemma 1 and Lemma 2, we can prove that the modified DP-Curve is a piecewise linear convex function with integer breakpoints, so we can apply a method similar to that of Section 3.2.1 to solve the voltage assignment problem.
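The curve modification and the convexity property can be illustrated numerically. The sketch below assumes both curves list one (delay, power) pair per voltage in the same order and that delays are strictly increasing; the function names and sample values are hypothetical.

```python
def modified_dp_curve(dp_curve, ls_curve):
    """Add the level shifter's (delay, power) to the module's pair at each voltage."""
    return [(d + dl, p + pl) for (d, p), (dl, pl) in zip(dp_curve, ls_curve)]


def is_convex(points):
    """True if the piecewise linear curve through the points has nondecreasing slopes."""
    pts = sorted(points)
    for (d0, p0), (d1, p1), (d2, p2) in zip(pts, pts[1:], pts[2:]):
        # slope(0,1) <= slope(1,2), cross-multiplied to stay in integer arithmetic
        if (p1 - p0) * (d2 - d1) > (p2 - p1) * (d1 - d0):
            return False
    return True
```

For example, the module curve `[(2, 10), (3, 6), (5, 4)]` combined with the level-shifter curve `[(0, 0), (1, 1), (2, 2)]` gives `[(2, 10), (4, 7), (7, 6)]`, whose slopes are still nondecreasing.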

### 3.3 Level Shifters Assignment

After voltage assignment, every module is assigned a voltage. Since each net driven from a low-voltage module to a high-voltage module needs a level shifter, the number of level shifters $n_{l s}$ is determined. To locate the $m$ modules, the chip is dissected into a set of rooms $R=\left\{r_{1}, r_{2}, \ldots, r_{m}\right\}$. Because a level shifter cannot be placed on a module, its location must be within a white space. Besides, since level shifters have nonzero area, they cannot be placed arbitrarily close to each other.

Table 1 Notation used in LS Assignment

| Symbol | Meaning |
| :--- | :--- |
| $m$ | \# of modules |
| $n_{l s}$ | \# of level shifters needed |
| $R$ | Set of rooms, $R=\left\{r_{1}, r_{2}, \ldots, r_{m}\right\}$ |
| $r_{j}$ | Room containing module $j$ |
| $ws_{j}$ | White space in $r_{j}$ |
| $LS_{i j}$ | Set of LSs with the same source $i$ and the same sink $j$ |
| $size_{i j}$ | \# of level shifters in $LS_{i j}$ |
| $pws_{j}$ | Potential white space in room $r_{j}$ |
| $w_{r j}\left(h_{r j}\right)$ | Width (height) of room $r_{j}$ |
| $w_{m j}\left(h_{m j}\right)$ | Width (height) of module $n_{j}$ |
| $w_{i j}\left(h_{i j}\right)$ | Width (height) of 1st Feasible Region $fr1_{i j}$ |

Fig. 4 (a) No matter how the module is moved, the dark area cannot hold a level shifter, while the blank area is the Potential White Space (PWS) of $r_{j}$. (b) 1st and 2nd Feasible Regions of $FR_{i j}$.

Here we carry out a minimum cost flow based level-shifter assignment that tries to assign every level shifter to one room. We define the set of level shifters $LS=\bigcup_{i=1}^{n} \bigcup_{j=1}^{n} LS_{i j}$ $(i=1, \ldots, n ; j=1, \ldots, n ; i \neq j)$, where every set $LS_{i j}$ contains $size_{i j}$ level shifters with the same source module $i$ and the same sink module $j$, and $\sum_{i=1}^{n} \sum_{j=1}^{n} size_{i j}=n_{l s}$.

To check whether a room has extra space for level shifters, we denote the white space in room $r_{j}$ as $ws_{j}$, whose area can be calculated as follows:

$$
\begin{equation*}
\operatorname{Area}\left(w s_{j}\right)=w_{r j} \times h_{r j}-w_{m j} \times h_{m j} \tag{6}
\end{equation*}
$$

where $w_{r j}\left(h_{r j}\right)$ denotes the width(height) of room $r_{j}$, $w_{m j}\left(h_{m j}\right)$ denotes the width(height) of module $n_{j}$.
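Equation (6) translates directly into code. In the sketch below, `max_level_shifters_by_area` is a hypothetical helper that gives only an area-based upper bound (it ignores shapes and feasible regions), and `a_ls` is assumed to be the area of one level shifter.

```python
def white_space_area(w_room, h_room, w_module, h_module):
    """Equation (6): room area minus module area."""
    return w_room * h_room - w_module * h_module


def max_level_shifters_by_area(w_room, h_room, w_module, h_module, a_ls):
    """Area-only upper bound on the level shifters a room can hold (assumption)."""
    return int(white_space_area(w_room, h_room, w_module, h_module) // a_ls)
```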

Each level shifter belongs to a net and is inserted into white space. If the white space is outside the net's bounding box, inserting the level shifter may cause Interconnect Length Overhead (ILO), so each white space has its own cost for a given level shifter. Since we assume all modules are hard blocks, some space in a room must always be covered by the module (as shown in Fig. 4(a), the center dashed area cannot hold a level shifter no matter where the module is placed).


We define the cost $F_{i j}$ of edge $e\left(LS_{i}, r_{j}\right)$ as a function of $p_{i j}$:

$$
\begin{equation*}
F_{i j}\left(p_{i j}\right)=\left\lceil\frac{1}{p_{i j}+\mu}+\left(1-p_{i j}\right) \times k \times\left(\operatorname{Term1}_{i j}+\operatorname{Term2}_{i j}\right)\right\rceil \tag{9}
\end{equation*}
$$

where $\mu$ is a small coefficient, $k$ is an undetermined coefficient, and $\operatorname{Term1}_{i j}$ and $\operatorname{Term2}_{i j}$ are penalty terms:

$$
\operatorname{Term1}_{i j}=\begin{cases}\frac{h_{r j}-h_{i j}}{w_{c j}}, & w_{c j} \neq 0 \\ 0, & w_{c j}=0\end{cases}, \quad \operatorname{Term2}_{i j}=\begin{cases}\frac{w_{r j}-w_{i j}}{h_{c j}}, & h_{c j} \neq 0 \\ 0, & h_{c j}=0\end{cases}
$$

Equation (9) has some useful properties. First, it is a monotonically decreasing function of $p_{i j}$, which means we are inclined to put a level shifter in the room that has the higher percentage of 1st $FR$. Second, it should not become too large even if $fr1_{i j}$ is very small, so we add the coefficient $\mu$, and $\max F_{i j}\left(p_{i j}\right) \simeq\left\lceil\frac{1}{\mu}\right\rceil$. Third, we observe that even if two rooms have the same $p_{i j}$ with $p_{i j} \leq 1$, when a level shifter is inserted in $fr2_{i j}$, the room with the longer $fr2_{i j}$ may cause a longer interconnect. Consequently, in equation (9), we add the penalty terms $\operatorname{Term1}_{i j}$ and $\operatorname{Term2}_{i j}$.
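A small sketch of the edge cost (9) makes the monotonic behaviour easy to inspect. The penalty terms are passed in as plain numbers because $w_{c j}$ and $h_{c j}$ are treated here as given inputs; the function names and sample coefficients are assumptions for illustration.

```python
import math


def penalty_term(room_extent, region_extent, denom):
    """Term1 / Term2 of equation (9): a ratio, or 0 when the denominator is 0."""
    return (room_extent - region_extent) / denom if denom != 0 else 0.0


def ls_room_cost(p, mu, k, term1, term2):
    """Equation (9): ceil(1 / (p + mu) + (1 - p) * k * (Term1 + Term2))."""
    return math.ceil(1.0 / (p + mu) + (1.0 - p) * k * (term1 + term2))
```

With `mu = 0.125`, `k = 2`, `term1 = 0.5`, and `term2 = 0.25`, the cost is 10 at `p = 0`, 3 at `p = 0.5`, and 1 at `p = 1`, so rooms with a larger share of the 1st feasible region are cheaper.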

It can be shown that any flow in the network $G^{*}$ assigns level shifters to white spaces (given by the saturated edges between the level-shifter nodes $LS_{i}$ and the white-space nodes $ws_{j}$). Although level-shifter assignment is similar to buffer assignment, each net has at most one level shifter to insert, so the problem can be solved efficiently by a minimum cost flow algorithm, which runs in polynomial time [13].

### 3.4 White Space Redistribution (WSR)

During floorplanning, voltage assignment and level-shifter assignment are carried out for each candidate solution. The best solution that satisfies the constraints and inserts the most level shifters is stored. After floorplanning, most level shifters can be assigned to rooms in the stored best solution. We define $ELS$ as the set of level shifters that cannot be assigned to any room. In room $r_{j}$, we define the module to pack as $n_{j}$ and the group of level shifters to insert as $Ls_{j}=\left\{ls_{1}, ls_{2}, \ldots, ls_{p_{j}}\right\}$. The following condition must be satisfied:

$$
\operatorname{Area}\left(n_{j}\right)+\sum_{k=1}^{p_{j}} \operatorname{Area}\left(ls_{k}\right) \leq \operatorname{Area}\left(r_{j}\right)
$$

A traditional room-based floorplanner packs the modules at the lower-left corner or the center of the rooms. Unlike this traditional block planning method, and in order to favor level-shifter insertion, a heuristic method (called WSR) is adopted to calculate the relative positions of modules and level shifters in rooms. The framework of algorithm WSR is shown in Algorithm 1.

```
Algorithm 1 (WSR)
    for \(j=1\) to \(m\) do
        \(p_{j} \leftarrow \operatorname{sizeof}\left(L s_{j}\right)\);
        \(F_{\text {right }} \leftarrow 0, F_{\text {left }} \leftarrow 0, F_{\text {up }} \leftarrow 0, F_{\text {down }} \leftarrow 0\);
        for \(i=1\) to \(p_{j}\) do
            calculate \(F_{i x}\) and \(F_{i y}\);
            update \(F_{\text {right }}, F_{\text {left }}, F_{\text {up }}, F_{\text {down }}\);
        end for
        calculate \(X_{n j}\) and \(Y_{n j}\);  /* Relative position */
        generate grids \(G_{j}\) in white space;
        sort \(L s_{j}\) by priority;
        for \(i=1\) to \(p_{j}\) do
            pick one grid to insert \(l s_{i}\);  /* Level shifter insertion */
        end for
    end for
    InsertELS();
    for \(j=1\) to \(m\) do
        move module \(n_{j}\) under demand of Power Network;
    end for
```


### 3.4.1 Relative Position Calculation

If a level shifter $ls_{i}$ is assigned to room $r_{j}$, a preferred region is provided. If $ls_{i}$ is inserted in the preferred region, the interconnect is not lengthened. Each level shifter to be inserted in room $r_{j}$ produces a force that pushes the module $n_{j}$ away from the level shifter. We consider the force produced by $ls_{i}$ in the x- and y-directions separately, denoted as $F_{i x}$ and $F_{i y}$. For example, as shown in Fig. 6(a), if $ls_{i}$ prefers to be located in the lower-left corner of $r_{j}$, then $F_{i x}$ pushes $n_{j}$ to the right and $F_{i y}$ pushes $n_{j}$ upward. To calculate $F_{i x}$ and $F_{i y}$, the preferred region is described by a 4-tuple $\left(w_{1 i j}, w_{2 i j}, h_{1 i j}, h_{2 i j}\right)$, where $w_{1 i j}\left(w_{2 i j}\right)$ is the distance from the preferred region to the left (right) boundary of $r_{j}$, and $h_{1 i j}\left(h_{2 i j}\right)$ is the distance from the preferred region to the upper (lower) boundary of $r_{j}$, as shown in Fig. 6(b).

$F_{i x}$ and $F_{i y}$ can be calculated by equation (10).

$$
\begin{equation*}
F_{i x}=\frac{w_{2 i j}-w_{1 i j}}{w_{r j}}, \quad F_{i y}=\frac{h_{2 i j}-h_{1 i j}}{h_{r j}} \tag{10}
\end{equation*}
$$
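Equation (10) can be evaluated per level shifter as in the sketch below. The function name `ls_force` is hypothetical, and the code returns the two components exactly as written in (10) without assuming any further sign convention.

```python
def ls_force(w1, w2, h1, h2, w_room, h_room):
    """Equation (10): normalized force components produced by one level shifter.

    w1 / w2: distance from the preferred region to the left / right room boundary
    h1 / h2: distance from the preferred region to the upper / lower room boundary
    """
    f_x = (w2 - w1) / w_room   # positive when the region is closer to the left boundary
    f_y = (h2 - h1) / h_room   # positive when the region is closer to the upper boundary
    return f_x, f_y
```

Dividing by the room dimensions keeps both components within $[-1, 1]$, so forces from rooms of different sizes are comparable.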

To calculate the position of module $n_{j}$, we define four variables $F_{\text{right}}, F_{\text{left}}, F_{\text{up}}, F_{\text{down}}$ as follows:

The relative position of $n_{j}$ in room $r_{j}$ is denoted as $\left(X_{n j}, Y_{n j}\right)$, where $X_{n j}=\frac{\left(w_{r j}-w_{m j}\right) \times F_{\text{right}}}{F_{\text{right}}-F_{\text{left}}}$ and $Y_{n j}=\frac{\left(h_{r j}-h_{m j}\right) \times F_{\text{up}}}{F_{\text{up}}-F_{\text{down}}}$.

### 3.4.2 Grids Generation and LS Insertion

Definition 6 ($l$-bounding box): Given a level shifter $ls_{k}$, we define the bounding box of $ls_{k}$'s net as $B$, whose width is $wid_{B}$ and height is $hei_{B}$. The $l$-bounding box of $B$ is $B_{l}$, which has the same center position. Besides, the width of $B_{l}$ is $\left(wid_{B}+2 \times l\right)$ and its height is $\left(hei_{B}+2 \times l\right)$ (as shown in Fig. 7).

Fig. 6 In room $r_{j}$: (a) if level shifter $ls_{i}$ prefers to be located in the lower-left corner (the dark area is the preferred region), then $ls_{i}$ produces forces $\left(F_{i x}, F_{i y}\right)$ that push module $n_{j}$ upward and to the right; (b) $w_{1 i j}, w_{2 i j}, h_{1 i j}, h_{2 i j}$ are defined to calculate the forces $\left(F_{i x}, F_{i y}\right)$.

Fig. 7 $B_{l}$ is the $l$-bounding box of $B$.
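Definition 6 amounts to growing a box by $l$ on every side. A minimal sketch, with a hypothetical `Box` type holding corner coordinates:

```python
from typing import NamedTuple


class Box(NamedTuple):
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def l_bounding_box(b: Box, l: float) -> Box:
    """Same center as b; width and height each grow by 2 * l."""
    return Box(b.x_min - l, b.y_min - l, b.x_max + l, b.y_max + l)
```

For example, `l_bounding_box(Box(2, 3, 6, 5), 1.5)` gives `Box(0.5, 1.5, 7.5, 6.5)`, which is 7 wide and 5 high and is still centered at (4, 4).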

In room $r_{j}$, after calculating the relative position of module $n_{j}$, at most four rectangular white spaces are generated. We divide each white space into rectangular grids, each of area $a_{l s}$. Room $r_{j}$ therefore records a set of grids $G_{j}=\left\{g_{1}, g_{2}, \ldots, g_{m}\right\}$ with $m \times a_{l s} \leq \operatorname{Area}\left(r_{j}\right)-\operatorname{Area}\left(n_{j}\right)$, and each grid has its own position. Level shifters in the set $Ls_{j}$ are sorted by the area of their preferred regions: the smaller the preferred region, the higher the priority. Then each level shifter picks one grid in order.

After every assigned level shifter has chosen a grid, each level shifter $ls_{k}$ in $ELS$ chooses one free grid to be inserted into (as shown in Algorithm 2).

```
Algorithm 2 InsertELS()
    Initialize \(l=0\), step;
    while \(E L S\) is not empty do
        \(l \leftarrow l+\) step \(;\)
        num \(\leftarrow E L S\).size ();
        for \(i=1\) to num do
            Generate \(l\)-bounding box of \(l s_{i}\);
            Find all free grids inside \(l\)-bounding box;
        end for
        Construct bipartite graphs;
        Solve maximum bipartite matching;
        Update \(E L S\);
    end while
```
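The loop of Algorithm 2 can be sketched in a few lines. This is an illustration under assumptions, not the paper's implementation: grids are represented by their center points, a grid counts as inside an $l$-bounding box when its center is, a simple augmenting-path matching is used for the maximum bipartite matching step, and the loop additionally stops when no free grid is left. All names are hypothetical.

```python
def max_matching(cands):
    """cands[u] = list of grids usable by level shifter u. Returns {u: grid}."""
    owner = {}  # grid -> level shifter currently holding it

    def try_assign(u, seen):
        for g in cands[u]:
            if g in seen:
                continue
            seen.add(g)
            # Take a free grid, or re-route the current owner to another grid.
            if g not in owner or try_assign(owner[g], seen):
                owner[g] = u
                return True
        return False

    for u in cands:
        try_assign(u, set())
    return {u: g for g, u in owner.items()}


def insert_els(net_boxes, free_grids, step):
    """net_boxes[k] = (x_min, y_min, x_max, y_max) of ls_k's net bounding box.

    Returns (placed, remaining): placed[k] = (grid, l at which it was matched).
    """
    els = dict(net_boxes)
    free = set(free_grids)
    placed = {}
    l = 0.0
    while els and free:  # the extra `free` test is a termination guard (assumption)
        l += step
        cands = {
            k: [g for g in sorted(free)
                if x0 - l <= g[0] <= x1 + l and y0 - l <= g[1] <= y1 + l]
            for k, (x0, y0, x1, y1) in els.items()
        }
        for k, g in max_matching(cands).items():
            placed[k] = (g, l)
            del els[k]
            free.discard(g)
    return placed, els
```

Because $l$ grows only when some level shifter is still unplaced, each level shifter ends up in a grid found at the smallest expansion tried for it, which keeps its interconnect length overhead small.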

Given $l$, for each level shifter $ls_{k}$ in $ELS$, we construct an $l$-bounding box, called $B_{l}^{k}$ (step 6). Then we find all free grids in $B_{l}^{k}$ (step 7). In step 9, we construct bipartite graphs, and then we use the Hungarian algorithm to find a maximum bipartite matching, which takes $\mathrm{O}(mn)$ time (step 10). In step 11, we update $ELS$ by removing the level shifters that have been inserted. If there are still level shifters in $ELS$, we update $l$ and go back to step 6.

Table 2 Comparison between VLSAF and the previous work [5]

| Benchmark | Max Power | Power Cost [5] | Power Cost VLSAF | PNR [5] | PNR VLSAF | LS Number [5] | LS Number VLSAF | W.S (%) [5] | W.S (%) VLSAF | Time (s) [5] | Time (s) VLSAF |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| n10 | 216841 | 216840 | 189142 | 965 | 1007 | 0 | 9 | 4.87 | 9.44 | 6.001 | 3.24 |
| n30 | 205650 | 190717 | 146483 | 1369 | 1436 | 57 | 25 | 9.03 | 11.32 | 115.07 | 35.11 |
| n50 | 195140 | 172884 | 135316 | 1514 | 1460 | 119 | 114 | 21.10 | 16.66 | 569.36 | 116.97 |
| n100 | 180022 | 179876 | 123526 | 1671 | 1354 | 92 | 153 | 34.07 | 26.71 | 1768 | 688.13 |
| n200 | 177633 | 174818 | 130050 | 2040 | 1763 | 399 | 203 | 46.52 | 29.66 | 4212 | 1969.12 |
| n300 | 273499 | 219492 | 234389 | 2147 | 1997 | 452 | 337 | 44.10 | 37.74 | 4800 | 2392.8 |
| Avg | - | 192438 | 159818 | 1617.7 | 1502.8 | 186 | 140.2 | 26.61 | 21.92 | 1911.74 | 857.56 |
| Diff | - | - | -17\% | - | -7.2\% | - | -24.7\% | - | -17.6\% | - | -55.2\% |

Table 3 VLSAF vs. VAF+LSI

| Benchmark | Wire Length w. LS: VLSAF | Wire Length w. LS: VAF+LSI | ILO (%): VLSAF | ILO (%): VAF+LSI | W.S (%): VLSAF | W.S (%): VAF+LSI | Time (s): VLSAF | Time (s): VAF+LSI |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| n10 | 13552 | 17937 | 0.89 | 2.29 | 9.44 | 10.46 | 3.24 | 2.09 |
| n30 | 44225 | 43282 | 0.31 | 0.85 | 11.32 | 10.75 | 35.11 | 23.13 |
| n50 | 92678 | 95666 | 1.20 | 2.27 | 16.66 | 18.12 | 116.97 | 39.81 |
| n100 | 185622 | 191522 | 1.03 | 2.40 | 26.71 | 26.40 | 688.13 | 327.01 |
| n200 | 366003 | 365792 | 1.64 | 4.28 | 29.66 | 30.06 | 1969.12 | 1304.3 |
| n300 | 560042 | 600348 | 0.67 | 1.37 | 37.74 | 35.36 | 2392.8 | 1772.03 |
| Avg | 210404 | 219091 | 0.96 | 2.24 | 21.92 | 21.86 | 857.56 | 578.06 |
| Diff | - | +4\% | - | +133\% | - | -0.3\% | - | -32.5\% |

Table 4 Experimental Results with More Legal Working Voltages

| Benchmark | $k$ | Power Cost | Wire Length | LS Num | ILO (%) | W.S (%) | Time (s) |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| n10 | 3 | 163352 | 16386 | 10 | 0.13 | 11.58 | 3.03 |
| n10 | 4 | 162794 | 16474 | 11 | 0.12 | 11.54 | 3.96 |
| n100 | 3 | 131394 | 180023 | 150 | 0.50 | 26.8 | 438.05 |

After InsertELS(), if not all the grids in room $r_{j}$ are occupied by level shifters, module $n_{j}$ may be moved. If $n_{j}$ is at the lowest voltage, it moves toward the left and down to reduce the total area. Otherwise, it moves toward the center of the power network to minimize the power network resource.

## 4. EXPERIMENTAL RESULTS

We implemented algorithm VLSAF in the C++ programming language and executed it on a Linux machine with a 3.0 GHz CPU and 1 GB of memory. Fig. 8 shows the experimental results for the benchmarks n50 and n200. Blocks at the same voltage are clustered closely together to reduce the power-network resource, and level shifters (small dark blocks) are inserted in white spaces. The cost function used in simulated annealing is:

$$
\Phi=\lambda_{A} A+\lambda_{W} W+\lambda_{P} P+\lambda_{R} R+\lambda_{N} N
$$

[^1]where $A$ and $W$ represent the floorplan area and wire length; $P$ represents the total power consumption; $R$ represents the power network resource; and $N$ records the number of level shifters that can not be assigned.

The work in [5] is the most recent one that handles the floorplanning problem while considering voltage assignment and level-shifter insertion. To compare with [5], we performed our experiments on the same test cases, which are based on the GSRC benchmarks with added power and delay specifications. Table 1 compares our experimental results with those of [5]. The column Power Cost gives the actual power consumption, the column PNR gives the power network resource consumption, and the column W.S gives the white space. VLSAF saves $17 \%$ power and $7.2 \%$ PNR. The white space and run time results show that our framework is about 2X faster while reducing white space by $17.6 \%$.

We further demonstrated the effectiveness of our approach by comparing it with an alternative approach, VAF+LSI, which solves level-shifter assignment and insertion only at the post-floorplanning stage. Table 3 compares VLSAF and VAF+LSI. Although the runtime of VAF+LSI is shorter (there are no iterative level-shifter assignments during floorplanning), its wire length and interconnect length overhead (ILO) are increased by $4 \%$ and $133 \%$, respectively. A high ILO may make the delay estimation among modules inaccurate, or even lead to timing constraint violations. Accordingly, VLSAF is effective, at the cost of a reasonable increase in runtime.

Fig. 8 Experimental results of n50 and n200 with two legal working voltages.
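The Diff row of Table 3 can be checked against its Avg row as a relative change of VAF+LSI with respect to VLSAF. The worked arithmetic below assumes that the four column pairs are, in order, wire length, ILO, white space, and runtime, with the VLSAF value listed first in each pair:

$$
\begin{aligned}
\text{Wire length:}\quad & \frac{219091-210404}{210404} \approx +4.1 \% \\
\text{ILO:}\quad & \frac{2.24-0.96}{0.96} \approx +133 \% \\
\text{W.S:}\quad & \frac{21.86-21.92}{21.92} \approx -0.3 \% \\
\text{Time:}\quad & \frac{578.06-857.56}{857.56} \approx -32.6 \%
\end{aligned}
$$

The first three values match the reported differences after rounding. The runtime figure differs slightly from the reported $-32.5 \%$, which may be due to rounding or to averaging the differences per benchmark rather than taking the difference of the averages.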

In addition, we ran two further sets of experiments in which the number of legal working voltages for each module is set to three and four. The detailed results are listed in Table 4.

## 5. CONCLUSIONS

We have proposed a two-phase framework to solve voltage assignment and level-shifter insertion: phase one is voltage and level-shifter assignment driven floorplanning; phase two is white space redistribution at the post-floorplanning stage. Experimental results have shown that our framework is effective in reducing power cost while considering the positions and areas of level shifters.

## References

[1] D. Lackey, P. Zuchowski and J. Cohn. Managing power and performance for system-on-chip designs using voltage islands. ICCAD, pages 195-202, 2002.
[2] M. Hamada and T. Kuroda. Utilizing surplus timing for power reduction. CICC, pages 89-92, 2001.
[3] X. Hong and S. Dong. Non-slicing floorplan and placement using corner block list topological representation. IEEE Transactions on CAS, 51:228-233, 2004.
[4] W. L. Hung, G. M. Link and J. Conner. Temperature-aware voltage islands architecting in system-on-chip design. ICCD, 2005.
[5] W. P. Lee and Y. W. Chang. Voltage island aware floorplanning for power and timing optimization. ICCAD, pages 389-394, 2006.
[6] J. Hu, Y. Shin and R. Marculescu. Architecting voltage islands in core-based system-on-a-chip designs. ISLPED, pages 180-185, 2004.
[7] D. Sengupta and R. Saleh. Application-driven floorplan-aware voltage island design. DAC, pages 155-160, 2008.
[8] Q. Ma and F. Y. Young. Network flow-based power optimization under timing constraints in MSV-driven floorplanning. ICCAD, 2008.
[9] W. K. Mak and J. W. Chen. Voltage island generation under performance requirement for SoC designs. ASP-DAC, 2007.
[10] W. P. Lee and Y. W. Chang. An ILP algorithm for post-floorplanning voltage-island generation considering power-network planning. ICCAD, pages 650-655, 2007.
[11] H. Wu, I. M. Liu and Y. Wang. Post-placement voltage island generation under performance requirement. ICCAD, 2005.
[12] R. Ching and F. Y. Young. Post-placement voltage island generation. ICCAD, 2006.
[13] R. K. Ahuja, T. L. Magnanti, and J. B. Orlin. Network Flows: Theory, Algorithms, and Applications. Prentice Hall/Pearson, 2005.


Bei Yu received the B.E. degree from the Department of Mathematics, UESTC, China, in 2007. He is currently an M.E. candidate in the EDA lab, Department of Computer Science and Technology, Tsinghua University, China. His research interests include CAD for VLSI, floorplanning algorithms and low power design.


Sheqin Dong received the B.E. degree in computer science in 1985, the M.S. degree in semiconductor physics and devices in 1988, and the Ph.D. degree in mechatronic control and automation in 1996. He is currently an associate professor in the EDA lab of the Department of Computer Science and Technology, Tsinghua University. His current research interests include CAD for VLSI, floorplanning and placement algorithms, multimedia ASIC and hardware design.


Song Chen received the B.S. degree in computer science from Xian Jiao-tong University, China, in 2000, and the M.S. and Ph.D. degrees in computer science from Tsinghua University, China, in 2003 and 2005, respectively. From August 2005 to April 2009, he was a visiting associate at the Graduate School of IPS, Waseda University, Japan, where he is now an assistant professor. His research interests include several aspects of electronic design automation, e.g., floorplanning, placement, and high-level synthesis.


Satoshi GOTO received the B.E. and M.E. degrees in Electronics and Communication Engineering from Waseda University in 1968 and 1970, respectively. He also received the Dr. of Engineering degree from Waseda University in 1981. He is an IEEE Fellow, a member of the Academy Engineering Society of Japan, and a professor at Waseda University. His research interests include LSI systems and multimedia systems.


[^0]:    Manuscript received March 18, 2009.
    Manuscript revised June 22, 2009.
    ${ }^{\dagger}$ The authors are with the EDA lab, Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
    ${ }^{\dagger \dagger}$ The authors are with the Graduate School of Information, Production and Systems, Waseda University, Kitakyushu-shi, 808-0135 Japan
    a) E-mail: disyulei@gmail.com
    b) E-mail: dongsq@mail.tsinghua.edu.cn

[^1]:    ${ }^{\dagger} m$ is the number of edges, and $n$ is the number of nodes

