# Prime: A Timing-Driven Placement Tool Using a Piecewise Linear Resistive Network Approach

Takeo Hamada, Chung-Kuan Cheng, and Paul M. Chau

University of California at San Diego, La Jolla, CA 92093-0407


#### Abstract

An approach toward path-oriented timing-driven placement is proposed. We first transform the placement problem with timing constraints into a Lagrangian problem. A primal-dual approach is used to find the optimal relative module locations. In each primal-dual iteration, the primal problem is solved by a piecewise linear resistive network method, while the dual step updates the Lagrange multipliers. The sparsity of the piecewise linear resistive network is exploited to improve the efficiency of the calculation dramatically. A clock cycle reduction of up to $22.0 \%$ was observed for the Primary2 test case.


## 1 Introduction

There have been extensive studies on timing-driven placement in recent years. The approaches toward the problem are often categorized into two groups: net-based and path-based. In a typical net-based approach, potential critical paths and acceptable delays at each cell are calculated, from which the slack of a net on each path is derived. The slack of a net gives the upper and lower bounds on the net size, which serve as constraints during the subsequent placement phase. The basic idea of a net-based approach can be combined with ideas such as iterative improvement [13], constructive placement [18, 12], and incremental timing analysis [19], giving rise to several variations.

Net weighting $[5,18]$ is a technique often used with this approach; it puts heavier weight on nets with smaller slack, or puts priority on critical nets, thus turning a constrained optimization problem into an unconstrained one. It has been reported that these techniques can achieve favorable results. However, the net weights tend to be heuristic, which makes it difficult to apply proper mathematical analysis to the problem.

In the path-based approach all or a subset of paths are taken into account in the formulation of the problem, often with a set of linear constraints. It is expected that the problem can be handled more mathematically in this approach, since the timing in VLSI is inherently path-oriented.

Jackson and Kuh [10] proposed an approach based on linear programming. Gao et al. [7] proposed iterative modification of net bounds to take advantage of the computational efficiency of the net-based approach. Srinivasan et al. [17] proposed an approach based on Lagrangian relaxation. They observed that only a small subset of timing requirements are active as constraints at one time, so the problem of a large number of paths can be effectively avoided. They represented timing requirements by a set of linear inequalities. When the corresponding constrained optimization problem is turned into a Lagrangian, these linear inequalities make the Lagrangian non-differentiable. The subgradient method [2] was used to update the Lagrange multipliers on the non-differentiable Lagrangian.

This paper presents an approach toward path-oriented timing-driven placement. We adopt a nonlinear timing model to estimate the delay. To handle the resulting nonlinear system, we propose a transformation that formulates it as a nonlinear resistive network. An efficient piecewise linear resistive network approach is devised to solve the nonlinear problem. The resistive network analogy exploits the sparsity of the network and thus drastically reduces the complexity of the problem.

## 2 Timing Model

The timing of a chip, i.e. the clock cycle of a chip, is determined by a path with the worst delay time, which is also called a critical path. A path starts at a primary input or at an output of a latch, and terminates at a primary output or at an input of a latch. The delay of a path is obtained by aggregating cell delays on the path.
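As an illustration of how a critical path delay can be obtained by aggregating cell-to-cell delays, the following minimal sketch computes the longest path in a directed acyclic graph. The function name `critical_path_delay`, the edge dictionary layout, and the four-cell example are assumptions made for illustration only; they are not part of Prime.

```python
from collections import defaultdict, deque


def critical_path_delay(edges):
    """Return the largest total delay over all source-to-sink paths.

    `edges` maps (u, v) -> cell-to-cell delay d(u, v) on a DAG whose
    sources stand for primary inputs or latch outputs and whose sinks
    stand for primary outputs or latch inputs.
    """
    succ = defaultdict(list)
    indeg = defaultdict(int)
    nodes = set()
    for (u, v), d in edges.items():
        succ[u].append((v, d))
        indeg[v] += 1
        nodes.update((u, v))

    # Longest arrival time at each node, processed in topological order.
    arrival = {n: 0.0 for n in nodes}
    ready = deque(n for n in nodes if indeg[n] == 0)
    while ready:
        u = ready.popleft()
        for v, d in succ[u]:
            arrival[v] = max(arrival[v], arrival[u] + d)
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    return max(arrival.values())


# Hypothetical example (delays in ns): in-a-out takes 5.0, in-b-out takes 6.0.
example = {("in", "a"): 2.0, ("in", "b"): 1.0,
           ("a", "out"): 3.0, ("b", "out"): 5.0}
worst = critical_path_delay(example)
```

Here `worst` is the delay of the path through cell `b`, which would determine the clock cycle of this toy circuit.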

The delay between two successive logic stages is composed of three elements: (i) intrinsic delay due to switching a cell on/off; (ii) delay due to charging and discharging the fanout and load capacitance; and (iii) delay due to the distributed RC of the interconnection.

The scaling rule [1] suggests that the wiring delay, which comprises delays (ii) and (iii), will be dominant for circuits with larger chip size and smaller geometry. Delay (iii) in particular, the delay due to distributed RC, can be quite significant for submicron circuits, since it grows super-linearly with the scaling factor and the chip dimension.

Let $v_{i}, v_{j}$ be logic cells of two successive stages, where $v_{i}$ precedes $v_{j}$. We define the cell-to-cell delay $d\left(v_{i}, v_{j}\right)$ as the time interval between the arriving rise/fall edges at the respective stages.

To account for the effect of distributed RC, Sakurai [15] derived a closed-form expression for the interconnection delay of a distributed RC line; the expressions given by Carter et al. [3, 9] serve the same purpose. This model was also used by Prasitjutrakul and Kubitz [14] in their timing-driven global router. We simplify the model by decomposing the distance into its $x$ and $y$ components and ignoring the interaction between $x$ and $y$, i.e.

$$
\begin{align*}
& d\left(v_{i}, v_{j}\right)=b d_{i}+\alpha \cdot\left(r_{h} c_{h}\left|x_{i}-x_{j}\right|^{2}+r_{v} c_{v}\left|y_{i}-y_{j}\right|^{2}\right) \\
& \quad+\beta \cdot C_{j}\left(r_{h}\left|x_{i}-x_{j}\right|+r_{v}\left|y_{i}-y_{j}\right|\right) \\
& \quad+\beta \cdot R_{i} \sum_{i, j}\left(c_{h}\left|x_{i}-x_{j}\right|+c_{v}\left|y_{i}-y_{j}\right|\right) \\
& \quad+\beta \cdot R_{i} C_{L} \tag{1}
\end{align*}
$$

where constant $b d_{i}$ is the intrinsic cell delay of cell $v_{i}, C_{j}$ is the input load capacitance at cell $v_{j}, R_{i}$ is the equivalent on-resistance of the output transistor of cell $v_{i}$, and $C_{L}$ is the capacitive load of cell $v_{i}$. Constants $c_{h}$ and $c_{v}$ are unit length wire capacitance in horizontal and vertical layers, and constants $r_{h}$ and $r_{v}$ are unit length wire resistance in horizontal and vertical layers, respectively. When metal wiring is used both for horizontal and vertical layers, the resistance $r_{h}=r_{v}$ and the capacitance $c_{h}=c_{v}$ are typically $0.05[\Omega / \mu \mathrm{m}]$ and $2.0 \times 10^{-4}[p F / \mu \mathrm{m}]$, respectively [21].

For the coefficients of $\alpha$ and $\beta$, the following values are commonly used $[15,3,9,14]: \alpha=1.02, \beta=2.21$ for $90 \%$ threshold, $\alpha=0.59, \beta=1.21$ for $70 \%$ case, and $\alpha=0.5$ and $\beta=1.0$ for $62 \%$ case.

Points $\left(x_{i}, y_{i}\right),\left(x_{j}, y_{j}\right)$ represent positions of the terminal points. Although this timing model is non-differentiable at points $x_{i}=x_{j}$ and $y_{i}=y_{j}$ of term (3), it is convex and piecewise differentiable. The second term of equation (1) corresponds to the distributed RC delay. When this term is dropped, the timing model can be written as a set of linear inequalities, and it reduces to the timing requirements used by Jackson and Kuh [10] and others. The fourth term represents the horizontal and vertical wire lengths, respectively. Since the exact wire length cannot be known at the time of placement, it has to be approximated, either by half-perimeter wire length, by a single-trunk rectilinear Steiner tree (ST-RST), or by some other approximation method.
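The delay model of equation (1) can be evaluated directly. The sketch below assumes a two-pin net, so that the sum in the fourth term has a single entry, and uses the metal wiring constants and the $90 \%$ threshold coefficients quoted above as defaults. The function name, argument order, and unit convention (micrometers, ohms, picofarads, hence picoseconds) are illustrative assumptions.

```python
def cell_to_cell_delay(xi, yi, xj, yj, bd_i, R_i, C_j, C_L,
                       alpha=1.02, beta=2.21,
                       r_h=0.05, r_v=0.05, c_h=2.0e-4, c_v=2.0e-4):
    """Evaluate equation (1) for a two-pin net (an assumption of this sketch).

    Lengths are in um, resistances in ohm, capacitances in pF, so every
    R*C product is in ps; bd_i must be given in ps as well.
    """
    dx, dy = abs(xi - xj), abs(yi - yj)
    # Second term: distributed RC of the wire itself.
    distributed_rc = alpha * (r_h * c_h * dx ** 2 + r_v * c_v * dy ** 2)
    # Third term: wire resistance driving the input load of cell j.
    wire_into_load = beta * C_j * (r_h * dx + r_v * dy)
    # Fourth term: driver resistance charging the wire capacitance.
    driver_into_wire = beta * R_i * (c_h * dx + c_v * dy)
    # Fifth term: driver resistance charging the capacitive load.
    driver_into_load = beta * R_i * C_L
    return (bd_i + distributed_rc + wire_into_load
            + driver_into_wire + driver_into_load)


# A 1000 um horizontal wire with all cell parameters set to zero leaves
# only the distributed RC term: 1.02 * 0.05 * 2e-4 * 1000**2 ps.
wire_only = cell_to_cell_delay(0.0, 0.0, 1000.0, 0.0,
                               bd_i=0.0, R_i=0.0, C_j=0.0, C_L=0.0)
```

The quadratic growth of `wire_only` with distance is the super-linear behavior that makes delay (iii) significant for long wires.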

## 3 Problem Formulation

We formulate the problem of path-oriented timing-driven placement as a Lagrangian problem. Let us assume that all nets are two-pin nets to simplify the description. In the implementation, we use a weighted star to approximate the Steiner tree. Following the strategy of [10], we also decompose the two-dimensional placement problem into two problems in the $x$ and $y$ dimensions, respectively. Since the processes in the $x$ and $y$ directions are symmetrical, our discussion concentrates on the $x$ direction in the sequel. Let $c_{i j}$ be the connectivity between cell $i$ and cell $j$. We want to minimize the total wire length while keeping the clock cycle no larger than $\tau$, i.e.

$$
\begin{equation*}
\min z=\sum_{(i, j)} c_{i j}\left|x_{i}-x_{j}\right| \quad \text { s.t. } \quad h_{k}(\mathbf{x}) \leq \tau, \quad k=1, \ldots, \pi \tag{2}
\end{equation*}
$$

where $\mathbf{x}$ is the vector of cell locations in the $x$ dimension, $h_{k}(\mathbf{x})$ is the delay equation of path $k$, and $\pi$ is the number of paths in the circuit. Note that the objective function and each path delay function $h_{k}(\mathbf{x})$ are convex. The number $\pi$ can be quite large, usually exponential in the number of cells in the circuit. We introduce slack variables $s_{k}$ such that the inequalities are written as follows:

$$
\begin{equation*}
h_{k}(\mathbf{x})+s_{k}=\tau \tag{3}
\end{equation*}
$$

### 3.1 The Lagrangian and Its Properties

Using Lagrangian multipliers, we turn (2) into an unconstrained optimization problem,

$$
\begin{gather*}
\max _{\lambda \geq 0} \min _{\mathbf{x}} L(\mathbf{x}, \lambda) \\
L(\mathbf{x}, \lambda)=\sum_{(i, j)} c_{i j}\left|x_{i}-x_{j}\right|+\sum_{k=1}^{\pi} \lambda_{k}\left(h_{k}(\mathbf{x})-\tau\right) \tag{4}
\end{gather*}
$$

where $L(\mathbf{x}, \lambda)$ is the Lagrangian. From Kuhn-Tucker theory [8] we obtain the following complementary slackness theorem. Let $\lambda_{k}^{*}$ and $s_{k}^{*}$ be their respective values at global optimum.

Theorem 3.1 At global optimum, $\lambda_{k}^{*} s_{k}^{*}=0$.

The theorem implies that there are active constraints ( $\lambda_{k}>0$ ) and inactive constraints ( $\lambda_{k}=0$ ), and we can effectively ignore the inactive constraints in the neighborhood of the global optimum. As observed by Srinivasan et al. [17], the number of active constraints is kept small throughout timing-driven placement.

We can look at the Lagrangian as a function of two sets of variables, primal variables $\mathbf{x}$ and dual variables $\lambda$. We use $L(\mathbf{x})$ or $L(\lambda)$ to denote the Lagrangian when the other variable set is fixed. The following two optimization problems can be formulated using the above definitions: the primal problem

$$
\min _{\mathbf{x}} L(\mathbf{x}) \text { s.t. } \quad h_{k}(\mathbf{x}) \leq \tau, \quad k=1,2, \ldots, \pi
$$

and the dual problem

$$
\max _{\lambda} L(\lambda) \text { s.t. } \quad \lambda \geq 0
$$

A saddlepoint of the Lagrangian is defined as such a pair ( $\mathbf{x}^{*}, \lambda^{*}$ ) that satisfies

$$
L\left(\mathbf{x}^{*}, \lambda\right) \leq L\left(\mathbf{x}^{*}, \lambda^{*}\right) \leq L\left(\mathbf{x}, \lambda^{*}\right)
$$

for all $\lambda \geq 0$ and all $x$. As customary, we assume constraint qualification, differentiability, and convexity of $h_{k}(x)$. The following duality theorem is due to Wolfe [22].

Theorem 3.2 If $\hat{\mathbf{x}}$ is the solution of the convex primal problem, then dual variables $\hat{\lambda}$ exist, such that ( $\hat{\mathbf{x}}$, $\hat{\lambda})$ solves the dual problem and the extremes of two problems are equal, that is

$$
L(\hat{\mathbf{x}}, \hat{\lambda})=\min _{\mathbf{x}} \max _{\lambda \geq 0} L(\mathbf{x}, \lambda)=\max _{\lambda \geq 0} \min _{\mathbf{x}} L(\mathbf{x}, \lambda) .
$$

The main idea of the primal-dual method [2] is to find a saddlepoint of the Lagrangian by solving the primal problem and the dual problem alternately.
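The saddlepoint inequalities can be checked numerically on a small convex problem. The toy problem below (minimize $x^{2}$ subject to $1-x \leq 0$) is not taken from the placement formulation; it is a hypothetical one-variable stand-in chosen because its saddlepoint $(x^{*}, \lambda^{*})=(1,2)$ is easy to verify by hand.

```python
def lagrangian(x, lam):
    # Toy problem (an illustration, not the placement Lagrangian):
    # minimize x**2 subject to 1 - x <= 0.
    return x ** 2 + lam * (1.0 - x)


# Stationarity 2x - lam = 0 together with the active constraint x = 1.
x_star, lam_star = 1.0, 2.0

grid_x = [i / 10.0 for i in range(-30, 31)]
grid_lam = [i / 10.0 for i in range(0, 51)]

# L(x*, lam) <= L(x*, lam*) for all lam >= 0 on the grid.
left_ok = all(
    lagrangian(x_star, lam) <= lagrangian(x_star, lam_star) + 1e-12
    for lam in grid_lam
)
# L(x*, lam*) <= L(x, lam*) for all x on the grid.
right_ok = all(
    lagrangian(x_star, lam_star) <= lagrangian(x, lam_star) + 1e-12
    for x in grid_x
)
```

Both checks hold because $L(1, \lambda)=1$ for every $\lambda$ and $L(x, 2)=(x-1)^{2}+1 \geq 1$. A primal-dual method would reach the same point by alternating a minimization over $x$ with an update of $\lambda$.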

## 4 Primal Solution Using A Piecewise Linear Resistive Network Approach

We propose a piecewise linear resistive network approach to solve the primal problem. In the primal problem, the current $\lambda$ and the set of critical paths are given. We search for the cell positions that minimize the Lagrangian. Let $K_{i j}$ denote the set of critical paths passing through net $(i, j)$. The primal problem can be rewritten as

$$
L(\mathbf{x})=\sum_{(i, j)}\left[c_{i j}\left|x_{i}-x_{j}\right|+\sum_{k \in K_{i j}} \lambda_{k} d\left(x_{i}, x_{j}\right)\right]+A
$$

where $A$ represents the constant contributed by $\tau$, the cell delay $b d_{i}$, the wire segment in the $y$ direction, etc. The contribution of a net $(i, j)$ to $L(\mathbf{x})$ can be written as a function of the coordinates of its two adjacent cells $i$ and $j$, i.e.

$$
\begin{align*}
& f\left(x_{i}, x_{j}\right)=c_{i j}\left|x_{i}-x_{j}\right|+\sum_{k \in K_{i j}} \lambda_{k} d\left(x_{i}, x_{j}\right)= \\
& c_{i j}\left|x_{i}-x_{j}\right|+\left(\sum_{k \in K_{i j}} \lambda_{k}\right)\left[\alpha \cdot r_{h} c_{h}\left|x_{i}-x_{j}\right|^{2}\right. \\
& \left.+\beta \cdot C_{j} r_{h}\left|x_{i}-x_{j}\right|+\beta \cdot R_{i} c_{h}\left|x_{i}-x_{j}\right|\right] . \tag{5}
\end{align*}
$$

We rewrite $f\left(x_{i}, x_{j}\right)$ in a simpler form:

$$
\begin{equation*}
f\left(x_{i}, x_{j}\right)=a_{i j}\left(x_{i}-x_{j}\right)^{2}+b_{i j}\left|x_{i}-x_{j}\right| \tag{6}
\end{equation*}
$$

where positive constants $a_{i j}, b_{i j}$ are given by

$$
\begin{aligned}
& a_{i j}=\left(\sum_{k \in K_{i j}} \lambda_{k}\right) \alpha r_{h} c_{h} \\
& b_{i j}=c_{i j}+\left(\sum_{k \in K_{i j}} \lambda_{k}\right) \beta\left(C_{j} r_{h}+R_{i} c_{h}\right)
\end{aligned}
$$
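The coefficients $a_{i j}$ and $b_{i j}$ depend only on the multipliers of the critical paths through the net and on electrical constants, so they can be computed once per primal solve. The following sketch assumes the multipliers of the paths in $K_{i j}$ are passed as a plain list; the function names are hypothetical.

```python
def net_coefficients(c_ij, lambdas, R_i, C_j,
                     alpha=1.02, beta=2.21, r_h=0.05, c_h=2.0e-4):
    """Return (a_ij, b_ij) of equation (6) for one net.

    `lambdas` holds the multipliers of the critical paths through the
    net (the set K_ij). An empty list leaves only the wire-length term.
    """
    lam_sum = sum(lambdas)
    a_ij = lam_sum * alpha * r_h * c_h
    b_ij = c_ij + lam_sum * beta * (C_j * r_h + R_i * c_h)
    return a_ij, b_ij


def net_cost(x_i, x_j, a_ij, b_ij):
    """Evaluate f(x_i, x_j) of equation (6)."""
    d = x_i - x_j
    return a_ij * d * d + b_ij * abs(d)


# A net that lies on no critical path contributes only c_ij * |x_i - x_j|.
a_free, b_free = net_coefficients(1.0, [], R_i=1000.0, C_j=0.01)
```

Nets on many critical paths with large multipliers receive larger $a_{i j}$ and $b_{i j}$, which pulls their cells together more strongly in the primal solution.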

Let $E_{i}$ be the set of cells adjacent to cell $i$. The necessary condition [8] for the optimal solution requires that the gradient of $L(\mathbf{x})$ be equal to zero, i.e.

$$
\begin{equation*}
\partial L(\mathbf{x}) / \partial x_{i}=\sum_{j \in E_{i}} \partial f\left(x_{i}, x_{j}\right) / \partial x_{i}=0 \tag{7}
\end{equation*}
$$

Suppose that $x_{i} \neq x_{j}$. Let $\delta_{i j}=1$, if $x_{i}>x_{j}$; $\delta_{i j}=-1$, otherwise. The partial derivative of $f\left(x_{i}, x_{j}\right)$ with respect to $x_{i}$ can be expressed as

$$
\begin{equation*}
\partial f\left(x_{i}, x_{j}\right) / \partial x_{i}=2 a_{i j}\left(x_{i}-x_{j}\right)+\delta_{i j} b_{i j} \tag{8}
\end{equation*}
$$

### 4.1 The Analogy of a Nonlinear Resistive Network

We transform the primal problem into a nonlinear resistive network. Each node $i$ of the network corresponds to cell $i$, and the voltage of node $i$ corresponds to the location $x_{i}$ of cell $i$. For each net $(i, j)$, we construct a nonlinear resistor connecting nodes $i$ and $j$ with conductance $\sigma_{i j}$. Given a constant $\epsilon$, $\sigma_{i j}$ is defined as follows:

$$
\sigma_{i j}= \begin{cases}2 a_{i j}+b_{i j} /\left|x_{i}-x_{j}\right| & \text { if }\left|x_{i}-x_{j}\right|>\epsilon  \tag{9}\\ 2 a_{i j}+b_{i j} / \epsilon & \text { if }\left|x_{i}-x_{j}\right| \leq \epsilon\end{cases}
$$

The current flowing from node $i$ to node $j$ is equal to the product of the voltage difference $x_{i}-x_{j}$ and the conductance $\sigma_{i j}$, i.e.

$$
\begin{align*}
& \left(x_{i}-x_{j}\right) \cdot \sigma_{i j}=  \tag{10}\\
& \begin{cases}2 a_{i j}\left(x_{i}-x_{j}\right)+\delta_{i j} b_{i j} & \text { if }\left|x_{i}-x_{j}\right|>\epsilon \\
2 a_{i j}\left(x_{i}-x_{j}\right)+b_{i j}\left(x_{i}-x_{j}\right) / \epsilon & \text { if }\left|x_{i}-x_{j}\right| \leq \epsilon\end{cases}
\end{align*}
$$

which is an approximation of the partial derivative of $f\left(x_{i}, x_{j}\right)$ with respect to $x_{i}$, equation (8). The approximation is exact for $\left|x_{i}-x_{j}\right|>\epsilon$ and differs only inside the $\epsilon$-neighborhood of $x_{i}=x_{j}$. Consequently, Kirchhoff's current law [4] in the nonlinear resistive network corresponds to the necessary condition of equation (7). Therefore we can claim the analogy between the primal problem and the resistive network.
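The correspondence between the branch current of equation (10) and the gradient of equation (8) can be checked numerically. In this minimal sketch the function names are illustrative, and the clamp `max(|x_i - x_j|, eps)` is simply a compact way of writing the two cases of equation (9).

```python
def conductance(x_i, x_j, a_ij, b_ij, eps):
    """sigma_ij of equation (9): b_ij / |x_i - x_j|, clamped at b_ij / eps."""
    return 2.0 * a_ij + b_ij / max(abs(x_i - x_j), eps)


def branch_current(x_i, x_j, a_ij, b_ij, eps):
    """Current from node i to node j, the left-hand side of equation (10)."""
    return (x_i - x_j) * conductance(x_i, x_j, a_ij, b_ij, eps)


def gradient(x_i, x_j, a_ij, b_ij):
    """Partial derivative of f with respect to x_i, equation (8); x_i != x_j."""
    delta = 1.0 if x_i > x_j else -1.0
    return 2.0 * a_ij * (x_i - x_j) + delta * b_ij


# Outside the eps-neighborhood the two expressions agree.
current = branch_current(5.0, 2.0, a_ij=0.5, b_ij=3.0, eps=1e-3)
slope = gradient(5.0, 2.0, a_ij=0.5, b_ij=3.0)
```

Inside the $\epsilon$-neighborhood the current passes linearly through zero instead of jumping between $-b_{i j}$ and $+b_{i j}$, which is what keeps the network well defined when two cells coincide.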

Theorem 4.1 Given a constant $\Upsilon$, there exists an $\epsilon$ for conductance equation (9) such that the voltage solution of the transformed nonlinear resistive network is an approximate solution of the primal problem with an error bound $\Upsilon$ on the value of $L(\mathbf{x})$.

Note that this analogy is an extension of the approach proposed by Sigl et al. [16]. In [16], a quadratic equation with nonlinear coefficients is derived to match the value of the objective function. Our approach derives the equivalence of the resistive network solution by matching the gradient constraint (7) of the original problem to the current law of the new problem, rather than by matching only on the value of the objective function.

### 4.2 A Piecewise Linear Resistive Network Algorithm

Figure 1 illustrates the concept of the piecewise linear resistive network, where $m$ is the number of cells, and $n-m$ is the number of I/O pads. The node voltages of the $m$ free terminals correspond to the cell positions, whereas the $n-m$ independent voltage sources correspond to the positions of the I/O pads.


Figure 1: An n-terminal Passive Resistive Network

Let $\mathbf{x}_{1}$ denote the cell positions, and $\mathbf{x}_{2}$ denote the fixed I/O pad positions. Let $G_{11}$ and $G_{12}$ denote the submatrices of the admittance matrix corresponding to vectors $\mathbf{x}_{1}$ and $\mathbf{x}_{2}$. The primal problem reduces to solving the following equation

$$
G_{11} \mathbf{x}_{1}=-G_{12} \mathbf{x}_{2}
$$

Note that the admittance matrix contains a nonzero entry ( $i, j$ ) only if there is a net $(i, j)$ in the placement problem. Exploiting this sparsity drastically reduces the complexity of the problem. Since all the conductances are positive, the admittance matrix is symmetric and positive definite. Furthermore, after the approximation of the partial derivative of $f\left(x_{i}, x_{j}\right)$ by equation (9), the resistive network satisfies the Lipschitz condition [11].

We can approximate the nonlinear resistance by a piecewise linear resistive network. The piecewise linear resistive network approach proposed by Katzenelson, Fujisawa and Kuh ( $K-F-K$ ) [11, 6] can be exploited to obtain the solution. The Lipschitz condition and the positive definite condition of the admittance matrix constitute a sufficient condition for the convergence of the algorithm [6]. We state this fact as the following theorem.

Theorem 4.2 The $K-F-K$ piecewise linear resistive network approach proposed by [6] converges to a global minimum solution on the resistive network.

It is, however, costly to apply the $K-F-K$ algorithm exactly to the placement problem, since the solution process must stop at the boundary of each region, and the Hessian of the Lagrangian must be recalculated at that point to search for a new direction of the solution curve.

We improve the efficiency of the algorithm by skipping the discontinuous points and jumping to the local optimal solution. Our piecewise linear resistive network method starts from an initial node voltage vector $\overline{\mathbf{x}}$. Given a current voltage vector $\overline{\mathbf{x}}$, we define the conductance of the resistors according to (9). Next, we treat the circuit as a linear resistive network, and find the voltage solution with the Successive Over-Relaxation (SOR) method [20]. We then update $\overline{\mathbf{x}}$ to the derived voltage vector and repeat the process of linear resistive network construction and voltage solution derivation. The iteration continues until the voltage vector $\overline{\mathbf{x}}$ converges or the number of iterations exceeds a limit. In our implementation, we set this limit as a constant factor of the region size.

Algorithm 4.1
Piecewise Linear Resistive Network Approach \{

1. Set initial point $\mathbf{x}$ vector.
2. Calculate $\sigma_{i j}$ according to the given $\mathbf{x}$ vector.
3. Solve $G_{11} \mathbf{x}_{1}=-G_{12} \mathbf{x}_{2}$ as a linear resistive network.
4. Repeat 2.-3. until convergence or until the number of iterations exceeds the limit.

\} /* end of algorithm. */

We devise a potential function $\Psi(\mathbf{x})$ and show the convergence of the algorithm.

Theorem 4.3 For each iteration of steps 2 and 3 of the algorithm, the potential function $\Psi(\mathbf{x})$ strictly decreases.
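The iteration of Algorithm 4.1 can be sketched in one dimension as follows. This is a minimal illustration under stated assumptions, not the Prime implementation: nets are given as tuples `(i, j, a_ij, b_ij)`, the inner linear solve runs a fixed number of SOR sweeps, and the names `solve_primal`, `omega`, `sor_sweeps`, `outer_limit`, and `tol` are hypothetical.

```python
def solve_primal(nets, fixed, x0, eps=1e-3, omega=1.2,
                 outer_limit=100, sor_sweeps=200, tol=1e-6):
    """Repeatedly linearize the network and solve it by SOR.

    nets  : list of (i, j, a_ij, b_ij) with node names i and j
    fixed : dict node -> pad position (the voltage sources x_2)
    x0    : dict node -> initial position of each movable cell (x_1)
    """
    x = dict(fixed)
    x.update(x0)
    movable = list(x0)
    for _outer in range(outer_limit):
        # Step 2: freeze the conductances of equation (9) at the current x.
        adj = {n: [] for n in movable}
        for i, j, a, b in nets:
            s = 2.0 * a + b / max(abs(x[i] - x[j]), eps)
            if i in adj:
                adj[i].append((j, s))
            if j in adj:
                adj[j].append((i, s))
        # Step 3: solve G11 x1 = -G12 x2 for the frozen network by SOR.
        old = {n: x[n] for n in movable}
        for _sweep in range(sor_sweeps):
            for n in movable:
                total = sum(s for _, s in adj[n])
                if total == 0.0:
                    continue  # isolated cell: leave it where it is
                # Kirchhoff's current law at node n gives a weighted mean.
                target = sum(s * x[m] for m, s in adj[n]) / total
                x[n] += omega * (target - x[n])
        # Step 4: stop once the positions no longer move.
        if max(abs(x[n] - old[n]) for n in movable) < tol:
            break
    return {n: x[n] for n in movable}


# One movable cell "c" between pads at 0 and 10 with purely quadratic nets.
pos = solve_primal([("p0", "c", 1.0, 0.0), ("c", "p1", 3.0, 0.0)],
                   fixed={"p0": 0.0, "p1": 10.0}, x0={"c": 5.0})
```

In this example the conductances are constant ($2 a_{i j}$), so the cell settles at the weighted mean of the pad positions, $x=7.5$. With nonzero $b_{i j}$ the conductances change between outer iterations, which is where the repeated linearization matters.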

## 5 Dual Solution

To solve the dual problem, we apply the Newton method to update the dual variables $\lambda$. Suppose the dual function $L(\lambda)$ is twice differentiable with respect to $\lambda$. Let $\lambda^{k}$ be the $k$-th iterative solution of $\lambda$ in the primal-dual loop, and consider a second-order Taylor series of $L(\lambda)$ about $\lambda^{k}$,

$$
\begin{align*}
L(\lambda) & \approx L\left(\lambda^{k}\right)+\left(\lambda-\lambda^{k}\right)^{T} \nabla L\left(\lambda^{k}\right) \\
& +\frac{1}{2}\left(\lambda-\lambda^{k}\right)^{T} \operatorname{Hess}\left(\lambda^{k}\right)\left(\lambda-\lambda^{k}\right) \tag{11}
\end{align*}
$$

where $\operatorname{Hess}\left(\lambda^{k}\right)$ represents the Hessian of $L(\lambda)$ at $\lambda^{k}$. From Theorems 3.1 and 3.2, $\operatorname{Hess}\left(\lambda^{k}\right)$ is negative semi-definite in the neighborhood of a saddlepoint. We want to achieve the maximum of $L(\lambda)$, where $\nabla L(\lambda)=0$. This leads us to choose the next dual variables $\lambda^{k+1}$ by

$$
\begin{equation*}
\lambda^{k+1}-\lambda^{k}=-\left[\operatorname{Hess}\left(\lambda^{k}\right)\right]^{-1} \nabla L\left(\lambda^{k}\right) \tag{12}
\end{equation*}
$$

Note that $\lambda \geq 0$ must be observed, which requires that any of the variables of $\lambda^{k+1}$ that turn negative during the update are set to 0, and the corresponding path turns inactive.

Deriving from equations (3) and (4), we have

$$
\nabla L\left(\lambda^{k}\right)=-\mathbf{s}^{k}
$$

where $\mathbf{s}^{k}$ is the vector of slack variables at the $k$-th iteration given by (3). We further assume that $\operatorname{Hess}\left(\lambda^{k}\right)$ is diagonally dominant, which implies that the paths corresponding to active constraints are relatively independent of each other. This assumption can be true only when the number of active paths is relatively small compared to the dimension of $\operatorname{Hess}\left(\lambda^{k}\right)$, which equals the number of movable cells in the circuit. In fact, the number of active paths is usually kept small, only a fraction of the number of movable cells, throughout the primal-dual iterations. We also assume that $\operatorname{Hess}\left(\lambda^{k}\right)$ is negative definite, such that a unique maximizer of (11) is found by (12). When $\operatorname{Hess}\left(\lambda^{k}\right)$ is negative definite, so is its inverse $\left[\operatorname{Hess}\left(\lambda^{k}\right)\right]^{-1}$. The inverse $\left[\operatorname{Hess}\left(\lambda^{k}\right)\right]^{-1}$ is then approximated by

$$
\left[\operatorname{Hess}\left(\lambda^{k}\right)\right]^{-1} \approx D\left(\lambda^{k}\right)+\alpha I
$$

where $D\left(\lambda^{k}\right)$ is a diagonal matrix, whereas $\alpha$ is a small negative constant to keep $[\operatorname{Hess}(\lambda)]^{-1}$ negative definite. The diagonal matrix $D\left(\lambda^{k}\right)$ is calculated numerically as follows:

$$
D_{i i}\left(\lambda^{k}\right)=-\frac{\lambda_{i}^{k}-\lambda_{i}^{k-1}}{s_{i}^{k}-s_{i}^{k-1}}
$$

where $s_{i}^{k-1}, s_{i}^{k}$ and $\lambda_{i}^{k-1}, \lambda_{i}^{k}$ are the slack variables and dual variables of path $p_{i}$, corresponding to the $(k-1)$-th and $k$-th iterations, respectively.
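With the diagonal approximation, the Newton step of equation (12) decouples into one scalar update per active path. The sketch below is an illustration under stated assumptions: the value of `alpha` is an arbitrary placeholder, the fallback when the slack did not change and the clamp that keeps each diagonal entry non-positive are safeguards added here rather than steps described in the text, and the function name `dual_update` is hypothetical.

```python
def dual_update(lam, lam_prev, slack, slack_prev, alpha=-1e-3):
    """One diagonal Newton step for the multipliers of the active paths.

    All arguments are lists indexed by path. `alpha` is the small negative
    constant of the approximation; its value here is a placeholder.
    """
    new_lam = []
    for l_k, l_p, s_k, s_p in zip(lam, lam_prev, slack, slack_prev):
        ds = s_k - s_p
        # D_ii = -(lambda^k - lambda^{k-1}) / (s^k - s^{k-1}); use 0 when
        # the slack did not change (an assumption of this sketch).
        d_ii = -(l_k - l_p) / ds if ds != 0.0 else 0.0
        # Clamp so that D_ii + alpha stays negative (also an assumption).
        inv_hess = min(d_ii, 0.0) + alpha
        # Equation (12) with grad L = -s^k gives a step of inv_hess * s^k.
        step = inv_hess * s_k
        # Project onto lambda >= 0; a zero multiplier marks an inactive path.
        new_lam.append(max(0.0, l_k + step))
    return new_lam


# A violated path (negative slack) has its multiplier increased.
updated = dual_update(lam=[1.0], lam_prev=[0.5],
                      slack=[-2.0], slack_prev=[-4.0], alpha=-0.05)
```

Because the diagonal entry is negative, a path with negative slack (a violated timing requirement) gets a larger multiplier, while a path with positive slack has its multiplier reduced, possibly to zero.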

## 6 Prime: A Path-Oriented Timing-Driven Placement Tool

Algorithm 6.1 illustrates the proposed path-oriented timing-driven placement algorithm via pseudo-code.

This algorithm is composed of three parts: timing-driven global placement, initial detail placement by linear placement, and iterative refinement by pairwise swapping. The heart of timing-driven global placement is the primal-dual iteration. The placement is performed within a region. Initially, there is only one region, which covers the whole chip. In each iteration of the loop, the region with the largest size is taken.

First, a global placement algorithm without timing constraints is applied to obtain an initial primal solution, from which the target clock cycle is calculated as $98 \%$ of the current critical timing. $L(\mathbf{x}, \lambda)$ represents the Lagrangian, whose saddle point gives a solution to the path-oriented timing-driven placement.

Second, timing requirements are added to the Lagrangian. Third, the primal-dual iteration is applied to the Lagrangian. After each iteration, convergence of the solution is examined. The convergence error $\epsilon$ is set to a small value such that the solution is accurate enough to represent cell positions. If the target cycle time is achieved before full convergence, a new target cycle time is set and the Lagrangian is updated, and the primal-dual iteration continues. Note that the target clock cycle originally set may be over-constrained, in which case the primal-dual iteration may not converge. We therefore set a loop limit $L$, beyond which the iteration is terminated. The constant $L$ is set to a factor of the region size such that the iteration converges in normal cases. Lastly, the region is partitioned according to the positions of cells in the current solution, to satisfy slot constraints, forming a balanced slicing tree.

Algorithm 6.1

```
Prime {
    /* timing-driven global placement. */
    B: constant region size;
    while (region size > B) {
        take the largest region from the list;
        find initial primal solution;
        tau: 98% of the current critical timing;
        add timing requirements to Lagrangian L(x, lambda);
        /* primal-dual iteration. */
        L: loop limit;
        epsilon: convergence error;
        for (loop = 0; loop < L; loop++) {
            /* dual problem. */
            maximize L(x, lambda) w.r.t. lambda;
            /* primal problem. */
            minimize L(x, lambda) w.r.t. x;
            if (||x_new - x_old|| < epsilon) {
                /* solution converged; end primal-dual loop. */
                break;
            }
            if (target cycle time is achieved) {
                set new target cycle tau;
                update Lagrangian;
            }
        }
        partition the current region;
    }
    /* initial detail placement. */
    linear placement of each region;
    /* iterative refinement. */
    pairwise swapping;
    row length equalization;
    orientation flipping;
} /* end of algorithm. */
```

## 7 Experimental Results

Prime is implemented in C. All of the following experiments were performed on a Sun 4/IX with 48M bytes of main memory. We applied Prime to the Primary1 and Primary2 test cases from MCNC. Both test cases are sequential circuits. Table 1 shows statistics on the two test cases.

Table 1: Statistics on Test Cases

| circuit | cell | pad | net | pin | latch |
| :--- | ---: | ---: | ---: | ---: | ---: |
| (I) Primary1 | 752 | 81 | 1266 | 5888 | 269 |
| (II) Primary2 | 2907 | 107 | 3817 | 19195 | 603 |

Table 2: Experimental Results on Primary1 and Primary2

| Algo. | H-P [mm] | ST-RST [mm] | cycle [ns] | delay [ns] | CPU [s] |
| :--- | :---: | :---: | :---: | :---: | :---: |
| (I) CV | 1055.7 | 1348.2 | 45.43 | 24.58 | 144.1 |
| (I) TD | 1111.7 | 1379.4 | 38.32 | 17.46 | 234.5 |
| (II) CV | 4439.1 | 7063.7 | 92.63 | 52.95 | 797.2 |
| (II) TD | 4512.9 | 6830.0 | 72.25 | 31.55 | 2049.2 |

Table 2 summarizes experimental results on the Primary1 and Primary2 test cases. We used Prime without timing constraints to obtain results by conventional placement (CV), and with timing constraints to obtain results by timing-driven placement (TD), for the respective test cases Primary1 (I) and Primary2 (II). The table shows half-perimeter wire length (H-P), single-trunk rectilinear Steiner tree (ST-RST) wire length, clock cycle time, wire delay, which is the contribution of wire delay to the clock cycle time, and CPU time. We assumed metal wiring for both vertical and horizontal layers. We used the timing model of equation (1) to calculate cell-to-cell delay, and we used ST-RST to estimate the load capacitance of wires. For the Primary1 test case, a $17.5 \%$ reduction in clock cycle time was obtained at the cost of a $2.3 \%$ increase in ST-RST wire length and a 1.63 times increase in CPU time. The reduction in terms of the wire delay is $29.0 \%$. For the Primary2 test case, a $22.0 \%$ reduction in clock cycle time was obtained at the cost of a 2.57 times increase in CPU time. The reduction in terms of the wire delay only is $40.0 \%$.

## References

[1] H. B. Bakoglu, Circuits, Interconnections, and Packaging for VLSI, Reading, MA: Addison-Wesley, 1990.
[2] D. P. Bertsekas, Constrained Optimization and Lagrange Multiplier Methods, San Diego, CA: Academic Press, 1982.
[3] D. L. Carter and D. F. Guise, "Analysis of Signal Propagation Delays and Chip Level Performance Due to On-Chip Interconnections," Proc. of Int. Conf. on Computer Design, pp. 218-221, Nov. 1983.
[4] C. A. Desoer and E. S. Kuh, Basic Circuit Theory, New York, NY: McGraw-Hill, 1969.
[5] A. E. Dunlop, V. D. Agrawal et al., "Chip Layout Optimization Using Critical Path Weighting," Proc. of 21st Design Automation Conf., pp. 133-136, 1984.
[6] T. Fujisawa and E. S. Kuh, "Piecewise-linear Theory of Nonlinear Networks," SIAM J. Appl. Math., vol. 22, no. 2, pp.307-328, Mar. 1972.
[7] T. Gao, P. M. Vaidya, and C. L. Liu, "A Performance Driven Macro-Cell Placement Algorithm," Proc. of 29th Design Automation Conf., pp.147-152, June 1992.
[8] G. Hadley, Nonlinear and Dynamic Programming, Reading, MA: Addison-Wesley, 1964.
[9] E.-H. Horneber and W. Mathis, "A Closed-Form Expression for Signal Delay in CMOS-Driven Branched Transmission Lines," in VLSI'87, ed. by C. H. Sequin, pp. 353-362.
[10] M. A. B. Jackson and E. S. Kuh, "Performance-Driven Placement of Cell Based IC's," Proc. of 26th Design Automation Conf., pp.370-375, June 1989.
[11] J. Katzenelson, "An algorithm for solving nonlinear resistive networks," Bell System Tech. J., vol. 44, pp. 1605-1620, 1965.
[12] I. Lin, D. H. C. Du, "Performance-Driven Constructive Placement," Proc. of 27th Design Automation Conf., pp.103-106, June 1990.
[13] Y. Ogawa, et al., "Efficient Placement Algorithms Optimizing Delay for High-Speed ECL Masterslice LSI's," Proc. of 23rd Design Automation Conf., pp. 404-410, June 1986.
[14] S. Prasitjutrakul and W. J. Kubitz, "A Timing-Driven Global Router for Custom Chip Design," Proc. of Int. Conf. on Computer-Aided Design, pp.48-51, Nov. 1990.
[15] T. Sakurai, "Approximation of Wiring Delay in MOSFET LSI," IEEE Journal of Solid-State Circuits, vol. SC-18, no. 4, pp. 418-426, 1983.
[16] G. Sigl, K. Doll, F. M. Johannes, "Analytical Placement: A Linear or a Quadratic Objective Function?," Proc. of 28th Design Automation Conf., pp. 427-432, June 1991.
[17] A. Srinivasan, K. Chaudhary, E. S. Kuh, "RITUAL: A Performance Driven Placement Algorithm for Small Cell ICs", Proc. of Int. Conf. on Computer-Aided Design, pp. 48-51, Nov. 1991.
[18] S. Sutanthavibul and E. Shragowitz, "An Adaptive Timing-Driven Layout for High Speed VLSI," Proc. of 27th Design Automation Conf., pp. 90-95, June. 1990.
[19] S. Teig, R. L. Smith and J. Seaton, "Timing-Driven Layout of Cell-Based ICs," VLSI Systems Design, pp. 63-73, May 1986.
[20] R. S. Tsay and E. S. Kuh, "Module Placement for Large Chips Based on Sparse Linear Equations," Int. Journal of Circuit Theory and Applications, vol. 16, pp. 411-423, 1988.
[21] N. Weste, K. Eshraghian, Principles of CMOS VLSI Design: A Systems Perspective, Reading, MA: Addison-Wesley, 1985.
[22] P. Wolfe, "A Duality Theorem for Nonlinear Programming," Quarterly of Applied Math., vol. 19, 239, 1961.

