# CLIP: An Optimizing Layout Generator for Two-Dimensional CMOS Cells* 

Avaneendra Gupta ${ }^{\dagger \S}$ and John P. Hayes ${ }^{\dagger}$<br>\{avigupta, jhayes\}@eecs.umich.edu<br>${ }^{\dagger}$ Advanced Computer Architecture Laboratory, Dept. of EECS, University of Michigan<br>Ann Arbor, MI 48109, U.S.A.<br>${ }^{\S}$ Design Technology, Intel Corporation<br>2200 Mission College Blvd. Santa Clara, CA 95052, U.S.A.


#### Abstract

We present a novel technique, CLIP, for optimizing both the height and width of CMOS cell layouts in the two-dimensional (2-D) style. CLIP is based on integer-linear programming (ILP) and proceeds in two stages: first, an ILP model is used to determine a 2-D layout of minimum width $W_{\text {cell }}$; then, another model generates a 2-D layout that has width $W_{\text {cell }}$ and requires a minimum number of routing tracks. Run times are in seconds for circuits with up to 16 transistors. For larger circuits, we extend CLIP to a hierarchical method, HCLIP, that places series-connected transistors contiguously. This reduces run times by up to three orders of magnitude and still yields optimal results in over $80 \%$ of cases.


## 1 Introduction

The objective of cell layout synthesis is to minimize the cell area subject to constraints. For one-dimensional (1-D) layouts, which use a single pair of parallel P and N diffusion rows, minimizing both cell width and height can yield up to $80 \%$ savings in area over width minimization alone [9]. Moreover, height reduction can reduce wire lengths and improve cell performance. In two-dimensional (2-D) layouts, which allow multiple P/N rows, height minimization can have a larger impact on area and performance.

Even in the constrained 1-D style, most techniques that address both width and height minimization are heuristic [1, 4, 8, 11, 12]. Only a few methods [9] are exact in that they explore the entire range of possible layouts. The 2-D style has been relatively little studied and the few techniques proposed are also ad hoc [12, 14, 16]. Tools such as Virtuoso [3] support 2-D layout; they use heuristics that can handle large cells but yield sub-optimal layouts.

In [5, 6], the authors proposed an optimal technique based on integer-linear programming (ILP) to generate minimum-width 2-D layouts. While the technique is viable for practical-sized circuits, it does not consider cell height. In this paper, we present a new ILP-based technique called CLIP (Cell Layout via Integer Programming) that generates 2-D layouts of minimum height from among all layouts of minimum width. We then extend CLIP to a hierarchical method, HCLIP, that clusters series-connected transistors, which are placed contiguously in the layout. HCLIP reduces run times by up to three orders of magnitude and still yields optimal results in over $80 \%$ of the circuits tested.

## 2 Width Minimization

The 2-D cell layout style is illustrated in Fig. 1; its assumptions are listed in Table 1. If $W_{r}$ is the width of the $r$-th row, then the 2-D cell-width minimization problem is stated as follows: place the pairs in a given number of rows such that the maximum width among all rows is minimized, i.e., minimize the width $W_{\text {cell }}$, where

$$
W_{\text {cell }}=\max \left\{W_{r}: \text { for each P/N row } r=1,2, \ldots\right\}
$$

As discussed in [5], the width $W_{r}$ of row $r$ is given by:

$$
W_{r}=t_{r}+c_{r}-1+v_{r}
$$

* This research was supported by a grant from Intel Corporation.

"Permission to make digital/hard copy of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication and its date appear, and notice is given that copying is by permission of ACM, Inc. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee." DAC 97, Anaheim, California. (c) 1997 ACM 0-89791-920-3/97/06 .. $\$ 3.50$


Fig. 1 : The 2-D cell layout style with routing regions

1. Only dual CMOS circuits of fixed structure are considered.

2. Alternate rows are flipped to allow power rails to be shared among adjacent rows.
3. P and N transistors belonging to a pair are vertically aligned so that their terminals that are on common nets can be connected using vertical wires.
4. Intra-cell routing is restricted to polysilicon and metal1.
5. Terminals on adjacent diffusion rows are routed in the channel between the rows.
6. Diffusion gaps do not permit a wire to be routed through them. Also, no wires are routed over diffusions which are strapped with diffusion-to-metal contacts. Hence, routes that span across rows must be routed along the sides of the cell.

Table 1: Assumptions underlying the 2-D cell layout problem

where $t_{r}$, $c_{r}$, and $v_{r}$ are the numbers of pairs, chains, and inter-row wires, respectively, in row $r$. The technique of [5] models diffusion sharing implicitly: it generates an exhaustive set of transistor chains and then determines the smallest subset of chains that covers each pair. However, since the position and orientation of each pair are unknown, the routing and, in turn, the height cannot be determined.

CLIP-W. We now describe an ILP-based width minimization method called CLIP-W which explicitly models transistor-pair locations, orientations, and diffusion sharing. The following parameters must be determined to specify a 2-D layout: the row, location, and orientation of each pair, the diffusion sharing among adjacent pairs, and the vertical nets that connect transistor terminals across different P/N rows.

Table 2 lists the input and derived circuit parameters for CLIP-W. To represent the position of each pair in the 2-D plane, we introduce place-holders, called slots, in each row. For a circuit with numPairs pairs, a 2-D placement in numRows rows requires at least maxSlots $=\lceil\text{numPairs} / \text{numRows}\rceil$ slots in each row. Slots are numbered in increasing order from the left. We also define the sets of integers slots $=\{1,2, \ldots, \text{maxSlots}\}$, rows $=\{1,2, \ldots, \text{numRows}\}$, and orients $=\{1,2,3,4\}$, the last being the four possible orientations of each pair. The 0-1 array share is such that $\operatorname{share}\left[p_{i}, o_{i}, p_{j}, o_{j}\right]=1$ if pairs $p_{i}$ and $p_{j}$ can share diffusions in orientations $o_{i}$ and $o_{j}$, respectively, when $p_{j}$ is placed to the right of $p_{i}$.

The basic variables for each pair are represented by the 0-1 arrays $X$[pairs, slots, rows] and Xor[pairs, orients]: $X[p, s, r]=1$ implies that pair $p$ is placed in slot $s$ of row $r$, while $\operatorname{Xor}[p, o]=1$ states that $p$ is placed in orientation $o$. To model diffusion sharing, we define nogap[slots, rows], where $\operatorname{nogap}[s, r]=1$ if adjacent slots $s$ and $s+1$ do not have a gap between them.
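
As a rough illustration of how the derived array share might be computed from the terminal nets, the Python sketch below tabulates it for a pair of transistor pairs. The orientation encoding (1: neither transistor flipped, 2: N flipped, 3: P flipped, 4: both flipped), the requirement that both the P and the N diffusions match, the `Pair` type, and the net names are all assumptions made for this example rather than definitions taken from the paper.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Pair:
    """A P/N transistor pair; field names are illustrative only."""
    p_src: str
    p_drn: str
    n_src: str
    n_drn: str


# Assumed encoding: orientation -> (P flipped?, N flipped?).
# "Unflipped" means source on the left and drain on the right.
ORIENTS = {1: (False, False), 2: (False, True), 3: (True, False), 4: (True, True)}


def edge_nets(pair, orient):
    """Return ((P left, N left), (P right, N right)) diffusion nets."""
    p_flip, n_flip = ORIENTS[orient]
    p_left, p_right = (pair.p_drn, pair.p_src) if p_flip else (pair.p_src, pair.p_drn)
    n_left, n_right = (pair.n_drn, pair.n_src) if n_flip else (pair.n_src, pair.n_drn)
    return (p_left, n_left), (p_right, n_right)


def build_share(pairs):
    """share[(i, oi, j, oj)] = 1 if pair j (orient oj) can abut pair i (orient oi) on i's right."""
    share = {}
    for (i, pi), (j, pj) in product(pairs.items(), repeat=2):
        if i == j:
            continue
        for oi, oj in product(ORIENTS, repeat=2):
            _, right_i = edge_nets(pi, oi)
            left_j, _ = edge_nets(pj, oj)
            # Assumption: both the P and the N diffusions must carry the same net.
            share[(i, oi, j, oj)] = int(right_i == left_j)
    return share


# Hypothetical 2-input NAND: parallel P transistors, series N transistors.
pairs = {
    "a": Pair(p_src="vdd", p_drn="out", n_src="gnd", n_drn="x"),
    "b": Pair(p_src="vdd", p_drn="out", n_src="x", n_drn="out"),
}
share = build_share(pairs)
```

In this toy instance, pair `b` can abut pair `a` only when `b`'s P transistor is flipped, which is the kind of orientation dependence the four-index array is meant to capture.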

The goal (cost function) of CLIP-W is to minimize $W_{\text {cell }}$, where $W_{\text {cell }} \geq W_{r}$ for each row $r$. Now, $W_{r}$ can be derived as follows:

$$
\begin{aligned}
W_{r} & =\text{\#pairs in row } r+\text{\#gaps in row } r+\text{\#vertical wires} \\
& =\sum_{p \in \text{pairs}} \operatorname{Xrow}[p, r]+\Big(\sum_{p \in \text{pairs}} \operatorname{Xrow}[p, r]-1-\sum_{s \in \text{slots}} \operatorname{nogap}[s, r]\Big)+v_{r}
\end{aligned}
$$

where $\operatorname{Xrow}[p, r]=1$ if pair $p$ is placed in row $r$.
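
The width expression can be checked on small numbers. The sketch below is only an arithmetic illustration: the function names and the sample rows are hypothetical, and the counts would normally come from the ILP variables rather than being supplied by hand.

```python
def row_width(num_pairs, nogap_count, vertical_wires):
    """W_r = #pairs + #gaps + #vertical wires, with #gaps = #pairs - 1 - sum(nogap)."""
    gaps = num_pairs - 1 - nogap_count
    return num_pairs + gaps + vertical_wires


def cell_width(rows):
    """W_cell is the maximum row width; rows holds (pairs, nogaps, wires) tuples."""
    return max(row_width(*row) for row in rows)


# Hypothetical two-row placement:
# row 1 has 4 pairs, 2 shared abutments, 1 vertical wire;
# row 2 has 3 pairs, 2 shared abutments, no vertical wires.
rows = [(4, 2, 1), (3, 2, 0)]
widths = [row_width(*row) for row in rows]
w_cell = cell_width(rows)
```

Each diffusion gap costs one extra column, so a row whose pairs all abut has width equal to its pair count plus its vertical wires.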

The constraints of $C L I P-W$ are described below.

1. Pair inclusion: A pair must be placed in one slot and orientation.

$$
\sum_{s \in \text { slots }} \sum_{r \in \text { rows }} X[p, s, r]=1 \quad \text { for all } p \in \text { pairs }
$$

| Parameters | Interpretation |
| :---: | :---: |
| 1. numPairs, numRows, maxSlots | The number of pairs, rows, and slots |
| 2. pairs, rows, slots, nets | The sets of pairs, rows, slots, and nets |
| 3. PpairNets, NpairNets | PpairNets$[p]=\{$gate, source, drain nets of the P transistor of pair $p\}$ (NpairNets is similarly defined for N transistors) |
| 4. Psrc[pairs, nets], Pgate[pairs, nets], Pdrn[pairs, nets] | $\operatorname{Psrc}[p, n]=1$ if pair $p$ has net $n$ on the source diffusion of its P transistor ($\operatorname{Pgate}[p, n]$ and $\operatorname{Pdrn}[p, n]$ are similarly defined for gate/drain terminals) |
| 5. Nsrc[pairs, nets], Ngate[pairs, nets], Ndrn[pairs, nets] | $\operatorname{Nsrc}[p, n]=1$ if pair $p$ has net $n$ on the source diffusion of its N transistor ($\operatorname{Ngate}[p, n]$ and $\operatorname{Ndrn}[p, n]$ are similarly defined for gate/drain terminals) |
| 6. share[pairs, orients, pairs, orients] | $\operatorname{share}\left[p_{i}, o_{i}, p_{j}, o_{j}\right]=1$ if pair $p_{i}$ in orient $o_{i}$ can share diffusion with pair $p_{j}$ in orient $o_{j}$ |

Table 2: Input (1-3) and derived (4-6) parameters for CLIP-W

$$
\sum_{o \in \text{orients}} \operatorname{Xor}[p, o]=1 \quad \text{for all } p \in \text{pairs}
$$

2. Slot occupancy: We force the first slot in each row to be filled with exactly one pair, and slots to be filled in a left-justified order, i.e., slot $s$ should be occupied before its neighboring slot $s+1$.

$$
\begin{aligned}
\sum_{p \in \text{pairs}} X[p, 1, r] &= 1 && \text{for all } r \in \text{rows} \\
\sum_{p \in \text{pairs}} X[p, s-1, r] &\geq \sum_{p \in \text{pairs}} X[p, s, r] && \text{for all } r \in \text{rows},\ s \in \text{slots}
\end{aligned}
$$

3. Diffusion sharing: $\operatorname{nogap}[s, r]$ can be defined by the following logic equation, for every $p_{i}, p_{j} \in$ pairs and $o_{i}, o_{j} \in$ orients:

$$
\begin{aligned}
\operatorname{nogap}[s, r] &= \operatorname{or}\{\, p_{i} \text{ is in slot } s \text{ of row } r \text{ and } p_{j} \text{ is in slot } s+1 \text{ of row } r \\
&\qquad \text{and } p_{i} \text{ in orient } o_{i} \text{ and } p_{j} \text{ in orient } o_{j} \text{ and } \operatorname{share}\left[p_{i}, o_{i}, p_{j}, o_{j}\right]=1 \,\} \\
&= \operatorname{or}\{\, X\left[p_{i}, s, r\right] \text{ and } \operatorname{or}\{\, X\left[p_{j}, s+1, r\right] \text{ and } \operatorname{merged}\left[p_{i}, p_{j}\right] : \text{for all } p_{j} \in \text{pairs} \,\} : \text{for all } p_{i} \in \text{pairs} \,\}
\end{aligned}
\tag{1}
$$

Here $\operatorname{merged}\left[p_{i}, p_{j}\right]=1$ if $p_{i}$ can share diffusion with $p_{j}$, that is,

$$
\operatorname{merged}\left[p_{i}, p_{j}\right]=\operatorname{or}\{\, \operatorname{Xor}\left[p_{i}, o_{i}\right] \text{ and } \operatorname{Xor}\left[p_{j}, o_{j}\right] : \text{for all } o_{i}, o_{j} \in \text{orients such that } \operatorname{share}\left[p_{i}, o_{i}, p_{j}, o_{j}\right]=1 \,\}
\tag{2}
$$

To prevent cyclic conditions in diffusion sharing, we ensure that a pair can share diffusion with at most one pair on either side:

$$
\sum_{p_{2} \in \text{pairs}} \operatorname{merged}\left[p_{1}, p_{2}\right] \leq 1 \quad \text{for all } p_{1} \in \text{pairs}
\tag{3}
$$

$$
\sum_{p_{1} \in \text{pairs}} \operatorname{merged}\left[p_{1}, p_{2}\right] \leq 1 \quad \text{for all } p_{2} \in \text{pairs}
\tag{4}
$$

The logical constraints $(1,2)$ are linearized in the final ILP model.

4. Inter-row connectivity: This is modeled as described in [5].
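
The text does not spell out how the logical constraints $(1,2)$ are linearized. One common textbook encoding of 0-1 and/or relations as linear inequalities is sketched below and verified by enumeration; it is an assumption about a possible encoding, not necessarily the one used in CLIP-W, and the function names are hypothetical.

```python
from itertools import product


def and_feasible(z, xs):
    """Linear form of z = AND(xs): z <= x_i for every i, and z >= sum(xs) - (len(xs) - 1)."""
    return all(z <= x for x in xs) and z >= sum(xs) - (len(xs) - 1)


def or_feasible(z, xs):
    """Linear form of z = OR(xs): z >= x_i for every i, and z <= sum(xs)."""
    return all(z >= x for x in xs) and z <= sum(xs)


def check(k):
    """Verify that the inequalities admit exactly the logical value for all 2**k inputs."""
    for xs in product((0, 1), repeat=k):
        and_sols = [z for z in (0, 1) if and_feasible(z, xs)]
        or_sols = [z for z in (0, 1) if or_feasible(z, xs)]
        if and_sols != [int(all(xs))] or or_sols != [int(any(xs))]:
            return False
    return True


all_ok = all(check(k) for k in (2, 3, 4))
```

Nested expressions such as the one for nogap can be handled with this scheme by introducing one auxiliary 0-1 variable per inner and/or term.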

Experimental results. Table 3 presents results of solving CLIP-W for optimum-width 2-D layouts with the 0-1 solver OPBDP [2].

The optimal cell widths obtained by CLIP-W are about $15 \%$ smaller than those produced by the commercial tool Virtuoso; Virtuoso's run times are in seconds in all cases. For circuits with as many as 24 transistors, CLIP-W has run times that are in seconds for minimum-width 2-D layouts. Run times can be reduced, by up to three orders of magnitude, by using hierarchical methods such as the circuit clustering approach proposed in Section 6.

In subsequent sections, we propose a model that extends CLIP-W to minimize the 2-D cell height in addition to the width.

## 3 Height Minimization

The height of a cell is determined by its horizontal routing (track) density [9], that is, the number of tracks needed to complete the cell's routing. This, in turn, depends on factors such as the layout style, the usage of metal, polysilicon, and diffusion layers, and the use of jogs and vias. The layout assumptions underlying the height minimization problem are summarized in Table 1.

| \# | Circuit | No. of trans. | No. of rows | Cell width, CLIP-W$^{\text{b}}$ | Cell width, Virtuoso | CPU time (secs)$^{\text{a}}$, CLIP-W | CPU time (secs)$^{\text{a}}$, HCLIP |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 1. | Non-series-parallel bridge circuit [16] | 10 | 1<br>2<br>3<br>4 | 6<br>5<br>4<br>4 | 6<br>5<br>5<br>5 | 0.03<br>0.09<br>0.07<br>0.19 | 0.03<br>0.09<br>0.07<br>0.19 |
| 2. | 2-to-1 multiplexer | 14 | 1<br>2<br>3<br>4 | 8<br>4<br>3<br>3 | 8<br>5<br>3<br>3 | 0.06<br>0.25<br>0.06<br>0.25 | 0.04<br>0.06<br>0.04<br>0.09 |
| 3. | Majority function $z=a \cdot b+b \cdot c+a \cdot c$ | 18 | 1<br>2<br>3<br>4 | 10<br>5<br>4<br>4 | 10<br>6<br>5<br>5 | 0.2<br>0.2<br>12<br>8 | 0.05<br>0.02<br>0.07<br>1 |
| 4. | Series-parallel circuit for $z=(a \cdot b \cdot c \cdot d+e \cdot f \cdot g \cdot h+(i+j) \cdot(k+l))^{\prime}$ [9] | 24 | 1<br>2<br>3<br>4 | 13<br>9<br>5<br>5 (6) | 14<br>10<br>6<br>6 | 0.3<br>11<br>10<br>390 | 0.1<br>0.7<br>0.1<br>0.1 |
| 5. | 8-input NAND circuit | 24 | 1<br>2<br>3<br>4 | 14<br>7<br>7<br>5<br>4 | 15<br>8<br>6<br>4 | 19<br>9<br>43<br>58 | 1<br>5<br>0.1<br>0.7 |
| 6. | Series-parallel circuit for $z=(a b c d+e f g h+(i+j)(k+l)(m+n)(o+p))^{\prime}$ [9] | 32 | 1<br>2<br>3<br>4 | 18<br>9 (10)<br>8<br>6 | 19<br>10<br>10<br>9 | 79<br>2<br>2446<br>5216 | 4<br>1<br>1<br>1 |

a. Run times are with OPBDP using its -h 103 variable selection heuristic.

b. Numbers in brackets refer to the cell width with HCLIP, when different from the optimum value for the original circuit.

Table 3: Minimum width 2-D placements with run times

Track density can be determined from the horizontal span of each net. A 2-D cell is composed of several routing regions, as illustrated by the two-row layout in Fig. 1, which has five regions: B (cell bottom), $\mathrm{C}_{1}$ (P/N channel of the first row), $\mathrm{R}_{1,2}$ (inter-row channel between the P/N rows), $\mathrm{C}_{2}$ (P/N channel of the second row), and T (cell top). We assume that all nets in the P and N diffusions of the same P/N row are routed in its P/N channel. Thus, a net that appears on both sides of an inter-row channel is routed to connect just one of its terminals on each side.

In order to determine the track density in each region, the cell layout is considered to be made of vertical columns, where each column represents a transistor terminal. Then, the number of tracks $T_{R}$ required in a routing region $R$ can be expressed as follows:

$$
T_{R}=\max \{\text{number of nets that span column } c: c=1,2,3, \ldots\}
\tag{5}
$$

Assuming equal transistor sizes, we define the total height $H_{\text {cell }}$ of the cell as follows:

$$
H_{\text {cell }}=\sum_{R \in \text{regions}} \text{No. of tracks in routing region } R=\sum_{R \in \text{regions}} T_{R}
$$
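
The following sketch illustrates the two definitions on made-up data. It assumes that a net occupies one track in every column between its leftmost and rightmost terminals and that a net with a single terminal needs no track; diffusion gaps and sharing are ignored here. The region contents and the function name are hypothetical.

```python
def track_density(terminals, num_cols):
    """Return T_R: the largest number of nets crossing any one column.

    terminals maps each net to the 1-based columns in which it has a terminal.
    """
    density = [0] * (num_cols + 1)  # index 0 is unused
    for cols in terminals.values():
        if len(set(cols)) < 2:
            continue  # assumption: a single-terminal net needs no track
        for c in range(min(cols), max(cols) + 1):
            density[c] += 1
    return max(density)


# Hypothetical regions: name -> (net terminals, number of columns).
regions = {
    "C1": ({"a": [1, 4], "b": [3, 6], "c": [5]}, 6),
    "R12": ({"a": [2, 3]}, 6),
}
tracks = {name: track_density(t, n) for name, (t, n) in regions.items()}
h_cell = sum(tracks.values())  # H_cell is the sum of T_R over all regions
```

In region `C1`, nets `a` and `b` overlap in columns 3 and 4, so two tracks are needed there even though neither net alone requires more than one.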

$T_{R}$ depends on the nets that occur in each vertical column which, in turn, depends on the position and orientation of each transistor. Thus, to compute $T_{R}$, we must determine the nets that must be routed horizontally in each column. This has traditionally been called the channel routing problem [13]. While channel routing considers both horizontal and vertical constraints, cell synthesis methods have generally ignored vertical constraints [9, 11].

The fundamental problem is to determine whether a net $n$ requires a track in a column $c$. We define 0-1 variables net such that $\operatorname{net}[n, c, r]=1$ if net $n$ exists on a terminal in column $c$ of row $r$. We define 0-1 variables span such that $\operatorname{span}[n, c, r]=1$ if net $n$ requires a track in column $c$ of row $r$. Then, the total number of nets that span column $c$ of row $r$ is $\sum_{n \in \text{nets}} \operatorname{span}[n, c, r]$. We use a dynamic programming approach to define the following two conditions under which $\operatorname{span}[n, c, r]=1$:

- If $\operatorname{net}[n, c, r]=0$, then ($\operatorname{span}[n, c-1, r]=1$ and $\operatorname{net}\left[n, c_{2}, r\right]=1$ for some $c_{2}>c$)
- If $\operatorname{net}[n, c, r]=1$, then ($\operatorname{span}[n, c-1, r]=1$ or $\operatorname{net}\left[n, c_{2}, r\right]=1$ for some $c_{2}>c$)

If $x=\operatorname{net}[n, c, r]$, $y=\operatorname{span}[n, c-1, r]$, and $z=\operatorname{or}\left\{\operatorname{net}\left[n, c_{2}, r\right]: c_{2}>c\right\}$, then $\operatorname{span}[n, c, r]$ can be defined by the logic equation below:

$$
\operatorname{span}[n, c, r]=\bar{x} \cdot(y \cdot z)+x \cdot(y+z)=\operatorname{majority}(x, y, z)
\tag{6}
$$
The majority-function constraint is equivalent to the following pair of linear inequalities:

$$
\begin{gathered}
\operatorname{span}[n, c, r] \geq(x+y+z-1) / 2 \\
\operatorname{span}[n, c, r] \leq(x+y+z) / 2
\end{gathered}
$$
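
Both claims above can be checked mechanically. The sketch below evaluates the span recurrence along one row for a single net, taking the span to the left of the first column to be 0 (an assumed boundary condition), and then enumerates all eight 0-1 assignments to confirm that the two inequalities leave exactly the majority value feasible. The function names are illustrative.

```python
from itertools import product


def majority(x, y, z):
    """Majority of three 0-1 values."""
    return int(x + y + z >= 2)


def spans(net_row):
    """net_row[c - 1] plays the role of net[n, c, r]; returns span[n, c, r] for every column."""
    out, prev = [], 0  # assumption: span[n, 0, r] = 0
    for c, x in enumerate(net_row):
        z = int(any(net_row[c + 1:]))  # net n appears in some column to the right
        prev = majority(x, prev, z)
        out.append(prev)
    return out


def inequalities_match():
    """Check that (x+y+z-1)/2 <= span <= (x+y+z)/2 admits only majority(x, y, z)."""
    for x, y, z in product((0, 1), repeat=3):
        feasible = [s for s in (0, 1)
                    if 2 * s >= x + y + z - 1 and 2 * s <= x + y + z]
        if feasible != [majority(x, y, z)]:
            return False
    return True


example = spans([0, 1, 0, 0, 1, 0])   # terminals in columns 2 and 5
single = spans([0, 0, 1, 0])          # a net with only one terminal
linear_ok = inequalities_match()
```

The recurrence marks every column from the leftmost to the rightmost terminal of a net, while a net with a single terminal never acquires a span.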

Based on the above algorithm, we now present an ILP model CLIP-WH that minimizes both the cell width and height.


Fig. 2 : Routing track requirements with and without diffusion gaps

## 4 Width and Height Minimization

The 2-D cell width and height minimization problem is defined as follows: Place the transistors in a given number of rows such that the layout has a minimum height among all layouts with minimum width. The CLIP method proceeds in two stages:

1. Use $C L I P-W$ to find $W_{\text {cell }}$, the minimum 2-D cell width.
2. Use CLIP-WH to find a 2-D layout of minimum height $H_{\text {cell }}$ from among all layouts of minimum width $W_{\text {cell }}$.
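
The two stages amount to a lexicographic optimization: width first, then height among the minimum-width layouts. The sketch below mimics this on an explicit list of candidate layouts; the brute-force search is only a stand-in for the two ILP solves, and the candidate data and function name are hypothetical.

```python
def clip_two_stage(layouts):
    """Pick a layout of minimum height among those of minimum width.

    layouts is an iterable of (width, height, label) tuples.
    """
    layouts = list(layouts)
    # Stage 1 (role of CLIP-W): find the minimum cell width.
    w_cell = min(width for width, _, _ in layouts)
    # Stage 2 (role of CLIP-WH): fix the width and minimize the height.
    candidates = [layout for layout in layouts if layout[0] == w_cell]
    return min(candidates, key=lambda layout: layout[1])


# Hypothetical candidate layouts as (W_cell, H_cell, name).
candidates = [(5, 4, "A"), (4, 6, "B"), (4, 5, "C"), (6, 3, "D")]
best = clip_two_stage(candidates)
```

Note that layout `D` has the smallest height overall but is never considered in stage 2, because its width exceeds the minimum found in stage 1.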

CLIP-WH. We define one column for each of the three terminals-source, gate, and drain-of the transistors placed in a row. Figure 2 shows how the columns are numbered in a given row.

In addition to the parameters of CLIP-W and $W_{\text {cell }}$, we have maxCols, the total number of columns in each row. Let the set cols $=\{1, \ldots$, maxCols $\}$. We define arrays net $[$ nets, cols, rows $]$ and span[nets, cols, rows] of 0-1 variables as described in Section 3. The variables height $[$ rows $]$ and interRow [rows $]$ are used to represent the height of the $\mathrm{P} / \mathrm{N}$ and inter-row channels, respectively; since they are integers, they are represented using arrays of boolean variables.
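
The column numbering can be made concrete with a small helper. The sketch below assumes three columns per slot, numbered from 1, so that left diffusion columns are $c=1,4,7,\ldots$, gate columns are $c=2,5,8,\ldots$, and right diffusion columns are $c=3,6,9,\ldots$, in line with the column classes used by the net-presence constraint; the function name and the relation maxCols $=3 \times$ maxSlots are assumptions made for illustration.

```python
ROLES = ("left diffusion", "gate", "right diffusion")


def column_role(c):
    """Map a 1-based column index to (slot, terminal role)."""
    if c < 1:
        raise ValueError("columns are numbered from 1")
    slot, offset = divmod(c - 1, 3)
    return slot + 1, ROLES[offset]


def max_cols(max_slots):
    """Assumed relation: each slot contributes three terminal columns."""
    return 3 * max_slots


layout = [column_role(c) for c in range(1, max_cols(2) + 1)]
```

The slot indices produced here agree with the expressions $(c+2)/3$, $(c+1)/3$, and $c/3$ that the constraints use for left diffusion, gate, and right diffusion columns, respectively.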

The goal of CLIP-WH is to minimize the cell height $H_{\text {cell }}$, i.e.,

$$
\text{minimize } H_{\text {cell }}=\sum_{r \in \text{P/N channels}} \operatorname{height}[r]+\sum_{r \in \text{inter-row channels}} \operatorname{interRow}[r]
$$

We now describe the constraints in CLIP-WH; these are in addition to the constraints of CLIP-W.

1. The constraint on the height height $[r]$ of the $\mathrm{P} / \mathrm{N}$ channel in row $r$, defined by equation (5), is linearized as follows:

$$
\text { height }[r] \geq \sum_{n \in \text { nets }} \operatorname{span}[n, c, r] \quad \text { for all } c \in \text { cols }
$$

2. Cell width: For each row $r$, we ensure that its width is $\leq W_{\text {cell }}$.

$$
W_{\text {cell }} \geq 2 \times \sum_{p \in \text{pairs}} \operatorname{Xrow}[p, r]-\sum_{s \in \text{slots}} \operatorname{nogap}[s, r]+v_{r}-1
$$

3. Net presence: If $c$ represents a diffusion terminal ($c=1,3,4,6, \ldots$), $\operatorname{net}[n, c, r]$ depends on the pair placed in the corresponding slot, and its orientation. For example, for $c=4$, $\operatorname{net}[n, 4, r]=1$ if there is a pair placed in slot $s=2$, and its orientation causes its diffusion terminal on net $n$ to appear on its left. Thus, if $c$ is a left diffusion column ($c=1,4,7, \ldots$), $\operatorname{net}[n, c, r]$ is given by:

$$
\begin{aligned}
\operatorname{net}[n, c, r] \geq X[p,(c+2) / 3, r] \text{ and } (\, & \operatorname{Nsrc}[p, n] \text{ and } (\operatorname{Xor}[p, 1] \text{ or } \operatorname{Xor}[p, 3]) \\
& \text{or } \operatorname{Ndrn}[p, n] \text{ and } (\operatorname{Xor}[p, 2] \text{ or } \operatorname{Xor}[p, 4]) \\
& \text{or } \operatorname{Psrc}[p, n] \text{ and } (\operatorname{Xor}[p, 1] \text{ or } \operatorname{Xor}[p, 2]) \\
& \text{or } \operatorname{Pdrn}[p, n] \text{ and } (\operatorname{Xor}[p, 3] \text{ or } \operatorname{Xor}[p, 4]) \,) \quad \text{for all } p \in \text{pairs}
\end{aligned}
$$

$\operatorname{net}[n, c, r]$ is similarly defined for right diffusion columns ($c=3,6, \ldots$). If $c$ is a gate column ($c=2,5, \ldots$), $\operatorname{net}[n, c, r]$ is independent of the orientation of the pair placed in that slot, and is given by:

$$
\operatorname{net}[n, c, r]=\sum_{p \in \text{pairs}} \text{if } (\operatorname{Ngate}[p, n] \text{ or } \operatorname{Pgate}[p, n]) \text{ then } X[p,(c+1) / 3, r]
$$

Since the above constraint is an equation, it can be directly incorporated into the definitions for $\operatorname{span}[n, c, r]$, thereby eliminating the variables $\operatorname{net}[n, c, r]$ for $c=2,5,8$, etc.

4. Net span: The constraint for $\operatorname{span}[n, c, r]$ is given by equation (6). However, a few special cases, illustrated in Fig. 2, must be considered to accommodate the presence of diffusion gaps.

- Net $a$ requires a track in columns 1, 2, and 3; hence, $\operatorname{span}[a, 1, r]=\operatorname{span}[a, 2, r]=\operatorname{span}[a, 3, r]=1$. Also, since columns 3 and 4 are connected by diffusion sharing, we set $\operatorname{span}[a, 4, r]=1$.
- However, net $b$ appears only in columns 9 and 10 that share diffusion. Hence, $\operatorname{span}[b, 9, r]=\operatorname{span}[b, 10, r]=0$.
- For net $c$ that appears in columns 6 and 7 separated by a diffusion gap, $\operatorname{span}[c, 6, r]=\operatorname{span}[c, 7, r]=1$.
- For nets such as $d$ that appear on the same (P or N) diffusion of two pairs separated by a gap, $\operatorname{span}[d, 12, r]=\operatorname{span}[d, 13, r]=1$.

Fig. 3 : (a) 2-to-1 multiplexer and its layouts in (b) one and (c) two rows

Thus, the constraint for $\operatorname{span}[n, c, r]$ for $c=3,6,9$, etc. is split into two: while (7) considers all columns $c_{2} \geq c+2$, (8) considers column $c+1$ on its right and takes into account the absence of a gap, represented by $\operatorname{nogap}[c / 3, r]$, between columns $c$ and $c+1$.

$$
\operatorname{span}[n, c, r]=\operatorname{majority}\left(\operatorname{span}[n, c-1, r],\ \operatorname{net}[n, c, r],\ \operatorname{or}\left\{\operatorname{net}\left[n, c_{2}, r\right]: c_{2} \geq c+2\right\}\right)
\tag{7}
$$

$$
\operatorname{span}[n, c, r]=\operatorname{majority}\left(\operatorname{span}[n, c-1, r],\ \operatorname{net}[n, c, r],\ (\operatorname{net}[n, c+1, r]-\operatorname{nogap}[c / 3, r])\right)
\tag{8}
$$

In a dual CMOS circuit, since each pair has a common gate net, the track density of a gate column $c$ is no greater than the maximum of the track densities of its adjoining diffusion columns $c-1$ and $c+1$. Hence, the variables $\operatorname{span}[n, c, r]$ are eliminated for gate columns, and constraints $(7,8)$ are suitably modified to consider $\operatorname{span}[n, c-2, r]$ instead of $\operatorname{span}[n, c-1, r]$.

5. Inter-row channel routing: As discussed earlier, the routing problem in an inter-row channel $r$ is defined as follows: for each net that appears on both sides of $r$, select one terminal to be connected from each side such that the overall track density in $r$ is minimized. CLIP-WH assumes that each net that appears on both sides of an inter-row channel requires a separate routing track. Hence, the density of an inter-row channel $r$ is equal to the total number of nets that appear on both sides of $r$. For example, the two-row layout in Fig. $3c$ requires two nets to be routed in its inter-row channel. Although both nets can be routed in one track, CLIP-WH assigns two tracks, one for each net. Thus, we have

$$
\text { interRow }[r]=\sum_{n \in \text { nets }}(n \text { on top diffusion and } n \text { on bottom diffusion })
$$

Both the terms in the above equation are available as variables that model inter-row connectivity, and can be used directly.
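
The inter-row estimate reduces to counting the nets common to the two diffusion rows that face the channel. The sketch below shows that count on made-up net sets; the function name and the nets are hypothetical, and the inputs are assumed to contain only nets that actually need routing in the channel.

```python
def inter_row_tracks(top_nets, bottom_nets):
    """Count nets present on both diffusion rows facing an inter-row channel.

    This mirrors the one-track-per-net assumption, so it can overestimate
    the density when two such nets could share a track.
    """
    return len(set(top_nets) & set(bottom_nets))


# Hypothetical nets on the two diffusion rows adjoining the channel.
top = ["x", "out", "s", "out"]
bottom = ["out", "s", "y"]
tracks_needed = inter_row_tracks(top, bottom)
```

Because the count ignores where the terminals sit horizontally, it is an upper bound on the true channel density rather than the density itself.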

## 5 Experimental Results

We have applied CLIP-WH to the circuits of Table 3. The cell height $H_{\text {cell }}$ obtained and the run times of CLIP-WH using OPBDP are given in Table 4. In most cases, the increase in $H_{\text {cell }}$ is much less than the decrease in $W_{\text {cell }}$ when the layout changes from one to two rows. This can translate into significant area savings. These savings are less pronounced for three- and four-row layouts, since both $W_{\text {cell }}$ and the number of tracks change very little while the number of transistor rows increases. As an example, Fig. $3a$ shows a 2-to-1 multiplexer with its seven P/N pairs highlighted. Its optimal CLIP-WH layouts in one and two rows are shown in Figs. $3b$ and $3c$, respectively.

CLIP-WH's overall run times for optimum layouts are in seconds for medium-sized circuits with up to 16 transistors. Moreover, an optimal solution is found in a relatively short time; the remaining time is spent verifying optimality. Hence, the ILP solver may be terminated early to yield near-optimal, or possibly even optimal, solutions in practical time. For larger circuits, we propose a practical hierarchical method HCLIP in the next section that extends our technique to circuits with over 30 transistors while yielding layouts that are at or near the optimum.

## 6 Circuit Clustering

An and-stack [5] of size $n$ is a group of $n \geq 2$ transistors connected in series. The circuit in Fig. $3a$ has three and-stacks, formed by the pairs $\left(p_{1}, p_{7}\right)$, $\left(p_{3}, p_{4}\right)$, and $\left(p_{5}, p_{6}\right)$. Since the nets that connect two series-connected transistors (internal nets) do not connect to any other terminal, they do not require straps when these transistors are placed using diffusion abutment. This allows the transistors to be placed closer, which reduces area and enhances performance. Hence, most designs lay out and-stacks as single contiguous units.

HCLIP. We now outline HCLIP (Hierarchical CLIP), an extension of CLIP that efficiently implements and-stacks of arbitrary size. For each stack $S$ with pairs $p_{i}-p_{i+1}-\ldots-p_{j}$, it introduces constraints on the relative placement and diffusion sharing of its constituent pairs.

Let stacks be the set of and-stacks. Let $\operatorname{Ssize}[S]$ specify the number of pairs in stack $S$, with $\operatorname{Spairs}[S, \operatorname{Ssize}[S]]$ containing its list of pairs, ordered by their connectivity in $S$. We define 0-1 variables Srow[stacks, rows] where $\operatorname{Srow}[S, r]=1$ if stack $S$ is placed in row $r$. Further, we define variables Sdir[stacks] where $\operatorname{Sdir}[S]=0$ if stack $S$ is placed unflipped ($p_{i}-p_{i+1}-\ldots-p_{j}$), and 1 if flipped ($p_{j}-p_{j-1}-\ldots-p_{i}$). The constraints described next are in addition to those in CLIP-WH.

1. Stack placement: Each stack must be placed in exactly one row.

$$
\sum_{r \in \text{rows}} \operatorname{Srow}[S, r]=1 \quad \text{for all } S \in \text{stacks}
$$

Also, all pairs of a stack $S$ must be placed in the same row as $S$.

$$
\operatorname{Ssize}[S] \times \operatorname{Srow}[S, r]=\sum_{i \in 1 .. \operatorname{Ssize}[S]} \operatorname{Xrow}[\operatorname{Spairs}[S, i], r]
$$

2. Stack pair placement: Adjacent pairs of a stack $S$ must be placed in contiguous slots. If $\operatorname{Sdir}[S]=0$, then the slot values of pairs $p_{i+1}$ and $p_{i}$ must differ by 1 ; if $\operatorname{Sdir}[S]=1$, then this difference is -1 .

$$
\sum_{s \in \text{slots}} \sum_{r \in \text{rows}} s \times X[\operatorname{Spairs}[S, i+1], s, r]-\sum_{s \in \text{slots}} \sum_{r \in \text{rows}} s \times X[\operatorname{Spairs}[S, i], s, r]=1-2 \times \operatorname{Sdir}[S] \quad \text{for all } i \in 1 .. \operatorname{Ssize}[S]-1
$$

3. Stack diffusion sharing: Adjacent pairs of $S$ must share their diffusions. While $\operatorname{merged}\left[p_{i}, p_{i+1}\right]=\operatorname{merged}\left[p_{i+1}, p_{i+2}\right]=\ldots=\operatorname{merged}\left[p_{j-1}, p_{j}\right]=1$ in the unflipped orientation, the flipped orientation must have $\operatorname{merged}\left[p_{j}, p_{j-1}\right]=\operatorname{merged}\left[p_{j-1}, p_{j-2}\right]=\ldots=\operatorname{merged}\left[p_{i+1}, p_{i}\right]=1$.

$$
\sum_{i \in 1 .. \operatorname{Ssize}[S]-1} \operatorname{merged}[\operatorname{Spairs}[S, i], \operatorname{Spairs}[S, i+1]]=(\operatorname{Ssize}[S]-1) \times(1-\operatorname{Sdir}[S])
$$

$$
\sum_{i \in \operatorname{Ssize}[S] .. 2} \operatorname{merged}[\operatorname{Spairs}[S, i], \operatorname{Spairs}[S, i-1]]=(\operatorname{Ssize}[S]-1) \times \operatorname{Sdir}[S]
$$

Constraints $(3,4)$ for $\operatorname{merged}\left[p_{i}, p_{j}\right]$, which permit a pair to be merged with at most one pair on its left and right sides, implicitly ensure that all pairs of a stack are merged in the same direction.
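
The row and slot constraints on a stack can be illustrated with a small checker. The sketch below tests only the stack placement and stack pair placement conditions (same row, slot difference of $1-2 \times \operatorname{Sdir}[S]$ between consecutive pairs); diffusion sharing is not checked, and the data layout and function name are assumptions made for the example.

```python
def stack_ok(slots_of, stack, sdir):
    """Check that a stack occupies contiguous slots of one row in the given direction.

    slots_of maps a pair name to (slot, row); stack lists the pairs in their
    connectivity order; sdir is 0 for unflipped and 1 for flipped.
    """
    rows = {slots_of[p][1] for p in stack}
    if len(rows) != 1:
        return False  # all pairs of a stack must lie in the same row
    step = 1 - 2 * sdir  # +1 when unflipped, -1 when flipped
    return all(slots_of[b][0] - slots_of[a][0] == step
               for a, b in zip(stack, stack[1:]))


# Hypothetical placements of the and-stack (p3, p4).
unflipped = {"p3": (2, 1), "p4": (3, 1)}
flipped = {"p3": (3, 1), "p4": (2, 1)}
split_rows = {"p3": (2, 1), "p4": (3, 2)}

results = (
    stack_ok(unflipped, ["p3", "p4"], 0),
    stack_ok(unflipped, ["p3", "p4"], 1),
    stack_ok(flipped, ["p3", "p4"], 1),
    stack_ok(split_rows, ["p3", "p4"], 0),
)
```

A checker of this kind is useful mainly for validating a solver's output; inside the ILP the same conditions are imposed as equalities on the 0-1 variables.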

Experimental results. Table 4 presents the values of $W_{\text {cell }}$ and $H_{\text {cell }}$ and the associated run times obtained with HCLIP. Where possible, we have compared these values with the corresponding optimum values for the non-hierarchical layout. For the circuits presented, $W_{\text {cell }}$ with and-stacking is optimum in all but one case. In addition, the number of tracks ($H_{\text {cell }}$) required with and-stacking is optimum in over $80 \%$ of cases. Also, the cell heights of HCLIP are $25 \%$ smaller on average than those obtained using Virtuoso.

The run times of HCLIP are up to three orders of magnitude better than CLIP's. Also, the first optimum solution is found in just a few seconds. Thus, and-stacking can extend the ILP-based technique to larger circuits while still yielding layouts whose widths and heights are at or near the optimum.

| Cct. | No. of trans. | No. of rows | Cell layout |  |  |  | CPU time (secs) ${ }^{\text {a }}$ |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
|  |  |  | CLIP ${ }^{\text {b }}$ |  | Virtuoso |  | CLIP-WH |  | HCLIP |  |
|  |  |  | Opt. <br> $W_{\text {cell }}$ | Opt. <br> $H_{\text {cell }}$ | $W_{\text {cell }}$ | $H_{\text {cell }}$ | First <br> opt. sol. | Final <br> sol. | First <br> opt. sol. | Final <br> sol. |
| 1. | 10 | 1 <br> 2 <br> 3 <br> 4 | 6 <br> 5 <br> 4 <br> 4 | 4 <br> 4 <br> 3 <br> 3 <br> 3 | 6 <br> 6 <br> 5 <br> 5 <br> 5 | 4 <br> 3 <br> 4 <br> 4 | 0.1 <br> 2 <br> 4 <br> 2 | 0.5 <br> 2.5 <br> 5 <br> 2 | 0.1 <br> 2 <br> 4 <br> 2 | 0.5 <br> 3 <br> 5 <br> 2 |
| 2. | 14 | 1 <br> 2 <br> 3 <br> 4 | 8 <br> 4 <br> 3 <br> 3 | 3 <br> 4 <br> 5 <br> 5 | 8 <br> 5 <br> 3 <br> 3 | 3 <br> 4 <br> 7 <br> 5 | 0.3 <br> 0.5 <br> 1 <br> 5 | 1 <br> 2 <br> 8 <br> 8 <br> 17 | 0.05 <br> 0.5 <br> 0.3 <br> 4 | 0.2 <br> 0.6 <br> 1 <br> 7 |
| 3. | 18 | 1 <br> 2 <br> 3 <br> 4 | 10 <br> 5 <br> 4 <br> 4 | 5 <br> (7) <br> 6 <br> 6 <br> (7) | 10 <br> 6 <br> 5 <br> 5 | 6 <br> 7 <br> 7 <br> 8 | 0.8 <br> * <br> 237 | 5 <br> * <br> 3673 | 0.2 <br> 0.3 <br> 0.6 <br> 0.3 | 0.5 <br> 1 <br> 2 <br> 15 |
| 4. | 24 | 1 <br> 2 <br> 3 <br> 4 | 13 <br> 9 <br> 5 <br> 5 | 3 <br> (5) <br> (7) <br> (8) | 14 <br> 10 <br> 6 <br> 6 | 3 <br> 7 <br> 8 <br> 9 | 2 <br> * | 73 <br> * <br> * | 0.2 <br> 9 <br> 1 <br> 3 | 1.7 <br> 16 <br> 4 <br> 55 |
| 5. | 24 | 1 <br> 2 <br> 3 <br> 4 | 14 <br> 7 <br> 5 <br> 4 | 2 <br> (4) <br> (4) <br> (5) | 15 <br> 8 <br> 7 <br> 4 <br> 4 | 3 <br> 6 <br> 6 <br> 7 | 340 |  | 1 <br> 15 <br> 36 <br> 2 | 9 <br> 53 <br> 59 <br> 31 |
| 6. | 32 | 1 <br> 2 <br> 3 <br> 4 | 18 <br> 9(10) <br> 8 <br> 8 | (4) <br> (6) <br> (8) <br> (8) | 19 <br> 10 <br> 10 <br> 9 | 4 <br> 7 <br> 11 <br> 14 | * | * | 200 <br> 1 <br> 809 <br> 65 | 695 <br> 10 <br> 930 <br> 410 |

a. Run times are with OPBDP using its -h3 variable selection heuristic. An asterisk indicates that OPBDP did not terminate within 5,000 seconds.
b. Numbers in brackets are the width or height obtained with HCLIP, when different from the optimum value, if known, for the original circuit.
Table 4: Minimum width and height 2-D layouts with run times

## 7 Conclusions

We have presented a novel technique, CLIP, for simultaneous height and width minimization of 2-D cell layouts. It combines diffusion sharing, inter-row connectivity, and routing density in a common problem space that can be efficiently searched for optimal solutions using branch-and-bound methods such as those of ILP solvers. When used with and-stack clustering, it generates optimal or near-optimal layouts for practical-sized circuits in seconds.

## 8 References

[1] D. G. Baltus and J. Allen, "SOLO: A Generator of Efficient Layouts From Optimized MOS Circuit Schematics," Proc. 25th Design Automation Conf., pp. 445-452, June 1988.
[2] P. Barth, Logic Based 0-1 Constraint Programming, Kluwer, Boston, 1995.
[3] Cadence Design Systems, Virtuoso Layout Synthesizer, 1992-94.
[4] A. Gupta, S-C. The, and J. P. Hayes, "XPRESS: A Cell Layout Generator with Integrated Transistor Folding," Proc. European Design \& Test Conf., pp. 393-400, March 1996.
[5] A. Gupta and J. P. Hayes, "Width Minimization of Two-Dimensional CMOS Cells Using Integer Programming," Proc. Int'l Conf. on CAD, pp. 660-667, Nov. 1996.
[6] A. Gupta and J. P. Hayes, "A Hierarchical Technique for Minimum-Width Layout of Two-Dimensional CMOS Cells," Proc. Int'l Conf. on VLSI Design, pp. 15-20, Jan. 1997.
[7] D. V. Heinbuch, CMOS3 Cell Library, Addison-Wesley, Reading, Mass., 1988.
[8] Y-C Hsieh, C-Y Hwang, Y-L Lin, and Y-C Hsu, "LiB: A CMOS Cell Compiler," IEEE Trans. on CAD, Vol. 10, pp. 994-1005, Aug. 1991.
[9] R. L. Maziasz and J. P. Hayes, Layout Minimization of CMOS Cells, Kluwer, Boston, 1992.
[10] G. L. Nemhauser and L. A. Wolsey, Integer and Combinatorial Optimization, John Wiley, New York, 1988.
[11] C.L. Ong, J.T. Li, and C.Y. Lo, "GENAC: An Automatic Cell Synthesis Tool," Proc. 26th Design Automation Conf., pp. 239-244, June 1989.
[12] C.J. Poirier, "Excellerator: Custom CMOS Leaf Cell Layout Generator," IEEE Trans. on CAD, Vol. 8, pp. 744-755, July 1989.
[13] R. L. Rivest and C. M. Fiduccia, "A 'Greedy' Channel Router," Proc. 19th Design Automation Conf., pp. 120-125, June 1982.
[14] K. Tani, et al., "Two-Dimensional Layout Synthesis for Large-Scale CMOS Circuits," Proc. Int'l Conf. on CAD, pp. 490-493, Nov. 1991.
[15] T. Uehara and W.M. VanCleemput, "Optimal Layout of CMOS Functional Arrays," IEEE Trans. on Computers, Vol. C-30, pp. 305-312, May 1981.
[16] H. Zhang and K. Asada, "An Improved Algorithm of Transistors Pairing for Compact Layout of Non-Series-Parallel CMOS Networks," Proc. Custom Integrated Circuits Conf., pp. 17.2.1-17.2.4, 1993.

