# A Hierarchical Analysis Methodology for Chip-Level Power Delivery with Realizable Model Reduction

Yu-Min Lee

Electrical and Computer Engineering, University of Wisconsin at Madison, Madison, WI 53706. Tel: 1-608-265-3789, Fax: 1-608-232-5429, e-mail: vu-min@cae.wisc.edu

Charlie Chung-Ping Chen

Electrical and Computer Engineering, University of Wisconsin at Madison, Madison, WI 53706. Tel: 1-608-265-1145, Fax: 1-608-265-4623, e-mail: chen@engr.wisc.edu

Abstract— In this paper, we propose a novel hierarchical analysis methodology for efficient chip-level power fluctuation analysis. Our simple and efficient methodology first builds time-varying multiport Norton equivalent circuits on a row-by-row or block-by-block basis and then performs a global analysis on the integrated reduced models. After generating the Norton equivalent sources at the external ports, we apply realizable model order reduction techniques to further reduce the model. Since the elements of our reduced model are themselves RC devices, the model is fully compatible with general circuit simulation engines. Experimental results demonstrate a speedup of more than 4X over flat simulation while staying within 5% accuracy.

### I. INTRODUCTION

The relentless push for power reduction drives the supply voltage from 5 V down to the 1 V level. It has also been shown that a 5% power fluctuation induces a 15% performance degradation. Excessive power fluctuation may also cause soft errors due to timing variation. Robust power delivery is therefore essential for both performance and signal integrity. However, high-quality power delivery requires extensive simulation and analysis. With millions of gates and wires, the turnaround time of a full simulation is enormous. Furthermore, circuits change frequently during the design process. Efficient, accurate, and hierarchical power delivery analysis methodologies are therefore of great importance.

Hierarchical analysis of a circuit proceeds in three steps. First, the circuit is partitioned into several small modules. Then, the modules are analyzed separately. Finally, their results are combined to recover the behavior of the original circuit. In this way, we can speed up the simulation and reduce the turnaround time during circuit design. For example, suppose the run time of a flat analysis of a circuit with $N$ elements is $N^r$, where $r \ge 1$. If the circuit can be properly partitioned into $K$ modules of $N/K$ elements each, the run time of the hierarchical method could be $K\left(\frac{N}{K}\right)^r = \frac{N^r}{K^{r-1}}$, a speedup of about $K^{r-1}$. Moreover, if one of the modules is changed, the flat method must redo the analysis from scratch, whereas the hierarchical method only needs to reanalyze the changed module. Therefore, the hierarchical method can reduce the turnaround time.
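The cost model above can be checked with a few lines of arithmetic. This is a minimal sketch that assumes the flat cost is exactly $N^r$ and that the hierarchical cost consists only of the $K$ module analyses (the cost of combining the reduced modules, which the model above also omits, is ignored); the function names are illustrative.

```python
def flat_cost(n: float, r: float) -> float:
    """Run-time model of a flat analysis of n elements: n**r."""
    return n ** r


def hierarchical_cost(n: float, r: float, k: int) -> float:
    """Run-time model of k modules with n/k elements each (combination step ignored)."""
    return k * (n / k) ** r


if __name__ == "__main__":
    n, r, k = 10_000, 2, 25
    speedup = flat_cost(n, r) / hierarchical_cost(n, r, k)
    print(speedup)  # 25.0, i.e. k ** (r - 1)
```

With $r = 1$ the model predicts no speedup at all, so the benefit of partitioning comes from the super-linear cost of the flat solver.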

For these reasons, several efficient or hierarchical power delivery analysis methods have been proposed [1], [2]. In particular, [1] proposed a hierarchical analysis method and a macromodel sparsification algorithm based on linearized models of the power delivery network and the gate currents. [2] proposed a multigrid-like algorithm that significantly speeds up the simulation.

In this paper, we propose to further combine realizable model order reduction techniques with multiport Norton equivalent sources to hierarchically generate compact and accurate macromodels. Given a circuit containing both gates and a power delivery network, we use the Norton theorem to systematically move the internal current sources to the ports of the circuit. We then apply realizable model reduction techniques to reduce the impedance model obtained from both the nonlinear and linear devices. In particular, we use a pattern-matching model reduction technique that iteratively searches for matched primitives and applies rule-based reduction to them [3]. These reduction techniques match moments up to second order and are not constrained by the number of ports of the reduced circuits. Experimental results demonstrate a speedup of more than 4X over flat simulation while staying within 5% accuracy.

The remainder of this paper is organized as follows. At the beginning of Section II, we present the basic framework of our hierarchical and realizable model reduction algorithm. In Section II.A, we propose a general gate parasitics extraction algorithm. In Section II.B, we present reduction primitives that match moments up to second order. In Sections II.C and II.D, we introduce the multiport Norton equivalent theory and our local voltage drop analysis method. Finally, the experimental results are presented in Section III.

### II. HIERARCHICAL ANALYSIS ALGORITHM

Given a circuit with both gates and power delivery, as shown in Figure 1, we first partition the circuit into several small modules. Then we apply the Norton theorem to each module to generate its Norton equivalent circuit. We perform a full circuit simulation row by row, with both ends of each row attached to ideal power supply voltage sources, to obtain the current waveform through each voltage source. If the circuit consisted only of linear elements, the Norton theorem would allow us to remove all the internal sources and move them to the ports without changing the external behavior. Although our circuit contains nonlinear elements such as gates, the experimental results show that the Norton equivalent still matches well in our cases. We then calculate the device parasitics, such as diffusion and gate capacitances, as well as the interconnect parasitics, and use RICE [4] to obtain the transfer function from both ends of the module to each of its nodes. Next, we apply a realizable model reduction technique to reduce the impedance model obtained from both the nonlinear and linear devices. This is the pattern-matching technique of [3], which iteratively searches for matched primitives and applies rule-based reduction to them; it matches moments up to second order and is not constrained by the number of ports of the reduced circuits.

Fig. 1. Power delivery network

After that, we form the equivalent circuits by attaching the equivalent current waveform at each port to the reduced circuit, and combine all the reduced modules into the integrated circuit. Finally, we simulate the integrated circuit with HSPICE [5] to obtain the voltage waveform at each port, and use the transfer functions obtained by RICE [4] together with superposition to perform local voltage drop analysis at arbitrary nodes. Each procedure is described in detail in the following four subsections.

Figure 2 illustrates the flowchart of the proposed algorithm. In the flowchart, an oval indicates a specific operation of our method, and its result is shown after it. The proposed hierarchical and realizable model order reduction algorithm is summarized in Figure 3.

Fig. 2. Flowchart of the hierarchical and realizable model reduction algorithm

**Algorithm:** Hierarchical and Realizable Model Reduction

- A1. Partition the given circuit into $K$ modules, where each module has $N$ ports.
- A2. For each module, generate multiport time-variant Norton equivalent reduced circuits as follows:
  - A2.1 Calculate the equivalent Norton current source at each port using SPICE simulation.
  - A2.2 Generate the gate parasitics and interconnect RC models, and call RICE to get the transfer functions.
  - A2.3 Construct the realizable order-reduced model of the RC segments using the realizable model order reduction algorithm.
  - A2.4 Form the equivalent circuits by attaching the Norton equivalent sources at each port to the reduced circuit generated by the model reduction algorithm in Step A2.3.
- A3. Form the integrated circuit by combining all the reduced modules. Perform higher-level model order reduction when necessary.
- A4. Use the transfer functions from Step A2.2 and superposition to perform the local voltage drop analysis.

Fig. 3. The hierarchical and realizable model order reduction algorithm.

#### A. Parasitics of Gates and Interconnects

We illustrate our analysis with inverters. Given a row of inverters as in Figure 4.A, a logic-low signal is applied at the input of the first inverter, so the pmos of the first inverter is turned on and its nmos is off. In the second inverter of the row, the pmos is turned off and the nmos is turned on. The input signal propagates along the row until it reaches the last inverter. The equivalent circuit is shown in Figure 4.B. Then, we remove the current sources by using the multiport Norton equivalent circuits; the resulting equivalent capacitances of each gate are shown in Figure 4.C.

The gate capacitances can be simplified as shown in Figure 4.D, where

$$C_0 = C_{gsp_1} + C_{gdp_1} + C_{gdn_1} + C_{gdp_2} + C_{gdn_2} + C_{gsn_2}$$

$$C_i = C_{gsp_{2i+1}} + C_{gdp_{2i+1}} + C_{gdn_{2i+1}} + C_{gdp_{2i+2}} + C_{gdn_{2i+2}} + C_{gsn_{2i+2}}, \quad i \in [1, n-1]$$

$$C_n = C_{gsp_{2n+1}} + C_{gdp_{2n+1}} + C_{gdn_{2n+1}}$$

All the $C_{gsp}$, $C_{gdp}$, $C_{gsn}$, and $C_{gdn}$ can be obtained with the HSPICE simulator. Finally, we can use the L1- or L2-reductions to reduce the circuit and obtain the Norton equivalent reduced circuit shown in Figure 4.E.

Fig. 4. Equivalent parasitic capacitance and resistance generation of a row of inverters: (A) Row of inverters, (B) Equivalent circuit, (C) Equivalent capacitance and resistance, (D) Simplified equivalent capacitance and resistance, (E) Norton equivalent circuit of the reduced circuit.

#### B. RC-in RC-out Model Order Reduction

Our algorithm uses pattern-matching model order reduction. We introduce two primitives in this work.

#### B.1 L1-Reduction

The L1-Reduction replaces a two-port circuit such as the one in Figure 5.A by a single $\pi$ segment with resistance $R$ and left and right capacitances $C_1$ and $C_2$, as illustrated in Figure 5.B. The new $\pi$ segment matches the total resistance $R'$, the total capacitance $C'$, the Elmore delay from the left to the right-hand side, $M_{1R}$, and the Elmore delay from the right to the left-hand side, $M_{1L}$. These conditions can be written as follows:

$$R = R' \tag{1}$$

$$C_1 + C_2 = C' \tag{2}$$

$$RC_1 = M_{1L} \tag{3}$$

$$RC_2 = M_{1R} \tag{4}$$

The solution of these simultaneous equations is

$$R = R' \tag{5}$$

$$C_1 = \frac{M_{1L}}{R'} \tag{6}$$

$$C_2 = \frac{M_{1R}}{R'} \tag{7}$$

The four conditions are consistent even though there are only three unknowns, because for an RC ladder $M_{1L} + M_{1R} = R'C'$, so Equation (2) follows from Equations (3) and (4).

Fig. 5. L1-Reduction (A) Before reduction (B) After reduction
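Equations (5)–(7) can be tried on a concrete RC ladder. The sketch below is illustrative rather than the implementation of [3]: it assumes a ladder described by a list `c` of node-to-ground capacitances and a list `r` of series resistances between consecutive nodes (a hypothetical data layout), computes the Elmore delay at each end with the opposite end driven, and forms the $\pi$ segment. For such a ladder the two Elmore delays sum to $R'C'$, so the returned capacitances satisfy Equation (2) automatically.

```python
from typing import List, Tuple


def elmore_far_end(r: List[float], c: List[float]) -> float:
    """Elmore delay at the last node of an RC ladder driven by an ideal source at node 0.

    c[k] is the grounded capacitance at node k and r[k] the resistance between
    nodes k and k+1, so the delay is sum_k R(0 -> k) * c[k].
    """
    if len(c) != len(r) + 1:
        raise ValueError("a ladder with n resistors has n + 1 nodes")
    dist = 0.0   # resistance from the driven node to node k
    delay = 0.0
    for k, ck in enumerate(c):
        if k > 0:
            dist += r[k - 1]
        delay += dist * ck
    return delay


def l1_reduce(r: List[float], c: List[float]) -> Tuple[float, float, float]:
    """Return (R, C1, C2) of the pi segment given by Eqs. (5)-(7)."""
    r_total = sum(r)
    m1r = elmore_far_end(r, c)              # driven at left, observed at right
    m1l = elmore_far_end(r[::-1], c[::-1])  # driven at right, observed at left
    return r_total, m1l / r_total, m1r / r_total


if __name__ == "__main__":
    print(l1_reduce([1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0]))  # (3.0, 2.0, 2.0)
```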

#### B.2 L2-Reduction

The basic primitive of L2-Reduction is the 2-$\pi$ reduction shown in Figure 6. The replacement indicated in Figure 6 is repeated until no further reduction is possible. This primitive is much more accurate than the L1-Reduction since it matches the first two moments rather than only the first. It matches the first- and second-order moments from both directions, as well as the total resistance and total capacitance of the original circuit.

Given the total resistance $R'$, the total capacitance $C'$, the Elmore delays from left to right, $M_{1R}$, and from right to left, $M_{1L}$, and the second-order moments from left to right, $M_{2R}$, and from right to left, $M_{2L}$, L2-Reduction constructs a 2-$\pi$ circuit in which all of these parameters are matched. Let $C_1$, $C_2$, $C_3$, $R_1$, and $R_2$ be the capacitance and resistance values in Figure 6.B, with $C_1$ at the left port, $C_3$ at the right port, $R_1$ between $C_1$ and $C_2$, and $R_2$ between $C_2$ and $C_3$. The following equations must be satisfied:

$$R_1 + R_2 = R' \tag{8}$$

$$C_1 + C_2 + C_3 = C' \tag{9}$$

$$R_2(C_1 + C_2) + R_1C_1 = M_{1L} \tag{10}$$

$$R_1(C_2 + C_3) + R_2C_3 = M_{1R} \tag{11}$$

$$R_2\big(C_2R_2(C_2+C_1)+C_1M_{1L}\big)+R_1C_1M_{1L}=M_{2L} \tag{12}$$

$$R_1\big(C_2R_1(C_2+C_3)+C_3M_{1R}\big)+R_2C_3M_{1R}=M_{2R} \tag{13}$$

$$M_{2L} - M_{1L}^2 = P \tag{14}$$

$$R'^2 M_{1R}C' - R' M_{1R}M_{1L} - M_{2R}R' = K \tag{15}$$



Fig. 6. L2 Reduction (A) Before reduction (B) After reduction

After rearranging the terms and substituting $P$ and $K$, we obtain the following quartic equation in $C_1$:

$$\begin{aligned}
&C_{1}^{4}\,(KR'^{2}) + C_{1}^{3}\,(-2M_{1L}R'K + PR'^{2}M_{1R}) \\
&+ C_{1}^{2}\,(P^{2}R' - PR'M_{1R}M_{1L} - PC'R'^{2}M_{1R} + M_{2R}R'P + KM_{1L}^{2}) \\
&+ C_{1}\,(-P^{2}R'C' - P^{2}M_{1L} + PM_{1R}M_{1L}R'C' - PM_{2R}M_{1L}) + P^{2}M_{1L}C' = 0
\end{aligned}$$

We can obtain $C_1$ from the analytical solutions of the above equation; only roots that yield positive element values give a realizable circuit. The values of $R_1$, $R_2$, $C_2$, and $C_3$ are then given by the following theorem.

Theorem 1: Let $C_1^*$ be a root of the above equation, and let $R_1^* = \frac{M_{2L} - M_{1L}^2}{C_1^*(R'C_1^* - M_{1L})}$, $R_2^* = R' - R_1^*$, $C_2^* = \frac{M_{1L} - R'C_1^*}{R_2^*}$, and $C_3^* = C' - C_1^* - C_2^*$. Then $C_1^*$, $C_2^*$, $C_3^*$, $R_1^*$, and $R_2^*$ satisfy Equations (8)–(13).
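Theorem 1 can be exercised end to end. The sketch below is an illustration under stated assumptions, not the code of [3]. It computes $R'$, $C'$, $M_{1L}$, $M_{1R}$, $M_{2L}$, $M_{2R}$ of an RC ladder (a hypothetical list layout of grounded node capacitances `c` and series resistances `r`, with $M_{1R}, M_{2R}$ observed at the right port when the left port is driven, as in Equations (11) and (13)), forms $P$ and $K$ from Equations (14) and (15), and builds the quartic in $C_1$; its $C_1^2$ coefficient uses the term $K M_{1L}^2$, which is the dimensionally consistent form. Instead of the analytical quartic formula, it finds real roots in $(0, C')$ with a sign-change scan and bisection, a simple stand-in that can miss repeated roots. Each root is mapped through Theorem 1; the root $C_1 = M_{1L}/R'$, where $R_1^*$ is undefined, is skipped, and only candidates with all-positive elements are kept.

```python
from typing import List, Tuple


def ladder_moments(r: List[float], c: List[float]) -> Tuple[float, float]:
    """First and second moments at the far node of an RC ladder driven at node 0
    (c[k]: grounded capacitance at node k, r[k]: resistance from node k to k+1)."""
    dist = [0.0]
    for rk in r:
        dist.append(dist[-1] + rk)
    n = len(c)
    # On a line, the resistance shared by the paths to nodes k and j is min(dist[k], dist[j]).
    m1 = [sum(min(dist[k], dist[j]) * c[j] for j in range(n)) for k in range(n)]
    far = n - 1
    m2 = sum(min(dist[far], dist[j]) * c[j] * m1[j] for j in range(n))
    return m1[far], m2


def two_port_targets(r: List[float], c: List[float]):
    """(R', C', M1L, M1R, M2L, M2R) with the conventions of Eqs. (10)-(13)."""
    m1r, m2r = ladder_moments(r, c)              # driven at left, observed at right
    m1l, m2l = ladder_moments(r[::-1], c[::-1])  # driven at right, observed at left
    return sum(r), sum(c), m1l, m1r, m2l, m2r


def l2_reduce(rp, cp, m1l, m1r, m2l, m2r, grid=4000):
    """Return every positive (R1, R2, C1, C2, C3) 2-pi candidate found."""
    p = m2l - m1l ** 2                                  # Eq. (14)
    k = rp ** 2 * m1r * cp - rp * m1r * m1l - m2r * rp  # Eq. (15)
    coeffs = [                                          # C1^4 down to C1^0
        k * rp ** 2,
        -2 * m1l * rp * k + p * rp ** 2 * m1r,
        p ** 2 * rp - p * rp * m1r * m1l - p * cp * rp ** 2 * m1r + m2r * rp * p + k * m1l ** 2,
        -p ** 2 * rp * cp - p ** 2 * m1l + p * m1r * m1l * rp * cp - p * m2r * m1l,
        p ** 2 * m1l * cp,
    ]

    def f(x: float) -> float:
        acc = 0.0
        for a in coeffs:  # Horner's rule
            acc = acc * x + a
        return acc

    roots = []
    xs = [cp * i / grid for i in range(1, grid)]
    for lo, hi in zip(xs, xs[1:]):
        if f(lo) == 0.0:
            roots.append(lo)
        elif f(lo) * f(hi) < 0.0:
            for _ in range(100):  # bisection on a bracketed sign change
                mid = 0.5 * (lo + hi)
                if f(lo) * f(mid) <= 0.0:
                    hi = mid
                else:
                    lo = mid
            roots.append(0.5 * (lo + hi))

    solutions = []
    for c1 in roots:
        if abs(rp * c1 - m1l) <= 1e-9 * abs(m1l):
            continue  # R1* of Theorem 1 is undefined at C1 = M1L / R'
        r1 = p / (c1 * (rp * c1 - m1l))  # Theorem 1
        r2 = rp - r1
        if r1 <= 0.0 or r2 <= 0.0:
            continue
        c2 = (m1l - rp * c1) / r2
        c3 = cp - c1 - c2
        if c2 > 0.0 and c3 > 0.0:
            solutions.append((r1, r2, c1, c2, c3))
    return solutions


if __name__ == "__main__":
    # Round trip: the moments of a known 2-pi circuit should lead back to it.
    for sol in l2_reduce(*two_port_targets([1.0, 2.0], [1.0, 2.0, 3.0])):
        print(sol)
```

For this input the original circuit $(R_1, R_2, C_1, C_2, C_3) = (1, 2, 1, 2, 3)$ is recovered up to rounding, together with a second positive 2-$\pi$ circuit that has the same six parameters, so a realizable match need not be unique.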

#### C. Multiport Norton Equivalent Circuits

The basis of our macromodeling algorithm is the multiport version of the Norton equivalent theorem, presented below. This theorem enables us to move the internal sources to the external ports. After moving all the sources to the ports, we can safely apply the model reduction algorithm to the internal passive circuit.

Theorem 2: A multiport network of linear elements and independent sources can be represented by an equivalent circuit consisting of multiple segments, each of which has a single ideal current source in parallel with linear elements.

*Proof:* An arbitrary multiport network $N$ can be represented in the equivalent voltage-controlled form $N_v$ as

$$\mathbf{i} = \mathbf{Y}\mathbf{v}_{ext} + \mathbf{i}_s \tag{16}$$

where $\mathbf{i} = \begin{bmatrix} i_1 & i_2 & \cdots & i_N \end{bmatrix}^T$ is the vector of port currents, $\mathbf{v}_{ext} = \begin{bmatrix} v_1 & v_2 & \cdots & v_N \end{bmatrix}^T$ is the vector of external independent voltage sources, $\mathbf{i}_s = \begin{bmatrix} i_{s1} & i_{s2} & \cdots & i_{sN} \end{bmatrix}^T$ is the vector of equivalent internal independent current sources, and $\mathbf{Y}$ is the admittance matrix. We now show how to obtain this multiport representation for a general circuit. Let $\mathbf{v}_{inside} = \begin{bmatrix} v_{m1}(t) & v_{m2}(t) & \cdots & v_{m\alpha}(t) \end{bmatrix}^T$ denote the vector of internal independent voltage sources, and $\mathbf{i}_{inside} = \begin{bmatrix} i_{m1}(t) & i_{m2}(t) & \cdots & i_{m\beta}(t) \end{bmatrix}^T$ the vector of internal independent current sources. Since, by hypothesis, the linear resistive circuit obtained by replacing the external blocks with the independent voltage sources $\mathbf{v}_{ext}$ has a unique solution, it follows from the superposition theorem that each $i_k$, $k \in [1, N]$, can be written in the form

$$i_k = \mathbf{Y}_k^T \mathbf{v}_{ext} + \mathbf{H}_k^T \mathbf{v}_{inside} + \mathbf{K}_k^T \mathbf{i}_{inside} \tag{17}$$

where $\mathbf{Y}_{k}^{T}$ is the $k^{th}$ row of the admittance matrix $\mathbf{Y}$, and $\mathbf{H}_{k}$ and $\mathbf{K}_{k}$ are the transfer vectors from $\mathbf{v}_{inside}$ and $\mathbf{i}_{inside}$, respectively, to $i_{k}$. Now define $i_{sk}(t) = i_{k}(t)|_{\mathbf{v}_{ext}=\mathbf{0}}$, the port current obtained when $\mathbf{v}_{ext}$ is the zero vector for all $t$; it equals the sum of the last two terms in Equation (17). Then Equation (17) can be rewritten as

$$i_k = \mathbf{Y}_k^T \mathbf{v}_{ext} + i_{sk}(t) \tag{18}$$

Equation (18) for $k = 1, \dots, N$ is exactly the $k^{th}$ row of Equation (16). Since the multiport network $N$ is completely characterized by Equation (18), it is externally indistinguishable from $N_v$. Consequently, $N$ and $N_v$ are equivalent.
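The construction in the proof can be mirrored numerically. The sketch below, a hedged illustration rather than the authors' tool flow, treats one time instant of a purely resistive network in nodal form (a hypothetical conductance matrix `g` with the ground row and column removed, and a vector `j` of currents injected by the internal sources). Following the superposition argument, $\mathbf{i}_s$ is the port-current vector with every port voltage set to zero, and column $k$ of $\mathbf{Y}$ is the port-current vector when port $k$ is held at 1 V with all internal sources turned off. A small dense Gaussian elimination keeps the example dependency-free.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def solve(a: Matrix, b: Sequence[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda row: abs(m[row][col]))
        m[col], m[piv] = m[piv], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for cc in range(col, n + 1):
                m[row][cc] -= factor * m[col][cc]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(m[row][cc] * x[cc] for cc in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


def port_currents(g: Matrix, j: Sequence[float], ports: Sequence[int],
                  vp: Sequence[float]) -> List[float]:
    """Currents delivered by the external sources into each port when the
    port voltages are held at vp."""
    inner = [k for k in range(len(g)) if k not in ports]
    # Internal nodes: G_nn v_n = j_n - G_np v_p
    rhs = [j[a] - sum(g[a][p] * vv for p, vv in zip(ports, vp)) for a in inner]
    vn = solve([[g[a][b] for b in inner] for a in inner], rhs)
    v = [0.0] * len(g)
    for p, vv in zip(ports, vp):
        v[p] = vv
    for a, vv in zip(inner, vn):
        v[a] = vv
    # KCL at a port: (G v)_p = (current from the external source) + j_p
    return [sum(g[p][b] * v[b] for b in range(len(g))) - j[p] for p in ports]


def norton_multiport(g: Matrix, j: Sequence[float],
                     ports: Sequence[int]) -> Tuple[Matrix, List[float]]:
    """Return (Y, i_s) such that i = Y v_ext + i_s, as in Eq. (16)."""
    n = len(ports)
    # i_s: port currents with v_ext = 0 and the internal sources on.
    i_s = port_currents(g, j, ports, [0.0] * n)
    # Column k of Y: port currents for 1 V at port k with the internal sources off.
    off = [0.0] * len(g)
    cols = [port_currents(g, off, ports, [1.0 if q == k else 0.0 for q in range(n)])
            for k in range(n)]
    y = [[cols[k][row] for k in range(n)] for row in range(n)]
    return y, i_s


if __name__ == "__main__":
    # Ports at nodes 0 and 2, internal node 1: 1 ohm 0-1, 1 ohm 1-2,
    # 2 ohm from node 1 to ground, and a 1 A current sink at node 1.
    g = [[1.0, -1.0, 0.0], [-1.0, 2.5, -1.0], [0.0, -1.0, 1.0]]
    y, i_s = norton_multiport(g, [0.0, -1.0, 0.0], [0, 2])
    print(y, i_s)  # approximately [[0.6, -0.4], [-0.4, 0.6]] [0.4, 0.4]
```

Evaluating `port_currents` on the full network at any port voltages gives the same currents as $\mathbf{Y}\mathbf{v}_{ext} + \mathbf{i}_s$, which is the externally indistinguishable behavior the proof relies on.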

#### D. Local Voltage Drop Analysis by Superposition

The local voltage drop at an arbitrary point $O_j$ can be represented as

$$v_j = \mathbf{W}_j^T \mathbf{v}_{inside} + \mathbf{Z}_j^T \mathbf{i}_{inside} + \mathbf{T}_j^T \mathbf{v}_{ext} \tag{19}$$

where  $\mathbf{W}_j$ ,  $\mathbf{Z}_j$ , and  $\mathbf{T}_j$  are the transfer vectors from  $\mathbf{v}_{inside}$ ,  $\mathbf{i}_{inside}$ , and  $\mathbf{v}_{ext}$  to  $v_j$ , respectively. The sum of the first two terms is the internal fluctuation which is caused by the internal sources of the local circuit. The last term is the external fluctuation which is caused by the external sources.

We use superposition to perform the local voltage drop analysis. First, the HSPICE [5] simulation of each module with ideal supplies at its ports (Step A2.1 in Figure 3) gives the internal fluctuation of each nodal voltage. After the reduced modules are integrated as shown in Figure 2, the voltage waveform at each port is obtained with HSPICE [5]. Then, by combining these port voltage waveforms with the transfer functions obtained by RICE [4], we calculate the external fluctuation of each nodal voltage. Finally, the local voltage drop at each node is the sum of its internal and external fluctuations.
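Equation (19) and the procedure above amount to adding an internal term to the responses of node $j$ to the port voltage waveforms. A minimal sketch, assuming each port-to-node transfer function is available as a sampled impulse response (a hypothetical input format chosen for simplicity; the paper obtains these transfer functions with RICE) and that all waveforms share one time step `dt`:

```python
from typing import List, Sequence


def convolve(h: Sequence[float], x: Sequence[float], dt: float) -> List[float]:
    """Rectangle-rule approximation of the convolution integral (h * x)(t),
    truncated to the length of x. h is a sampled impulse response, x a waveform."""
    return [dt * sum(h[m] * x[n - m] for m in range(min(n + 1, len(h))))
            for n in range(len(x))]


def local_voltage(v_internal: Sequence[float],
                  port_waveforms: Sequence[Sequence[float]],
                  impulse_responses: Sequence[Sequence[float]],
                  dt: float) -> List[float]:
    """Superposition in the spirit of Eq. (19): internal fluctuation plus the
    response of node j to each port voltage waveform."""
    total = list(v_internal)
    for v_port, h_jk in zip(port_waveforms, impulse_responses):
        external = convolve(h_jk, v_port, dt)
        total = [a + b for a, b in zip(total, external)]
    return total


if __name__ == "__main__":
    # Hypothetical data: two ports, each coupled to node j with a memoryless gain of 0.5
    # (a single-sample impulse response of height 0.5 / dt).
    dt = 1.0
    v = local_voltage([0.0, -0.01, -0.02],
                      [[1.0, 0.98, 0.97], [1.0, 0.99, 0.99]],
                      [[0.5 / dt], [0.5 / dt]], dt)
    print(v)  # approximately [1.0, 0.975, 0.96]
```

Because the transfer functions are linear, the external term is just a sum of per-port convolutions, so a changed port waveform only requires recomputing that port's contribution.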

### III. EXPERIMENTAL RESULTS

In this section, we show that the reduced models generated by our method are accurate and efficient: for large circuits the runtime is significantly reduced while the accuracy is preserved. The power grid model we use is a mesh-structured network.



Fig. 7. Waveforms of the reduced and original models



Fig. 8. Waveforms of the reduced and original models

The first set of test cases are meshes of lumped RC segments with switching devices attached inside the meshes. Each lumped RC segment has $R = 0.2\,\Omega$ and $C = 0.024$ fF. The voltage waveforms obtained by simulating the hierarchical order-reduced model are compared with those of the original model in Figure 7.

The other test cases are meshes of lumped RC segments with random current sources, which model the switching devices, attached inside the meshes. As Figure 8 shows, the waveforms of the proposed hierarchical order-reduced model and of the original model are virtually identical; the error is shown in Figure 9.

From these results, we observe that the hierarchical order-reduced model accurately approximates the voltage drop of the original model.

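Our experiments also show that the runtime is significantly reduced for large circuits. Table I compares the runtime of our reduced model with that of the original model on several test cases. The speedup increases with the size of the test circuit: for the two smallest circuits the reduced model is slower than the original (speedup below 1), while for the largest circuit the speedup exceeds 4X. This trend is shown in Figure 10.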

### IV. ACKNOWLEDGMENTS

This work is partially supported by NSF Grant CCR-0093309.



Fig. 9. Error  $(\mu V)$  at the same node of the reduced and original models

#### TABLE I

RUNTIME OF REDUCED MODELS VS. ORIGINAL MODELS

| Circuit Size | Reduced (s) | Original (s) | Speedup (X) |
|--------------|-------------|--------------|-------------|
| 100          | 1.96        | 0.50         | 0.25        |
| 400          | 7.33        | 6.13         | 0.83        |
| 900          | 16.95       | 22.53        | 1.33        |
| 1600         | 39.84       | 62.69        | 1.57        |
| 3600         | 90.38       | 197.11       | 2.18        |
| 6400         | 187.24      | 536          | 2.86        |
| 10000        | 369         | 1574         | 4.26        |
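The Speedup column of Table I is the ratio of the original runtime to the reduced-model runtime, so entries below 1 mean the reduced flow is slower. The short sketch below recomputes the column from the two runtime columns (values copied from the table) and finds the smallest listed circuit size at which the reduced model wins.

```python
# (circuit size, reduced runtime [s], original runtime [s]) copied from Table I
TABLE_I = [
    (100, 1.96, 0.50), (400, 7.33, 6.13), (900, 16.95, 22.53),
    (1600, 39.84, 62.69), (3600, 90.38, 197.11), (6400, 187.24, 536.0),
    (10000, 369.0, 1574.0),
]


def speedup(reduced_s: float, original_s: float) -> float:
    """Speedup of the reduced model over the original (flat) model."""
    return original_s / reduced_s


if __name__ == "__main__":
    for size, reduced_s, original_s in TABLE_I:
        print(f"{size:>6}  {speedup(reduced_s, original_s):5.2f}X")
    crossover = min(size for size, red, orig in TABLE_I if speedup(red, orig) > 1.0)
    print("reduced model is faster from size", crossover)  # 900
```

The tabulated values appear to be truncated rather than rounded (for example, 0.50/1.96 ≈ 0.255 is listed as 0.25), so the two-decimal output of this sketch can differ from the table in the last digit.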



Fig. 10. Runtime of the reduced models vs. original models

### References

- [1] Min Zhao, Rajendran V. Panda, Sachin S. Sapatnekar, Tim Edwards, Rajat Chaudhry, and David Blaauw, Hierarchical Analysis of Power Distribution Networks, *DAC*, pp. 150–155, 2000.
- [2] Joseph N. Kozhaya, Sani R. Nassif, and Farid N. Najm, A Multigrid-like Technique for Power Grid Analysis, *ICCAD*, 2001.
- [3] Pradeepsunder Ganesh and Charlie Chung-Ping Chen, RC-in RC-out Model Order Reduction Accurate Up to Second Order Moments, *International Conference on Computer Design*, pp. 505–506, 2001.
- [4] L. T. Pillage, R. A. Rohrer, and C. Visweswariah, *Electronic Circuit and System Simulation Methods*, McGraw-Hill.
- [5] L. W. Nagel, SPICE2: A Computer Program to Simulate Semiconductor Circuits, Technical Report ERL-M520, UC Berkeley, May 1975.