# **UC Santa Cruz**

# **UC Santa Cruz Previously Published Works**

## **Title**

Redundant C4 Power Pin Placement to Ensure Robust Power Grid Delivery

## **Permalink**

https://escholarship.org/uc/item/81d2471h

## **Authors**

Guthaus, Matthew; Logan, Sheldon

## **Publication Date**

2013-08-01

Peer reviewed

# Redundant C4 Power Pin Placement to Ensure Robust Power Grid Delivery

Sheldon Logan, Matthew R. Guthaus

Department of CE, University of California Santa Cruz, Santa Cruz, CA 95064

{slogan,mrg}@soe.ucsc.edu

Abstract—Power supply C4 (flip-chip) bonds are susceptible to failures due to electro-migration caused by high on-chip temperatures, large currents and manufacturing variability. A single C4 bond failure can result in catastrophic failures in the power supply network and thus redundant bonds are naively added to the circuit to mitigate the impact of bond failures. We propose a method for improved redundant bond placement using an Integer Linear Program (ILP) that reduces the required number of redundant bonds by 33% while ensuring power supply integrity in the presence of a single-bond failure.

#### I. INTRODUCTION

Robust power delivery is an increasingly important design task in modern integrated circuits (ICs) due to increasing power, increasing power density and high on-chip temperatures. The increase in power complicates power supply network (PSN) design because it makes it harder to meet static IR (voltage) drop, transient $(L\frac{di}{dt})$ voltage droop and electro-migration constraints. The high on-chip temperatures further exacerbate the rate of electro-migration in PSN bonds and wires, which makes the electro-migration constraints harder to meet.

In most modern ICs, power is supplied to the PSN through C4 bonds. However, these C4 bonds are becoming more prone to yield issues due to the increase in bond (pin) count and the shift to no-flow under-fill processes [?].
Specifically, no-flow under-fill processes are susceptible to void formation, non-wetting of solders and chip floating [?]. These problems only exacerbate failures in PSNs and can result in expensive product recalls [?].

Power supply integrity must be addressed during the different phases of PSN design: wire sizing [?], decoupling capacitance insertion [?] and power pad/pin/C4 placement [?], [?]. All previous works on power pin placement have focused on creating an initial pin placement that minimizes the total number of power pins while ensuring power supply integrity including static voltage violations [?], [?] and mean-time-to-failure (MTTF) of power pins [?]. The power pin placements created from the aforementioned algorithms, however, are not robust to single-pin failures or defects. The formation of a defect in a C4 pin increases the resistance which further increases the temperature and ultimately leads to failure. The failure of a single pin can then cascade to nearby pins by increasing their current supply and reducing their MTTF leading to more pin failures and potentially voltage violations in the PSN.

While most chips use *ad hoc* naive pin redundancy, our work is the first to systematically augment power supply pin placements with redundant pins so that the pin placement is single-pin redundant. If any one pin fails due to a manufacturing defect or electromigration, the design will still meet both the electro-migration and static/dynamic voltage constraints. Our methodology uses thermal modeling and power grid simulation to determine pins that are likely to fail and then finds the placement of the smallest set of pins to protect against failure.
Specifically, our contributions are:

- the first consideration of C4 pin manufacturing defects in PSNs,
- a methodology for determining which pins are critical to a PSN,
- a methodology for characterizing the effectiveness of adding a redundant pin to a PSN,
- an Integer Linear Program (ILP) formulation to generate the placement of redundant pins to prevent catastrophic electromigration and static/dynamic voltage drop due to single power pin failure.

Our work proceeds as follows: Section II contains background information on PSNs and electro-migration. Section III introduces the different problems that arise from single-pin failures. Section IV introduces various concepts regarding redundant pins and their placement. The redundant power pin placement algorithm is detailed in Section V. Section VI provides the experimental setup and then the results obtained from our method are presented in Section VII. Finally, Section VIII concludes our results.

#### II. POWER SUPPLY NETWORKS

A typical PSN contains power pins (both VDD and GND), wires and decoupling capacitors (decaps). The PSN is modeled using resistors, capacitors, inductors, voltage sources and current sources. The transistors are modeled as current sources with a small amount of capacitance, the wires are modeled as resistors and the power pins are modeled as voltage sources connected to the PSN with some resistance and inductance. In this paper, we assume flip-chip packaging which allows power pins to be located anywhere throughout the die of the chip.

The voltages for a PSN are calculated using Modified Nodal Analysis (MNA):

$$G \cdot v(t) + C \cdot \dot{v}(t) = i(t) \tag{1}$$

where $G$ is the conductance matrix, $C$ is the admittance matrix (inductance and capacitance elements), $v(t)$ represents the time varying nodal voltages and $i(t)$ represents the vector of current sources corresponding to the transistors in the design. The backward Euler method is used to solve the transient system.
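To make the transient solve concrete, the following is a minimal sketch of backward Euler applied to Equation 1. It is an illustration under stated assumptions, not the authors' implementation: the two-node grid, all element values and the helper name `backward_euler` are hypothetical, the pin is folded into the right-hand side as a Norton equivalent, and only capacitors (no inductors) are included.

```python
import numpy as np

def backward_euler(G, C, i_of_t, v0, h, steps):
    """Integrate G v + C dv/dt = i(t) with a fixed time step h.

    Backward Euler replaces dv/dt by (v[n+1] - v[n]) / h, which gives
    (G + C/h) v[n+1] = (C/h) v[n] + i(t[n+1]).
    """
    A = G + C / h                       # constant for a fixed step size
    v = np.asarray(v0, dtype=float)
    trace = [v.copy()]
    for n in range(1, steps + 1):
        rhs = (C @ v) / h + i_of_t(n * h)
        v = np.linalg.solve(A, rhs)
        trace.append(v)
    return np.array(trace)

# Hypothetical two-node grid (values chosen for illustration only):
# node 0 holds a power pin (VDD behind conductance g_pin), node 1 holds a load.
VDD, g_pin, g_wire, c_node, i_load = 1.0, 10.0, 5.0, 1e-9, 0.5
G = np.array([[g_pin + g_wire, -g_wire],
              [-g_wire,         g_wire]])
C = np.diag([c_node, c_node])

def current(t):
    # Norton equivalent of the pin at node 0; load current drawn from node 1.
    return np.array([g_pin * VDD, -i_load])

trace = backward_euler(G, C, current, v0=[VDD, VDD], h=1e-11, steps=2000)
print(trace[-1])   # settles near the static solution [0.95, 0.85]
```

Because the step size is fixed, the matrix `G + C/h` never changes, so a production solver would typically factor it once and reuse the factorization at every step; the sketch calls `np.linalg.solve` each time only for brevity.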
Figure 1(a) shows the simulation result of a functional power supply.

Electro-migration has steadily become a concern in the design of power supply networks [?] due to the significant increases in the power demanded by modern ICs. The Mean Time to Failure (MTTF) of a power pin due to electro-migration is calculated using Black's Equation:

$$MTTF = A \frac{1}{j^n} \exp\left(\frac{Q}{kT}\right) \tag{2}$$

where $A$ is a constant based on the cross-sectional area of the pin, $j$ is the current density measured in $\frac{A}{cm^2}$, $Q$ is the activation energy, $n$ is a model parameter, $k$ is Boltzmann's constant, and $T$ is the power pin temperature. High temperatures reduce the MTTF of pins exponentially, and with it overall reliability. Consequently, pins located in temperature hot spots should carry less current in order to extend their MTTF.

- (a) Voltage map corresponding to all functional pins. Dimensions in $\mu$m.
- (b) Static voltage violations and MTTF failures (hollow pin) caused by a single-pin defect (white cross). Dimensions in $\mu$m.
- (c) Redundant pin (square) removes static voltage violations and electro-migration failures caused by single-pin defect. Dimensions in $\mu$m.

Fig. 1. Example from ibmpg2 benchmark showing voltage violations and electro-migration failures caused by a single-pin defect. Pins are represented as large circles and node voltages represented as small circles. Only a small fraction of the entire benchmark is shown for image clarity with dimensions in $\mu$m.

#### III. FAILURE CHARACTERIZATION

A pin defect can cause static and transient voltage violations in the power grid. Figure 1(b) shows the static voltage violations caused by a single-pin defect. However, not all the pins in a design will cause static voltage violations if they have a pin defect. We define the voltage failure pin set $(V_p)$ as the pins that cause static voltage violations if they have a pin defect/failure.
A pin defect can also cause an electro-migration failure in neighboring pins due to the increased current load on those neighboring pins, as shown in Figure 1(b). We define the electro-migration failure critical pin set $(C_p)$ as the pins that cause electro-migration failure in nearby pins due to a pin defect/failure. We define the electro-migration failing pin set $(F_p)$ as the pins that fail due to a defect/failure of pins in $C_p$. The $C_p$, $F_p$ and $V_p$ sets are not mutually exclusive and it is possible for a pin to belong to multiple sets.

The $C_p$, $F_p$ and $V_p$ sets are determined by performing $N$ static PSN simulations, each with a different pin removed, where $N$ is the number of power pins in the design. For each simulation, the current through each pin is checked to ensure that it is less than the maximum current threshold $(I_{thres})$ for that pin location, including the effect of intra-die temperature. The $I_{thres}$ for each pin is calculated using Equation 2 and the temperature map of the circuit. The static IR (voltage) drop is also checked.

#### IV. PIN REDUNDANCY

Pin redundancy is a possible method to fix power supply network failures due to a single-pin defect/failure. Many designs in practice guard-band by inserting extra pins, but not in a systematic manner. Figure 1(c) shows the same portion of the ibmpg2 benchmark as Figure 1(b), with a redundant pin (square) that prevents the static voltage violations and electro-migration failures caused by the failing pin.

There are two major challenges with generating a redundant pin set $(R_p)$ for single-pin robustness, namely, selecting which pins need to be made redundant, and determining which additional pins can provide coverage. The pins to be made redundant are selected from the $F_p$, $C_p$, and $V_p$ sets to form an overall critical pin set $(O_p)$.
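The leave-one-out characterization from Section III that produces the $C_p$, $F_p$ and $V_p$ sets can be sketched as follows. This is a minimal illustration, assuming a hypothetical five-node resistive chain with three pins; the grid, the element values, the uniform `i_thres` (which in the paper varies per pin with temperature through Equation 2) and the function names `solve_static` and `characterize` are all assumptions made for the example.

```python
import numpy as np

def solve_static(n_nodes, g_wire, pins, g_pin, vdd, loads):
    """Solve the static system G v = i for a 1-D chain of nodes."""
    G = np.zeros((n_nodes, n_nodes))
    i = -np.asarray(loads, dtype=float)       # loads draw current out of nodes
    for k in range(n_nodes - 1):              # wire segment between k and k+1
        G[k, k] += g_wire
        G[k + 1, k + 1] += g_wire
        G[k, k + 1] -= g_wire
        G[k + 1, k] -= g_wire
    for p in pins:                            # pin = VDD behind conductance g_pin
        G[p, p] += g_pin
        i[p] += g_pin * vdd
    v = np.linalg.solve(G, i)
    return v, {p: g_pin * (vdd - v[p]) for p in pins}

def characterize(n_nodes, g_wire, pins, g_pin, vdd, loads, i_thres, v_min):
    """Remove one pin at a time and record which failures it triggers."""
    C_p, F_p, V_p = set(), set(), set()
    for removed in pins:
        remaining = [p for p in pins if p != removed]
        v, cur = solve_static(n_nodes, g_wire, remaining, g_pin, vdd, loads)
        overloaded = {p for p in remaining if cur[p] > i_thres[p]}
        if overloaded:                        # removal overloads other pins
            C_p.add(removed)
            F_p |= overloaded
        if v.min() < v_min:                   # removal causes a voltage violation
            V_p.add(removed)
    return C_p, F_p, V_p

# Hypothetical example: 5 nodes, pins at nodes 0, 2, 4, 0.2 A load per node.
pins = [0, 2, 4]
loads = [0.2] * 5
i_thres = {p: 0.55 for p in pins}             # uniform threshold, illustration only
C_p, F_p, V_p = characterize(5, 10.0, pins, 20.0, 1.0, loads, i_thres, v_min=0.94)
print(sorted(C_p), sorted(F_p), sorted(V_p))  # [0, 4] [2] [0, 2, 4]
```

In this toy case, losing either end pin overloads the middle pin, so the end pins land in $C_p$ and the middle pin in $F_p$, while losing any pin causes a voltage violation, which illustrates how one pin can belong to several sets.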
The coverage for each possible redundant pin location is calculated and then tabulated as a set $p_{cov_i}$ that contains the pins from the $O_p$ set that are covered by redundant pin $i$. The possible redundant pin locations are selected from the top level nodes of the PSN that are not located within the minimum pin spacing requirement of other pins in the circuit. A summary of the different set definitions is given in Table I.

TABLE I. DEFINITIONS OF TERMINOLOGY

| Set | Definition |
|-------------|-------------------------------------------------------------------|
| $V_p$ | Pins that cause voltage violations if they have a defect/failure. |
| $C_p$ | Pins that cause electro-migration failures in other pins if they have a defect/failure. |
| $F_p$ | Pins that will have an electro-migration failure due to a defect/failure of a pin in $C_p$. |
| $R_p$ | Redundant pins added to design for robustness. |
| $O_p$ | Pins from $F_p$, $C_p$, and $V_p$ to be made redundant. |
| $p_{cov_i}$ | Pins from $O_p$ that are covered by redundant pin $i$. |

#### A. Calculating Redundancy Coverage

Adding a redundant pin to a design will affect the power supply network within a certain distance of the redundant pin due to the locality effect [?]. The effectiveness of a redundant pin depends on several factors: power grid resistivity, the total nearby current draw, the number of nearby pins and the magnitude of the failure caused by the single-pin defect. Using these varying factors, the effectiveness of a redundant pin can be empirically estimated for each pin defect.

1) Electro-migration coverage: Redundant pin coverage for pins in $F_p$ entails reducing the extra current in the pin as a result of a single-pin failure. We define the current slack $(I_{slack})$ in a pin as $I_{thres} - I$, where $I$ is the current through the pin. A redundant pin covers a pin in $F_p$ if it ensures that $I_{slack}$ always remains positive regardless of which pin from the $C_p$ set fails.
The relationship between the current slack in the $F_p$ pin and the distance of an added redundant pin is exponential, as shown in Figure 2(a), and is modeled as:

$$I_{slack} = A \exp\left(-B \cdot p_{dist}\right) + C \tag{3}$$

where $A$, $B$ and $C$ are constants and $p_{dist}$ is the distance of the added pin to the failing pin. It is not practical to simulate the effect on the slack for each candidate pin location; consequently, for our experiments the current slack vs. distance relationship was estimated from 4 distances of 100, 500, 1000 and 2000 $\mu$m. These distances were chosen to capture enough information about the relationship to obtain a good approximation for the $A$, $B$ and $C$ constants without sacrificing accuracy. The redundant coverage for each pin in $F_p$ is calculated using Equation 3, whose root $(d_{coverEM})$ corresponds to the distance at which the slack of the failing pin becomes zero. Any redundant pin placed at a distance less than or equal to $d_{coverEM}$ will ensure there are no MTTF violations.

- (a) Example showing exponential relationship between current slack and distance of added pin from failing pin
- (b) Example showing linear relationship between voltage violations and distance of added pin

Fig. 2. Example from the ibmpg2 benchmark showing redundancy models.

2) Voltage coverage: The coverage for a pin in $V_p$ is calculated similarly to a pin in $F_p$. If a pin in $V_p$ fails, nodes within close proximity will have voltage violations. Adding a pin at a location close enough to the pin from $V_p$ will remove all the static violations. The relationship between the number of static violations and the distance of the added pin was observed to be linear, as shown in Figure 2(b). Consequently, the relationship is modeled as:

$$V_n = M \cdot p_{dist} + D \tag{4}$$

where $M$ and $D$ are constants and $V_n$ is the number of node voltage violations. Within a certain distance, $d_{coverV}$, there are no voltage violations.
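As a worked step that the text leaves implicit, both coverage distances follow in closed form from the fitted models. The sign conventions here are assumptions made for illustration: $A > 0$, $B > 0$ and $C < 0$ in Equation 3, and $M > 0$, $D < 0$ in Equation 4, so that coverage improves as the added pin moves closer. Setting each model to zero gives:

$$I_{slack} = 0 \;\Rightarrow\; d_{coverEM} = \frac{1}{B}\ln\left(\frac{A}{-C}\right), \qquad V_n = 0 \;\Rightarrow\; d_{coverV} = -\frac{D}{M}$$

Under these assumptions the first root is positive only when $A > -C$, that is, when a redundant pin placed at zero distance would restore positive slack.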
Consequently, the potential redundant pins located within $d_{coverV}$ of a voltage failing pin provide redundant coverage for that pin.

#### V. REDUNDANT PIN SET GENERATION

Single-pin redundancy augments a power pin placement with a minimal redundancy pin set $(R_p)$ such that for any single-pin defect, nearby pins do not suffer from a cascading electro-migration failure and all nodes within the design continue to satisfy minimum static and transient voltage constraints. The $C_p$, $F_p$ and $V_p$ sets guide the redundant pin placement since they identify which pins need to be made redundant. Consequently, $R_p$ set generation finds the mapping of redundant pin locations to pins in the various failure sets that minimizes the number of redundant pins required. Since there are no prior works in this regard, we propose and compare two methods.

#### A. Naive Greedy Pin Redundancy

The Naive method uses a simple greedy algorithm as detailed in Algorithm 1. First, $C_p$, $F_p$, and $V_p$ are computed. The $O_p$ set is then computed based on the size of $C_p$ and $F_p$: if $|C_p| < |F_p|$ then $O_p = C_p \cup V_p$, otherwise $O_p = F_p \cup V_p$. Finally, $R_p$ is computed by visiting every pin in $O_p$ and adding the closest redundant pin to $R_p$.

Figure 3 shows a simple example to illustrate the Naive algorithm process. The algorithm first computes $O_p$ and then iterates through each member, selecting a redundant pin to provide coverage. For our example, the algorithm selects redundant pin 1 to cover pins 2 and 4, then redundant pin 4 to cover pins 5 and 7, etc., until all pins in $O_p$ are covered. The final $R_p$ is thus $\{1,4,5,3\}$.

#### Algorithm 1 Naive Greedy Algorithm

**Input:** Original pin placement

**Output:** Set of redundant pin locations $R_p$

- 1: Generate $C_p$, $F_p$, $V_p$ and $O_p$ sets.
- 2: **repeat**
- 3: Select pin $j$ in $O_p$
- 4: Select redundant pin closest to $j$ and add to $R_p$
- 5: **until** all pins in $O_p$ are visited

#### B. ILP Pin Redundancy

Selecting a minimal $R_p$ based on pins in $O_p$ is a set covering problem, which is solvable as an Integer Linear Program (ILP). Given the possible redundant pin locations and the pins to be covered in $O_p$, the ILP picks the least number of redundant pins so that every pin within $O_p$ is covered. More formally, the problem is formulated as:

$$\begin{aligned} \min \quad & \sum_{i} r_i, \quad r_i \in \{0, 1\} \\ \text{s.t.} \quad & A_f \cdot r \ge \mathbf{1} \end{aligned}$$

where $r$ is a binary vector over all possible redundant pin locations that specifies whether a pin is placed at each location, and $A_f$ is an $M \times N$ matrix that contains the coverage information for all possible redundant pin locations, obtained from the coverage sets $p_{cov_i}$. Here $N$ is the number of possible redundant pin locations and $M$ is the size of the $O_p$ set to be covered. Column $i$ in $A_f$ contains the pins from $O_p$ that are covered by placing a redundant pin at position $i$. An example of an $A_f$ matrix for the example displayed in Figure 3(e) is shown in Figure 4(b). The ILP solver will choose the smallest $R_p$ set that ensures all the pins in the $O_p$ set are covered, which in this case is $\{1,4,3\}$. This is smaller than the $R_p$ from the Naive method.

#### VI. EXPERIMENTAL SETUP

We implemented the above algorithms in C++ and used HotSpot 5.0 [?] for thermal analysis. The direct solver CHOLMOD from the UFsparse matrix packages [?] was used for final power grid analysis, while a preconditioned conjugate gradient (PCG) solver was used to generate the coverage sets and also the $V_p$, $F_p$ and $C_p$ sets. We used GLPK [?] for solving the ILPs. Our experiments were run on an Ubuntu 10.04 Linux system with a 3.4GHz Intel i7-2600 processor and 8GB of memory.

We use the IBM power grid benchmarks [?] for our experiments. These are extracted from real-world high-performance microprocessors.
However, since no physical dimensions are given in the benchmarks, we scaled them so that each chip has an average power density of 250 W/cm² [?]. The temperature map for each benchmark is computed over regional blocks, and the total current for each block is summed from the current sources it contains. For the electromigration calculations, we assume the following values from [?]: $Q = 0.8$ eV, $n = 1.8$ and $A = 2.54 \times 10^{-8}$.

#### VII. EXPERIMENTS

Several experiments demonstrate the effectiveness of our various pin redundancy algorithms. The ibmpg4 and ibmpg5 benchmarks are not used in our experiments because the total current in these circuits is small, so they have no pins with potential electro-migration failures. For all experimental results, each power grid is simulated with the redundancy set to confirm there are no electro-migration problems or static/dynamic voltage violations. The initial power pin placements for each benchmark are generated using the algorithm presented in [?] due to the large static IR drops found in some of the original benchmarks (0.81V for the ibmpg1 benchmark).

#### A. Baseline experiments

First, we demonstrate the necessity for single-pin redundancy by showing the voltage and electro-migration violations that occur when no redundant pins are added. The results of this experiment are shown in Table II under the No Redundancy column. The Volt. Viols. column represents the percentage of pins that cause voltage violations if they have a defect/failure and the EM Viols. column represents the percentage of pins that have an electro-migration violation for any pin defect/failure.

- … locations (hollow). Arrows show the pins in $F_p$ that fail from pins in $C_p$.
- (e) Relationship between potential redundant pins and pins in $O_p$ $\left(F_p \cup V_p\right)$.

Fig. 3. Example pin placement, showing corresponding redundancy mappings. An arrow from a redundant pin location to a pin in the opposing set means that pin is covered by that redundant pin location.
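The mapping in Fig. 3(e) is a set-cover instance, so the difference between the Naive method and the formulation of Section V-B can be shown on a small example. The sketch below is an assumption-laden illustration: the figure's data is not recoverable from the text, so the `coverage` and `closest` tables are hypothetical, constructed only to be consistent with the $R_p$ sets quoted in Section V. The exact solve uses brute-force enumeration as a stand-in for an ILP solver, which is only viable for tiny instances.

```python
from itertools import combinations

# Hypothetical instance (not the actual data behind Fig. 3):
O_p = [2, 4, 5, 7, 8, 9]                      # pins that must be covered
coverage = {1: {2, 4}, 2: {4}, 3: {8, 9},     # candidate location -> pins covered
            4: {5, 7}, 5: {8}, 6: {9}}
closest = {2: 1, 4: 1, 5: 4, 7: 4, 8: 5, 9: 3}  # pin -> nearest candidate location

def naive_greedy(O_p, closest):
    """Visit each pin in O_p and add its closest candidate location."""
    R_p = []
    for pin in O_p:
        loc = closest[pin]
        if loc not in R_p:
            R_p.append(loc)
    return R_p

def coverage_matrix(O_p, coverage):
    """Build A_f: one row per pin in O_p, one column per candidate location."""
    locs = sorted(coverage)
    A_f = [[1 if pin in coverage[loc] else 0 for loc in locs] for pin in O_p]
    return locs, A_f

def min_cover(O_p, coverage):
    """Smallest r with A_f r >= 1, found by enumeration (ILP stand-in)."""
    locs, A_f = coverage_matrix(O_p, coverage)
    for size in range(1, len(locs) + 1):
        for chosen in combinations(range(len(locs)), size):
            if all(sum(row[j] for j in chosen) >= 1 for row in A_f):
                return [locs[j] for j in chosen]
    return None                               # no feasible cover exists

print(naive_greedy(O_p, closest))   # [1, 4, 5, 3]
print(min_cover(O_p, coverage))     # [1, 3, 4]
```

The naive pass adds location 5 for pin 8 because it is nearest, even though location 3 (already needed for pin 9) covers pin 8 as well; the exact cover avoids that extra pin.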
TABLE II. COMPARISON OF DIFFERENT PIN REDUNDANCY SCHEMES

| Benchmark | # Cand. Pins | # Pins | No Redundancy: EM Viols | No Redundancy: Volt. Viols | Naive Greedy (Baseline): Added Pins | Naive Greedy (Baseline): Runtime | ILP: Added Pins | ILP: Runtime |
|-----------|--------------|--------|-------------------------|----------------------------|-------------------------------------|----------------------------------|-----------------|--------------|
| ibmpg1 | 6085 | 269 | 53.2% | 25.7% | 198 | 1.55 | 44.8% | 8.9 |
| ibmpg2 | 2095 | 220 | 65.0% | 11.4% | 154 | 18.49 | 18.6% | 134.0 |
| ibmpg3 | 4423 | 183 | 9.8% | 32.2% | 71 | 54.91 | 41.2% | 310.0 |
| ibmpg6 | 22400 | 84 | 58.3% | 42.9% | 64 | 140.9 | 38.1% | 948.2 |
| mean | | | 46.6% | 28.1% | | | 35.7% | 6.4x |

- (a) ILP constraint equations
- (b) ILP matrix

Fig. 4. ILP constraints and corresponding matrix for $O_p$ set.

We then demonstrate the effectiveness of using the Naive Greedy method for providing single-pin redundancy coverage. The results for these experiments are shown in Table II under the Naive Greedy column and show the large number of redundant pins used by this method, 64.6% of the original pin count on average.

#### B. ILP

The $R_p$ sets generated using the Naive Greedy method are large; consequently, we investigated the results of using the proposed ILP method. The $R_p$ sets generated using the ILP are much smaller than those generated using the Naive Greedy method, as shown in Table II. However, the runtime increases significantly due to the time taken to calculate the coverage sets.

#### VIII. CONCLUSION

Guaranteeing PSN robustness has become a challenge for modern ICs, especially in the presence of flip-chip pin failures. In this paper we presented two methods for generating a redundant power pin set to guarantee single-pin redundancy. Our ILP formulation was able to generate redundant pin sets for the IBM power grid benchmarks that use 33% of the total redundant pins that are required for a typical naive greedy method while still providing robust power delivery.

#### REFERENCES

- [1] E. Chiprout. Fast flip-chip power grid analysis via locality and grid shells. In *ICCAD*, pages 485–488, 2004.
- [2] W. Choi, E. Yeh, and K. Tu. Mean-time-to-failure study of flip chip solder joints on Cu/Ni(V)/Al thin-film under-bump-metallization. *Journal of Applied Physics*, 94, 2003.
- [3] T. Davis. UFsparse. http://www.cise.ufl.edu/research/sparse/.
- [4] GLPK. http://www.gnu.org/software/glpk/.
- [5] C. Kim and D. Baldwin. No-flow underfill process modeling and analysis for low cost, high throughput flip chip assembly. *Trans. on Electronics Packaging Manufacturing*, 26(2):156–165, Apr. 2003.
- [6] G. M. Link and N. Vijaykrishnan. Thermal trends in emerging technologies. In *ISQED*, pages 625–632, 2006.
- [7] R. Master, A. Marathe, V. Pham, and D. Morken. Electromigration of C4 bumps in ceramic and organic flip-chip packages. In *Electronic Components and Technology Conference*, 2006.
- [8] S. R. Nassif. Power grid analysis benchmarks. In *ASPDAC*, pages 376–381, 2008.
- [9] NVIDIA. Second quarter release. http://www.nvidia.com/object/io\_1215037160521.html.
- [10] T. Sato, H. Onodera, and M. Hashimoto. Successive pad assignment algorithm to optimize number and location of power supply pad using incremental matrix inversion. In *ASPDAC*, pages 723–728, 2005.
- [11] M. Stan et al. Hotspot: A dynamic compact thermal model at the processor-architecture level. *Microelectronics Journal*, pages 1153–1165, 2003.
- [12] H. Su, S. S. Sapatnekar, and S. R. Nassif. An algorithm for optimal decoupling capacitor sizing and placement for standard cell layouts. In *ISPD*, pages 68–73, 2002.
- [13] R. Thorpe, D. Baldwin, and L. McGovern. High throughput flip chip processing and reliability analysis using no-flow underfills. In *IECTC*, pages 419–425, 1999.
- [14] T.-Y. Wang and C. C.-P. Chen. Optimization of the power/ground network wire-sizing and spacing based on sequential network simplex algorithm. In *ISQED*, pages 157–162, 2002.
- [15] M. Zhao, Y. Fu, V. Zolotov, S. Sundareswaran, and R. Panda. Optimal placement of power supply pads and pins. In *DAC*, pages 165–170, 2004.