# A Routing Algorithm for Flip-Chip Design

Jia-Wei Fang<sup>1</sup>, I-Jye Lin<sup>2</sup>, and Yao-Wen Chang<sup>1,2</sup>

Graduate Institute of Electronics Engineering, National Taiwan University, Taipei<sup>1</sup> Department of Electrical Engineering, National Taiwan University, Taipei<sup>2</sup>

#### **Abstract**

The flip-chip package gives the highest chip density of any packaging method to support the pad-limited Application-Specific Integrated Circuit (ASIC) designs. In this paper, we propose the first router for the flip-chip package. The router can redistribute nets from wire-bonding pads to bump pads and then route each of them. The router adopts a two-stage technique of global routing followed by detailed routing. In global routing, we use the network flow algorithm to solve the assignment problem from the wire-bonding pads to the bump pads, and then create the global routing path for each net. The detailed routing consists of three stages, cross point assignment, net ordering determination, and track assignment, to complete the routing. Experimental results based on seven real designs from the industry demonstrate that the router can reduce the total wirelength by 10.2%, the critical wirelength by 13.4%, and the signal skews by 13.9%, compared with a heuristic algorithm currently used in industry.

# 1 Introduction

# 1.1 Flip-Chip Design

Due to the increasing complexity and decreasing feature size of Very Large Scale Integration (VLSI) designs, the demand of more I/O pads has become a significant problem of package technologies. Hence a relatively new packaging technology, the *flip chip (FC) package*, as shown in Figure 1, is created for higher integration density and rising power consumption. Flip-chip bonding was first developed by IBM in 1960's. It gives the highest chip density of any packaging method to support the pad-limited ASIC designs.



Figure 1: (a) A Flip Chip. (b) A Flip Chip Package.

Flip-chip is not a specific package, or even a package type (like PGA or BGA). Flip-chip describes the method of electrically connecting the die to the package carrier. The package car-

rier, either a substrate or a lead-frame, provides the connection from the die to the outside devices of the package. The die is attached to the carrier face up, and later a wire is bonded first to the die, then looped and bonded to the carrier. In contrast, the interconnection between the die and carrier in the flip-chip package is made through a conductive bump ball that is placed directly on the die surface. Finally, the bumped die is flipped over and placed face down, with the bump balls connecting to the carrier directly. The flip-chip technology is the choice in high clock speed applications because of the following advantages: reduced signal inductance (high speed), reduced power/ground inductance (low power), higher signal density, die shrink, reduced package footprint, and lower thermal effect. However, in recent IC designs, the I/O pads are still placed along the boundary of the die. This placement does not suit for the flip-chip package. As a result, we use the top metal or an extra metal layer, called Re-Distributed Layer (RDL) as shown in Figure 2, to redistribute the wire-bonding pads to the *bump pads* without changing the placement of the I/O pads. Since the RDL is the top metal layer of the die, the routing angle in RDL cannot be any-angle. Bump balls are placed on RDL and use the RDL to connect to wire-bonding pads by bump pads.



Figure 2: Cross Section of RDL

The flip-chip package is generally classified into two types: the peripheral array as shown in Figure 3(a) and the area array as shown in Figure 3(b). In the peripheral array, the bump balls are placed along the boundary of the flip-chip package. The disadvantage of the peripheral array is that we only have the limited number of bump balls. In the area array, the bump balls are placed in the whole area of the flip-chip package. The advantage of the area array is that the number of bump balls is much more than that of the peripheral array, so it is more suitable for modern VLSI designs. Since the flip-chip design is for high speed circuits, the issue of signal skews is also important. Thus a special router, the Redistribution Layer (RDL) router [15], is needed to reroute the peripheral wire-bonding pads to the bump pads and then connect the bump pads to the bump balls. Considering the routing of multi-pin nets and the minimization of total wirelenth and the signal skews are also needed for an RDL router. Figure 3(c) shows one RDL routing result for an area-array flip chip.

# 1.2 Previous Work

To the best knowledge of the authors, there is no previous work in the literature on the routing problem for flip-chip



Figure 3: (a) A Peripheral Array. (b) An Area Array. (c) An RDL Routing Result.

Similar works are the routing for ball grid array (BGA) packages and pin grid array (PGA) packages, including [5], [12], [13], [14], [16], [18] and [19]. The work [18] used the geometric and symmetric attributes of the pin positions in the BGA packages to assign pins of the BGA. However, in flip-chip designs the positions of wire-bonding pads and bump pads do not always have these geometric and symmetric attributes. The works [5] and [13] presented PGA routers while [14] provided a BGA router. These three routers are any-angle and multi-layer routers without considering the pin assignment problem. They did not consider single-layer routing and total wirelength minimization. The works [16] and [19] applied the min-cost network flow algorithm to solve the I/O pin routing problems. All these routers focused only on routability and did not consider multipin nets and signal skews. Furthermore, they assumed that wires can be any-angle, so their methods are not suitable for the RDL routing, typically with 90-degree angle routing.

#### 1.3 Our Contributions

To our best knowledge, this paper is the first work to propose an RDL router to handle the routing problem of flip-chip designs with real industry applications. We propose a unified networkflow formulation to simultaneously consider the assignment of the wire-bonding pads to the bump pads and the routing between them. Our algorithm consists of two phases. The first phase is the global routing that assigns each wire-bonding pad to a unique bump pad. By formulating the assignment as a maximum flow problem and applying the min-cost maximum-flow algorithm, we are able to guarantee 100% routing completion after the assignment. The second phase is the detail routing that efficiently distributes the routing points between two bump pads and assigns wires into tracks. In addition to the traditional single-layer routing with only routability optimization, our RDL router also tries to optimize the total wirelength and the signal skews between a pair of signal nets under the 100% routing completion constraint. Experimental results based on seven real designs from the industry demonstrate that the router can reduce the total wirelength by 10.2%, the critical wirelength by 13.4%, and the signal skews

by 13.9%, compared with a heuristic algorithm currently used in industry.

The rest of this paper is organized as follows. Section 2 gives the formulation of the RDL routing problems. In Section 3, we detail our algorithm, including the global routing and the detailed routing. Section 4 shows the experimental results. Finally, conclusions are given in section 5.

#### 2 Problem Formulation

We introduce the notations used in this paper and formally define the routing problem for flip-chip package. Figure 4 shows the modeling of the routing structure of the flip-chip package. Let P be the set of wire-bonding pads, and B be the set of bump pads. For practical application, the number of bump pads is larger than or equal to the number of wire-bonding pads, i.e.,  $|B| \ge |P|$ , and each bump pad can be assigned to more than one wire-bonding pad. Let  $R_b = \{r_1^b, r_2^b, ..., r_m^b\}$  be a set of m bump pad rings in the center of the package, and let  $R_p = \{r_1^p, r_2^p, ..., r_k^p\}$  be a set of k wire-bonding pad rings at the boundary of the package. Each bump pad ring  $r_i^b$  consists of a set of q bump pads  $\{b_1^i, b_2^i, ..., b_q^i\}$ , and each wire-bonding pad  $r_j^p$  consists of l wire-bonding pads  $\{p_1^j, p_2^j, ..., p_l^j\}$ . Let N be the set of nets for routing. Each net n in N is defined by a set of wire-bonding pads and a set of bump pads that should be connected. Thus n can be a multi-pin net. Since the RDL routing for current technology is typically on a single layer, it does not allow wire crossings, for which two wires intersects each other in the routing layer. As shown in Figure 4, based on the two diagonals of the flip-chip package, we partition the whole package into four sectors:  $East = \{P_E, B_E, R_p^E, R_b^E\}$ ,  $West = \{P_W, B_W, R_p^W, R_b^W\}$ ,  $South = \{P_S, B_S, R_p^S, R_b^S\}$ , and  $North = \{P_N, B_N, R_p^N, R_b^N\}$ . For practical applications, the wire-bonding pads in one sector connects only to the bump pads in the same sector.



Figure 4: Four Sectors in a Flip-Chip Package.

We define an *interval* to be the segment between two adjacent bump pads in the same ring  $r_i^b$  or the segment between two adjacent wire-bonding pads in the same ring  $r_j^p$ . Given a flip-chip routing instance, there are two types of routing, the *monotonic routing* and the *non-monotonic routing*. A monotonic routing can be formally defined as the follows:

**Definition 1** A monotonic routing is a routing such that for each net n connecting from a wire-bonding pad p to a bump pad b,

n intersects exactly one interval in each ring  $r_i^b$  and exactly one interval in each ring  $r_i^p$ .

As showing in Figure 5(a), the nets  $n_2$  and  $n_4$  are monotonic routes. If we exchange the positions of two bump pads  $b_2$  and  $b_4$ , the routing of  $n_2$  and  $n_4$  are non-monotonic routing as shown in Figure 5(b). A good flip-chip package routing should be a monotonic routing because the monotonic routing results in smaller total wirelength and higher routing completion, compared to the non-monotonic routing.



Figure 5: (a) Monotonic Routing. (b) Non-monotonic Routing.

Based on the definition above, the routing problem can be formally defined as the follows:

Problem 1 The single-layer flip-chip routing problem is to connect a set of  $p \in P$  and a set of  $b \in B$  so that no wire crosses each other and the routing is monotonic, the total wirelength is minimized, and the signal skew is minimized.

#### The Routing Algorithm

In this section, we present our routing algorithm. First we give the overview of our algorithm. Then we detail the methods used in each phase.

#### **Algorithm Overview** 3.1

According to the routing flow shown in Figure 6, our algorithm consists of two phases: (1) the global routing based on the mincost max-flow (MCMF) algorithm [4], and (2) the detailed routing based on the cross point assignment, net ordering determination, and the track assignment.

In the first phase, we construct four flow networks  $G_E$ ,  $G_W$ ,



Figure 6: The RDL Routing Flow

one layer for routing, the assignment should not create any wire crossings. We avoid the wire crossings by restricting the edges in the networks not to intersect each other. We first consider 2-pin nets and then multi-pin nets. The reason is that 2-pin nets have less freedom to choose the routing path, so it needs to be considered first. After applying MCMF, we obtain the flows denoting the routes from wiring-bonding pads to bump pads for the nets. Those flows give the global paths for the nets.

In the second phase, we use the cross point assignment, net ordering determination, and the track assignment to achieve fast detailed routing. A *cross point* is the point for a net to pass through an interval. First, we find the cross points for all nets passing through the same interval. For all nets that pass through the same interval, we evenly distribute these cross points. Second, we use the net ordering determination technique presented in [7] to create the routing sequence between two adjacent rings so that we can guarantee to route all nets. Finally, we assign at least one track to each net based on the routing sequence obtained from the net ordering determination algorithm. Figure 7 shows the overview of our routing algorithm.

# 3.2 Global Routing

In this subsection, we first show the basic flow network formulation. Then we detail the capacity of each edge, the intermediate nodes, the tile nodes, and the cost of each edge. Finally, we discuss how to handle the multi-pin nets.

#### 3.2.1 Basic Network Formulation

We describe how to construct the flow network  $G_S$  to perform the assignment for the South sector. The other three sectors can be processed similarly. As shown in Figure 8 (a), we define  $D_S = \{d_1^S, d_2^S, ..., d_h^S\}$  to be a set of h intermediate nodes. Each intermediate node represents an interval  $(b_x^i, b_{x+1}^i)$  $((p_y^j, p_{y+1}^j))$  in a bump pad ring (wire-bonding pad ring).  $T_S =$ The limit phase, we constitute four now networks  $G_E$ ,  $G_W$ ,  $G_N$ , and  $G_S$ , one for each sector, to solve the assignment of  $\{t_1^S, t_2^S, ..., t_u^S\}$  is a set of u tile nodes. Each tile node represents a the wire-bonding pads to the bump pads. Since we only have tile  $(b_x^i, b_{x+1}^i, b_{x'}^{i+1}, b_{x'+1}^{i+1})$  if  $((p_y^i, p_{y+1}^i, p_{y'+1}^{j+1}, p_{y'+1}^{j+1}))$  between two

```
Algorithm: RDL Routing(P, B, N)
P: set of all wire-bonding pads;
B: set of all bump pads;
N: set of all nets;
1 begin
     Construct four graphs G_E, G_W, G_N, G_S with only
3
4
     Apply MCMF to find the assignment of each p \in P to b \in B
5
        in the same sector and the global path for each 2-pin net;
    Add additional edges to represent the multi-pin net in the
6
        four graphs;
8
    Apply MCMF to find the assignment of each p \in P to b \in B
        in the same sector and the global path for each multi-pin net;
    Find all cross points in all intervals for each net n \in N;
    for the outermost ring r_i^p to the innermost ring r_i^b
11
12
        S \leftarrow \text{Net\_Ordering\_Determination()};
13
        /\!/\,S contains the routing sequence;
14
        Assign_track(S);
15 end
```

Figure 7: Overview of the RDL Routing Algorithm.

adjacent bump pad rings (wire-bonding pad rings). We construct a graph  $G_S = (P_S \cup D_S \cup B_S \cup T_S, E)$  and add a source node s and a target node t to  $G_S$ . Each intermediate node has a capacity K, where K represents the maximum number of nets that are allowed to pass through an interval. Each tile node has a capacity L, where L represents the maximum number of nets that are allowed to pass through a tile. We will detail how to handle the capacity of the intermediate nodes and the tile nodes so that MCMF can be applied in Section 3.2.2. There are eight types of edges:

- 1. edges from a wire-bonding pad to a bump pad,
- 2. edges from a wire-bonding pad to an intermediate node,
- 3. edges from an intermediate node to a bump pad,
- 4. edges from an intermediate node to another intermediate node,
- 5. edges from an intermediate node to a tile node,
- 6. edges from a wire-bonding pad to a tile node,
- 7. edges from a tile node to a bump pad, and
- 8. edges from a tile node to an intermediate node.

The source s has an edge to every node in  $P_S$ , and there is an edge from every node in  $B_S$  to the target t. Each edge is associated with a (cost, capacity) tuple to be described in the following subsections. Recall that we do not allow wire crossings for all wires. Since E represents the possible global paths for all nets, we can guarantee that no wire crossings will occur if there are not any crossings in edges. Thus, we construct all the edges and avoid crossings of all edges at the same time. Figure 8(b) shows an example flow network  $G_S$  for the South sector. We can solve MCMF in time  $O(V^2E^{\frac{1}{2}})$  [4].

**Theorem 1** Given a flow network with the vertex set V and edge set E, the global routing problem can be solved in  $O(|V|^2|E|^{\frac{1}{2}})$  time.



Figure 8: (a) Intermediate Nodes and Tile Nodes. (b) Flow Network for the South Sector.

#### 3.2.2 Capacity Assignment and Node Construction

Now we introduce the capacity of each edge, the intermediate nodes, and the tile nodes. For an edge e, if e is from a wirebonding pad to a bump pad or an intermediate node or a tile node, the capacity of e is set to 1. If e is from an intermediate node or a tile node to a bump pad, then the capacity of e is set to M, where M is the maximum number of nets that are allowed to connect to the bump pad. Recall that an intermediate node has the capacity of K, where K is the maximum number of nets that are allowed to pass through this intermediate node. This means that the summation of all outgoing edges of an intermediate node d is equal to K. The same condition holds for all incoming edges of d. As shown in Figure 9, in order to model this situation, we decompose each intermediate node d into two intermediate nodes d' and d'' and an edge is connected from d'' to d' with capacity K. All outgoing edges of d are now connected from d' with capacity K, and all incoming edges of d are now connected to d'' with capacity K. In Figure 10, a tile node is also decomposed into two tile nodes t' and t'', and the capacity of a tile node is set to L, where L is the maximum number of nets that are allowed to pass through this tile node. The capacity of the edges from the source node to the wire-bonding pads is set to 1, and the capacity of the edges from the bump pads to the sink node is set to M. There are three worst cases of congestion in a tile, as shown in Figure 11. The four nodes in the three figures are all bump pads. In Figures 11(a) and (b), the maximum number of nets passing through the tile is 2K. In Figure 11(c), the maximum number of nets passing through the tile is 3K. If we do not use the tile node, the maximum number of nets in Figures 11(a), (b), and (c) could exceed the capacity of a tile (2K > L or 3K > L). Since the capacity of each tile node is well modeled in our flow network, we can totally avoid this congestion problem.



Figure 9: Capacity and Cost on Intermediate Nodes.



Figure 10: Capacity and Cost on Tile Nodes.

### 3.2.3 The Cost of Edges

The cost function of each edge is defined by the following equation:

$$Cost = \alpha \times WL, \tag{1}$$

where WL denotes the Manhattan distance between two terminals of an edge, and  $\alpha$  is an adaptive parameter to adjust the cost of different types of edges. We assign the smallest  $\alpha$  to the edge that connects an intermediate node and a bump pad to assure that the intermediate nodes are assigned to bump pads first. The edges which connect a tile node and a bump pad are also assigned the smallest  $\alpha$ . The edge that connects two intermediate nodes or an intermediate node to a tile node is assigned the largest  $\alpha$ . By adjusting the value of  $\alpha$ , we can control the wirelength of each net to avoid large signal skews between different nets. The cost of the edges from the source node to the wire-bonding pads and the cost of the edges from the bump pads to the sink node are both set to 0. Figure 12 shows the capacity and cost for all four types of edges.

#### 3.2.4 Multi-pin Net Handling

Finally, we describe how to deal with the multi-pin nets. As stated before, we first assign the 2-pin nets and then the multi-pin nets. We only construct the edges that are associated with the 2-pin nets and apply MCMF for the assignment. After the assignment, we delete all edges from the source node s and all edges to the target node t. The global paths of the 2-pin nets are not deleted and considered as the blockages F when we construct the edges for the multi-pin nets. Recall that if there are no edge crossings



Figure 11: Three Kinds of Congestion in a Tile.



Figure 12: Capacity and Cost on Edges.

in the flow network, then there are no wire crossings in the final routing solution. When we construct the edges for the multi-pin nets, an edge e exists only if e does not intersect any blockages. Then we add the edges from the source node to the wire-bonding pads associated with the multi-pin nets and the edges from the bump pads associated with the multi-pin nets to the target node. Figure 13 illustrates an example. Assume that a multi-pin net n consists of  $((p_2, p_4, p_5), (b_3, b_9))$ , which means that  $p_2$ ,  $p_4$ , and  $p_5$  are free to be assigned to one of the two bump pads  $p_5$  and  $p_6$  are deleted by  $p_6$ . For example, the edge from  $p_6$  to the intermediate node between  $p_6$  and  $p_6$  is deleted because it intersects the blockage  $p_6$ . By using MCMF, the wirebonding pads and bump pads are grouped into two sets:  $p_6$ ,  $p_6$ , and  $p_6$ ,  $p_6$ ,  $p_6$ .

Since MCMF is optimal and we will never assign nets to exceed the capacity of an interval or a tile, we will never violate the design rules. Also because we do not allow edge crossings during flow network construction, the final routing solution will not generate wire crossings. So after the assignment, all global paths are routable. Based on above discussions, we have the following theorem.

**Theorem 2** Given a set of wire-bonding pads, a set of bump pads, and a set of nets, if there exists a feasible solution computed by the MCMF algorithm, we can guarantee 100% routing completion.



Figure 13: Group Multi-pin Nets.

#### 3.3 Detailed Routing

In this subsection, we detail the three methods used in our efficient detailed routing. As shown in Figure 14, after the global routing, each global path contains only wire-bonding pads, intermediate nodes, and bump pads. The two global paths  $< d_k, t, d_l >$  and  $< d_k, t, b_x >$  which pass trough the tile node t are redefined into  $< d_k, d_l >$  and  $< d_k, b_x >$ . Hence tile nodes are not needed for the final expression of the global paths because a tile node is just used to avoid the congestion of a tile.



Figure 14: Redefined Global Paths.

### 3.3.1 Cross Point Assignment

Based on the global routing result (discussed in Section 3.2), we use the cross point assignment algorithm to evenly distribute each net which passes through the same interval. We use the example of Figure 15 to describe the process for the cross point assignment. As shown in Figure 15, the two nets from wire-bonding pads  $p_2$  and  $p_3$  pass through the same intermediate node. So we split the intermediate node into two cross points.

**Theorem 3** The cross point assignment problem can be solved in  $O(|B| + |P| - |R_b| - |R_p|)$  time.



Figure 15: Cross Point Assignment.

#### 3.3.2 Net Ordering Determination

After the assignment of cross points, each net has its path to cross each interval. For two adjacent rings, we can treat it as a channel routing. So we use the net ordering determination algorithm presented in [7] to generate a routing sequence  $S \ = < \ (n_1^s, n_1^t), (n_2^s, n_2^t), .., (n_k^s, n_k^t) \ > \ \text{with} \ k \ \ \text{net segments}.$ Each net segment  $n_j$  is represented by a pair (source, target) = $(n_i^s, n_i^t)$ . First, we generate a circular list for all terminals ordered counter-clockwise according to their positions on the boundaries. Then, a stack is used to check if there exist crossovers among all net segments. For each terminal of net  $n_i$ , if it is a source, then we push it into the stack. Otherwise, if this terminal and the top element of the stack belong to the same net, then net  $n_i$  is matched. We keep searching the circular list until all nets are matched. With this sequence S, we can guarantee that each net segment between two adjacent rings can be routed without intersecting each other. For example, given an instance shown in Figure 16(a), according to the net ordering determination algorithm, we can obtain the sequence S = < $(n_1, n'_1), (n'_{10}, n_{10}), (n'_9, n_9), (n'_8, n_8), (n'_7, n_7), (n'_6, n_6), (n'_5, n_5),$  $(n_2, n'_2), (n_3, n'_3), (n_4, n'_4) >.$ 

**Theorem 4** Given a set of nets N, the net ordering determination problem can be solved in  $O(|N|^2)$  time.



Routing Sequence: {(1, 1'), (10', 10), (9', 9), (8', 8), (7', 7), (6', 6), (5', 5), (2, 2'), (3, 3'), (4, 4')}



Figure 16: (a) An example for Track Assignment. (b) Blocking Point.

#### 3.3.3 Track Assignment

With the net ordering, we can use maze routing to route all nets for any two adjacent rings. However, maze routing is quite slow.

```
Algorithm: Track Assignment(S_i, L)
S_j: a routing sequence between ring r_j and r_{j+1};
L: the maximum number of tracks;
1 begin
2
     for each net segment n_i in S_i
         Let (x_i^s, y_i^s) ((x_i^t, y_i^t)) be the coordinate of the
3
4
         source (target) of n_i;
5
         if ((x_i^s \ge x_i^t \text{ and } y_i^s \ge y_i^t) \text{ or } (x_i^s \ge x_i^t \text{ and } y_i^s \le y_i^t))
             Find a track l of L from the top to the bottom without
6
             creating overlap of other wires;
8
9
             Find a track l of L from bottom to top without
10
             creating overlap of other wires;
11
         if such l exists
12
             assign l to n_i;
13
         else
14
             for all pre-routed net n_k
15
                 divide into two segments according to q_k;
16
                 assign the segment not overlapping with q_k
17
                  to the lowest (highest) possible track;
18 end
```

Figure 17: Algorithm for Track Assignment.

(For example, for a small test case with 513 nets, we need 25 minutes to complete the detailed routing.) So we propose a track assignment algorithm to assign tracks to each net segment of any two adjacent rings. For each net segment  $n_i$  in S, according to the relative locations of  $n_i^s$  and  $n_i^t$ , we search a track to be assigned to  $n_i$  from the top to the bottom or from the bottom to the top. We search the tracks from the top to the bottom if  $n_i^s$  is in the top-right side of  $n_i^t$ , or  $n_i^s$  is in the bottom-right side of  $n_i^t$ . Otherwise, we search the tracks from the bottom to the top. If we find a track land it does not create any overlap with other wires, then we assign l to  $n_i$ . As shown in Figure 16(a),  $n_1$  is assigned to track 1 first, and  $n_5$  is assigned to track 4 first. Also we record the blocking points Q for  $n_i$ . A blocking segment is a wire on track l+1 (if we search from the top to the bottom) or l-1 (if we search from the bottom to the top) to stop  $n_i$  from being assigned to l+1 or l-1 without creating any overlap with it. A blocking point  $q_i$  is a terminal of the blocking segment whose projection on *l* overlaps with  $n_i$ . As shown in Figure 16(b), the point  $q_3$  on track  $l_2$  is the blocking point for net  $n_3$ . If we cannot find such l, we rip-up and reroute all net segments  $n_1$  to  $n_{i-1}$ . For each net  $n_k$  to be rerouted, we use the concept of the dogleg in the channel routing to break a segment into two segments based on the blocking point  $q_k$  such as  $n_3$  in Figure 16(b). Then we assign the segment that will not overlap with  $q_k$  on the lowest possible track (if we search from the top to the bottom) or on the highest possible track (if we search from the bottom to the top). After assigning tracks, we record the new blocking points for  $n_k$ . Note that since now each net segment may be assigned with more than one track, we may have more than one blocking point for each net. Figure 17 summarizes the track assignment algorithm.

**Theorem 5** Given a set of nets N and a constant number of tracks L, the track assignment problem can be solved in  $O(|N|^2L(|R_b|+|R_p|))$  time.



Figure 18: RDL Routing Solution of fs900.

# 4 Experimental Results

We implemented our algorithm in the C++ programming language on a 1.2GHz SUN Blade 2000 workstation with 8 GB memory. The benchmark circuits fs90b740, fsa0ac013aa, fsa0ac015aa, fwaa281, fs900, fs2116, and fs4096 are real industry designs. In Table 1, "Case name" denotes the names of circuits, "#Nets" denotes the number of nets, "# $R_p$ " denotes the number of wire-bonding pad rings, "#p" denotes the number of wire-bonding pads, "# $R_b$ " denotes the number of bump pad rings, and "#b" denotes the number of bump pads. In each of fs900, fs2116, and fs4096, the number of wire-bonding pads equals the number of bump pads. So each wire-bonding pad needs to be assigned to exactly one bump pad. Hence these three cases are more difficult for routing than the other four cases.

Since there are no flip-chip routing algorithms in the literature, we compared our algorithm with the following heuristic algorithm currently used in a design service company. This heuristic is called the nearest node connection (NNC) algorithm. In NNC, the wires are routed sequentially. If a wire-bonding pad p can find a free bump pad b in a restricted area of the nearest bump pad ring  $r_m^b$ , then it connects p to b. If there are no free bump pads in  $r_m^b$ , then we search for a free bump pad in the next bump pad ring  $r_{m+1}^b$ . This process is repeated until we find a free bump pad.

The experimental results are shown in Table 2. We report the total wirelength, the critical wirelength, the maximum signal skews, and the CPU times. Since the routability is guaranteed to be 100%, we do not report it. Compared with NNC, the experimental results show that our network flow based algorithm reduces the total wirelength by 10.2%, the critical wirelength by 13.4%, and the signal skews by 13.9% in reasonably longer running time. Note that for fs2116 and fs4096, NNC fails to find a routing solution. Figure 18 shows the RDL routing result of fs900. The experimental results demonstrates the effectiveness of our network flow based algorithm for the routing for flip-chip designs.

| Case name   | # Nets<br>(2-pin/multi-pin) | #Rp | #p   | #Rb | #b   |  |
|-------------|-----------------------------|-----|------|-----|------|--|
| fs90b740    | 646/0                       | 2   | 646  | 7   | 812  |  |
| fsa0ac013aa | 657/4                       | 2   | 657  | 17  | 1156 |  |
| fsa0ac015aa | 639/6                       | 2   | 639  | 17  | 1156 |  |
| fwaa281     | 513/24                      | 2   | 513  | 13  | 676  |  |
| fs900       | 900/0                       | 4   | 900  | 15  | 900  |  |
| fs2116      | 2116/0                      | 6   | 2116 | 23  | 2116 |  |
| fs4096      | 4096/0                      | 8   | 4096 | 32  | 4096 |  |

Table 1: Test Cases for RDL Router.

| Algorithm  Case name | Total wirelength (μm) |               |                 | Critical wirelength ( $\mu$ m) |               | Skew            |      |               | CPU time (s)    |      |               |
|----------------------|-----------------------|---------------|-----------------|--------------------------------|---------------|-----------------|------|---------------|-----------------|------|---------------|
|                      | NNC                   | Our<br>method | Improve<br>ment | NNC                            | Our<br>method | Improve<br>ment | NNC  | Our<br>method | Improve<br>ment | NNC  | Our<br>method |
| fs90b740             | 814927                | 779089        | 4.6%            | 3682                           | 3357          | 8.9%            | 3392 | 3067          | 9.6%            | 0.28 | 0.68          |
| fsa0ac013aa          | 773717                | 700831        | 10.4%           | 5274                           | 4539          | 13.9%           | 5139 | 4404          | 14.3%           | 0.39 | 0.87          |
| fsa0ac015aa          | 699986                | 618363        | 13.2%           | 5254                           | 4068          | 22.5%           | 5118 | 3932          | 23.2%           | 0.34 | 0.79          |
| fwaa281              | 663762                | 579199        | 14.6%           | 4755                           | 4208          | 11.5%           | 4496 | 3949          | 12.2%           | 0.24 | 0.54          |
| fs900                | 1888992               | 1745834       | 8.2%            | 6000                           | 5400          | 10.0%           | 5700 | 5100          | 10.5%           | 0.71 | 1.39          |
| fs2116               | fail                  | 6208840       | N/A             | fail                           | 8800          | N/A             | fail | 8500          | N/A             | fail | 9.46          |
| fs4096               | fail                  | 16807614      | N/A             | fail                           | 13300         | N/A             | fail | 13000         | N/A             | fail | 43.79         |
| Average              |                       | 10.2%         | ·               |                                | 13.4%         |                 |      | 13.9%         |                 |      | ·             |

Table 2: RDL Routing Results.

#### 5 Conclusion

In this paper, we have developed an RDL router for the flip-chip package. The RDL router consists of the two stages of global routing followed by detailed routing. The global routing applies the network flow algorithm to solve the assignment problem from the wire-bonding pads to the bump pads and then creates the global routing path for each net. The detailed routing uses cross point assignment, net ordering determination, and track assignment to complete the routing. Experimental results show that our router can achieve much better results in routability, wirelength, critical wirelength, and signal skews, compared with a heuristic algorithm currently used in industry.

# References

- R. K. Ahuja, A. V. Goldberg, J. B. Orlin, and R. E. Tarjan, Finding minimum-cost flows by double scaling, Math. Program., vol. 53, pp. 243-266, 1992.
- [2] R. K. Ahuja, T. L. Magnanti, and J. B. Orlin, Network Flows: Theory, Algorithms, and Applications, Prentice Hall, Englewood Cliffs, N.J., 1993.
- [3] D. Chang, T. F. Gonzalez, and O. H. Ibarra, "A flow based approach to the pin redistribution problem for multi-chip modules," *Proc. GVLSI*, pp.114-119, 1994.
- [4] B. cherkasssky, "Efficient Algorithms for the Maximum Flow Problem," Mathematical Methods for the Solution of Economical Problems, vol. 7, pp.117-126, 1977.
- [5] S.-S. Chen, J.-J. Chen, S.-J. Chen, and C.-C. Tsai, "An Automatic Router for the Pin Grid Array Package," Proc. ASP-DAC, pp.133-136, 1999.
- [6] T. H. Cormen, C. E. Leiserson, and R. L. Rivest, Introduction to Algorithms. Cambridge, MA: MIT Press, 2000.
- [7] C.-P. Hsu, "General River Routing Algorithm," Proc. DAC, pp.578-583, 1983.
- [8] J. Hu and S. S. Sapatnekar, "A Timing-constrained Algorithm for Simultaneous Global Routing of Multiple Nets," Proc. ICCAD, pp.99-103, 2000.

- [9] Kennington, Helgason, Algorithms for Network Programming, John Wiley & Sons, 1980.
- [10] Y. Kubo and A. Takahashi, "A Global Routing Method for 2-Layer Ball Grid Array Packages," Proc. ISPD, pp.36-43, 2005.
- [11] E. S. Kuh, T. K. Kashiwabara, and T. Fujisawa, "On optimum single row routing," *IEEE Transactions on Computer-Aided Design of Integrated Cir*cuits and Systems, vol. 26, pp. 361-368, 1979.
- [12] A. Titus, B. Jaiswal, T. J. Dishongh, and A. N. Cartwright, "Innovative Circuit Board Level Routing Designs for BGA Packages," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, vol. 27, pp. 630-639, 2004.
- [13] C.-C. Tsai, C.-M. Wang, and S.-J. Chen, "NEWS: A Net-Even-Wiring System for the Routing on a Multilayer PGA Package," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, vol. 17, pp.182-189, 1998.
- [14] S.-S. Chen, J.-J. Chen, C.-C. Tsai, and S.-J. Chen, "An Even Wiring Approach to the Ball Grid Array Package Routing," *Proc. ICCD*, pp.303-306, 1999
- [15] UMC, " $0.13\mu m$  Flip Chip Layout Guideline," pp.6, 2004.
- [16] D. Wang, P. Zhang, C.-K. Chang, and A. Sen, "A Performance-Driven I/O Pin Routing Algorithm," Proc. ASP-DAC, pp.129-132, 1999.
- [17] X. Xiang, X. Tang, and D.-F. Wang, "Min-Cost Flow-Based Algorithm for Simultaneous Pin Assignment and Routing," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, Vol. 22, pp.870-878, 2003.
- [18] M.-F. Yu and W.-M. Dai, "Single-Layer Fanout Routing and Routability Analysis for Ball Grid Arrays," Proc. ICCAD, pp.581-586, 1995.
- [19] M.-F. Yu, J. Darnauer and W.-M. Dai, "Interchangeable Pin Rouing with Application to Package Layout," *Proc. ICCAD*, pp.668-673, 1996.