# LETTER Application-Dependent Interconnect Testing of Xilinx FPGAs Based on Line Branches Partitioning\*

Teng LIN<sup>†a)</sup>, Jianhua FENG<sup>†</sup>, Nonmembers, and Dunshan YU<sup>†</sup>, Member

**SUMMARY** A novel application-dependent interconnect testing scheme for Xilinx Field Programmable Gate Arrays (FPGAs) based on line branches partitioning is presented. The targeted line branches of the interconnects in FPGAs' Application Configurations (ACs) are partitioned into multiple subsets, so that they can be tested with compatible Configurable Logic Block (CLB) configurations in multiple Test Configurations (TCs). Experimental results show that for ISCAS89 and ITC99 benchmarks, this scheme obtains a stuck-at fault coverage higher than 99% with fewer than 11 TCs.

key words: FPGA, application-dependent testing, line branch, Test Configuration

# 1. Introduction

Application-dependent testing of FPGAs tries to detect the defects in the resources utilized in a specific AC of FPGAs to ensure the application is fault-free. It can be adopted by FPGA manufacturers to provide customer-specific FPGAs and may also be performed by users to find faulty FPGAs in high-reliability systems. The "Test pattern generation Optimization for FPGA" (TOF) procedure is proposed in [1] to speed up the test pattern generation process for the ACs, and a new CLB architecture with an implicit scan feature is then presented to improve the fault coverage obtained by TOF for sequential ACs [2]. However, this feature is still unavailable in Xilinx FPGAs, and scan insertion is too costly for the ACs of Xilinx FPGAs. In [3]–[5], TCs are generated by modifying the used CLBs' configurations to facilitate the testing of the interconnects' faults. Although the schemes in [3]–[5] can obtain high coverage for the targeted faults, they share the implicit assumption that all the logic resources used in the ACs are Look-Up Tables (LUTs). In fact, modern FPGAs typically contain many non-LUT logic resources, which are commonly used in the ACs.

We present a novel application-dependent interconnect testing scheme for Xilinx FPGAs in this letter. The interconnects in FPGAs' ACs are decomposed into line branches, and the targeted line branches are then partitioned into multiple subsets such that the CLB configurations required to test the line branches in each subset are compatible. Multiple TCs are then generated, one TC for each subset. This scheme requires no scan insertion and poses no limitation on the types of the logic resources used in the ACs, and experimental results show that it can obtain high stuck-at fault coverage in only a few TCs. Therefore, it is effective for Xilinx FPGAs.

a) E-mail: linteng@ime.pku.edu.cn

The rest of this letter is organized as follows. In Sect. 2, we describe the presented application-dependent interconnect testing scheme. Theoretical analysis and experimental results are then given in Sects. 3 and 4. Finally, Sect. 5 concludes this letter.

# 2. The Presented Scheme

This scheme first decomposes the interconnects in Xilinx FPGAs' ACs into line branches, which are the paths connecting the interconnects' sources and sinks, and then tests the stuck-at 0/1 faults of all targeted line branches by applying the 0-1-0 stimulus at their sources and observing the responses at their sinks.
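The reason a 0-1-0 stimulus suffices for both stuck-at faults can be illustrated with a minimal sketch. The function names and the fault labels `"sa0"` and `"sa1"` are hypothetical and serve only this illustration; the model treats a line branch as an ideal wire whose sink either follows the source or is forced to a constant.

```python
STIMULUS = [0, 1, 0]  # the 0-1-0 sequence applied at the source

def observe(stimulus, fault=None):
    """Return the values seen at the sink of a line branch.

    fault is None (fault-free), "sa0" (stuck-at 0) or "sa1" (stuck-at 1).
    """
    forced = {"sa0": 0, "sa1": 1}
    # A stuck-at fault overrides every bit; otherwise the sink follows the source.
    return [forced.get(fault, bit) for bit in stimulus]

def is_faulty(response, stimulus=STIMULUS):
    """A line branch fails the test if the response differs from the stimulus."""
    return response != stimulus

for fault in (None, "sa0", "sa1"):
    response = observe(STIMULUS, fault)
    print(fault, response, is_faulty(response))
```

A stuck-at-0 branch misses the 1 in the middle of the sequence, and a stuck-at-1 branch misses the 0 values at both ends, so either fault changes the observed response.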

In Xilinx FPGAs, a line branch may connect Input/Output Blocks (IOBs), CLBs and hard IPs. As shown in Fig. 1, if a line branch takes an IOB as its source (sink), then the 0-1-0 stimulus (the response) can be applied (observed) directly via the corresponding pin. For line branches with CLB outputs as their sources, the CLBs are configured as Stimulus Generators (SGs), which have the general form shown in Fig. 1, to generate the 0-1-0 stimulus. Since it is hard to generate the 0-1-0 stimulus on hard IPs' outputs, our scheme does not test the line branches whose sources are hard IPs' outputs. Because the number of these line branches is typically small compared with the total number of line branches, excluding them from the targeted line branches causes only a small loss of fault coverage. For the line branches whose sinks are inputs of CLBs or hard IPs, since the values of all inputs of CLBs and hard IPs can be registered, we can construct Registering Circuits (RCs) at the sink terminals of these line branches and then get the test results via the readback mechanism of Xilinx FPGAs [6].

Fig. 1 Test setups for various types of line branches.

Manuscript received December 16, 2008.

Manuscript revised February 2, 2009.

<sup>†</sup>The authors are with the Department of Microelectronics, Peking University, P.R. China.

<sup>\*</sup>This research was supported by the National Natural Science Foundation of China under grants No. 90207018 and 60576030.

DOI: 10.1587/transinf.E92.D.1197

The detailed configurations that implement the RCs and SGs differ for a CLB's different inputs and outputs. Therefore, if the set of a CLB's ports is denoted by $P = \{p_1, p_2, \dots, p_m\}$, in which $m = m_i + m_o$, $m_i$ ($m_o$) is the number of input (output) ports and $p_i$ is an input (output) when $1 \le i \le m_i$ ($m_i + 1 \le i \le m$), then a CLB configuration library set $T = \{t_1, t_2, \dots, t_m\}$ can be constructed, in which $t_i$ implements the RC (SG) for $p_i$ when $1 \le i \le m_i$ ($m_i + 1 \le i \le m$). Note that for some $\langle i, j \rangle$ pairs ($1 \le i, j \le m$), one CLB cannot be configured as $t_i$ and $t_j$ at the same time. The configurations $t_i$ and $t_j$ are said to be *incompatible* in such cases, and *compatible* otherwise. As a consequence, it may be impossible to simultaneously test all line branches connected to some CLBs in one TC. Therefore, the targeted line branches need to be partitioned into multiple subsets such that the CLB configurations required to test the line branches in each subset are *compatible*. Then, multiple TCs can be generated, one TC for each subset. Because the test time is dominated by the configuration time, the number of TCs, that is, the number of subsets, should be minimized.

This partitioning problem can be converted into the vertex coloring problem of graph theory as follows. Suppose the set of targeted line branches is $L = \{l_1, l_2, \dots, l_n\}$, and two line branches $l_i$ and $l_j$ are said to be *conflictive* iff the required configurations of their connected CLBs are *incompatible*. Then an undirected graph $G = \langle V, E \rangle$ with $|V| = n$ can be formed, in which vertex $v_i$ corresponds to $l_i$, and edge $e_{ij}$ connecting $v_i$ and $v_j$ belongs to $E$ iff $l_i$ and $l_j$ are conflictive. After a vertex coloring of $G$ is determined, the line branches whose corresponding vertices have the same color are partitioned into the same subset. An optimal coloring thus yields the minimum number of TCs needed to test all targeted line branches. Since the vertex coloring problem is a well-known NP-complete problem, the simple Recursive-Largest-First heuristic is adopted to solve it [7]. An illustrative example with $n = 7$ is given in Fig. 2, and the resultant subsets are $\{l_1, l_2, l_5\}$, $\{l_3, l_7\}$ and $\{l_4, l_6\}$. Consequently, three TCs are required to test all 7 line branches.
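The partitioning step can be sketched in a few lines of Python. This is a minimal illustration of a Recursive-Largest-First style heuristic, not the tool described in this letter: the function name `rlf_partition`, the 0-based vertex numbering, the tie-breaking by smallest index, and the example conflict list are all assumptions made for the sketch.

```python
def rlf_partition(n, conflicts):
    """Partition line branches 0..n-1 into conflict-free subsets (one per TC).

    conflicts is a list of (i, j) pairs of conflictive line branches.
    """
    adj = {v: set() for v in range(n)}
    for i, j in conflicts:
        adj[i].add(j)
        adj[j].add(i)

    uncolored = set(range(n))
    subsets = []
    while uncolored:
        candidates = set(uncolored)  # may still join the current subset
        excluded = set()             # uncolored, but conflict with the subset
        subset = []
        while candidates:
            if not subset:
                # Start each subset with the largest-degree uncolored vertex.
                v = max(candidates,
                        key=lambda x: (len(adj[x] & candidates), -x))
            else:
                # Then prefer vertices with many excluded neighbours and
                # few neighbours among the remaining candidates.
                v = max(candidates,
                        key=lambda x: (len(adj[x] & excluded),
                                       -len(adj[x] & candidates), -x))
            subset.append(v)
            candidates.discard(v)
            blocked = adj[v] & candidates
            excluded |= blocked
            candidates -= blocked
        uncolored -= set(subset)
        subsets.append(sorted(subset))
    return subsets

# Hypothetical example: five line branches whose conflicts form a cycle.
conflicts = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
subsets = rlf_partition(5, conflicts)
print(len(subsets), subsets)
```

Each returned subset corresponds to one TC. For the hypothetical five-vertex cycle above, three subsets are needed, since an odd cycle cannot be colored with two colors.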



Fig. 2 The example of line branches partitioning.

# 3. Theoretical Analysis

If we denote the number of colors obtained by the adopted algorithm by $O(G)$ and the maximum vertex degree of graph $G$ by $\Delta(G)$, then it can be easily proved that $O(G) \le \Delta(G) + 1$. Now consider the vertex degrees of graph $G$. First we define the matrices $A$, $B$ and $C$ as follows. $A$ is an $m \times m$ matrix, in which $a_{ij} \in \{0, 1\}$, and $a_{ij} = 0$ iff $t_i$ and $t_j$ are compatible ($1 \le i, j \le m$). $B$ is a $(k + 1) \times m$ matrix, in which $k$ is the number of used CLBs in the AC, $b_{1j} = 0$, and $b_{ij}$ is the number of line branches connected to the $j$th port of the $(i - 1)$th CLB ($2 \le i \le k + 1$, $1 \le j \le m$). $C$ is an $n \times 4$ matrix, in which the $c_{i1}$th CLB's $c_{i2}$th port is $l_i$'s source and the $c_{i3}$th CLB's $c_{i4}$th port is $l_i$'s sink ($1 \le i \le n$). If $l_i$'s source or sink is not connected to any CLB, then the corresponding entries are set to $c_{ij} = 1$ ($1 \le i \le n$, $1 \le j \le 4$), so that they select the all-zero first row of $B$. Based on these definitions, the degree of vertex $v_i$ ($1 \le i \le n$) can be expressed by:

$$d(v_i) = \sum_{1 \le j \le m} \left( a_{c_{i2}j} \cdot b_{c_{i1}j} + a_{c_{i4}j} \cdot b_{c_{i3}j} \right) \tag{1}$$
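Equation (1) can be evaluated directly from the three matrices. The sketch below is only an illustration under stated assumptions: the matrices are plain nested lists, the entries of $C$ are used as 1-based indices exactly as they appear in Eq. (1) (so the first and third columns select rows of $B$, with row 1 being the all-zero placeholder), and the small example with $m = 3$ ports and $k = 2$ CLBs is hypothetical.

```python
def vertex_degree(i, A, B, C):
    """Degree of vertex v_i according to Eq. (1); i is a 0-based row of C."""
    src_row, src_port, snk_row, snk_port = C[i]
    m = len(A)
    return sum(
        A[src_port - 1][j] * B[src_row - 1][j]
        + A[snk_port - 1][j] * B[snk_row - 1][j]
        for j in range(m)
    )

# Hypothetical CLB with two inputs (ports 1, 2) and one output (port 3);
# only the two input configurations t_1 and t_2 are incompatible.
A = [[0, 1, 0],
     [1, 0, 0],
     [0, 0, 0]]

# Row 1 is the all-zero placeholder; rows 2 and 3 describe CLB 1 and CLB 2.
B = [[0, 0, 0],
     [1, 1, 1],
     [1, 0, 1]]

# Each row: (source row of B, source port, sink row of B, sink port).
C = [[1, 1, 2, 1],   # l_1: IOB -> CLB 1, port 1
     [1, 1, 2, 2],   # l_2: IOB -> CLB 1, port 2
     [2, 3, 3, 1],   # l_3: CLB 1, port 3 -> CLB 2, port 1
     [3, 3, 1, 1]]   # l_4: CLB 2, port 3 -> IOB

degrees = [vertex_degree(i, A, B, C) for i in range(len(C))]
print(degrees)
```

In this example only $l_1$ and $l_2$ conflict, because they drive the two incompatible input ports of the same CLB, so each of them has degree 1 and the other two line branches have degree 0.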

Note that the maximum value of $b_{ij}$ is the maximum fanout of the CLBs' outputs, which is limited by the maximum fanout constraint. This constraint may be set to hundreds, so the maximum value of $d(v_i)$, that is $\Delta(G)$, can also be in the hundreds, which is a large upper limit for $O(G)$. However, with the same maximum fanout constraint, the distribution of the values of $b_{ij}$ is independent of the size of the application circuit. As a result, the average vertex degree of graph $G$ does not grow with $n$, and is typically small. Suppose the average vertex degree is $z$, and the edges of $G$ are independent. Then the probability $f$ for one vertex to be connected to any given other vertex is $z/(n-1)$, which can be approximated by $z/n$ for large $n$. Consequently, the probability that this vertex has degree $q$, denoted by $f_q$, is given by:

$$f_q = \binom{n-1}{q} f^q (1-f)^{n-1-q} \approx \frac{z^q e^{-z}}{q!} \tag{2}$$

The approximation in Eq. (2) becomes exact in the limit of large $n$ ($n \gg q$). It shows that the vertex degrees approximately follow a Poisson distribution [8]. Based on this observation, for an AC whose $O(G)$ is too large, we can make a tradeoff between the fault coverage and the test time as follows. We choose an appropriate integer $u$ and delete from $G$ all vertices whose degrees are larger than $u$, which gives an induced subgraph $G_s$ that satisfies $O(G_s) \le u + 1$. This means that the remaining line branches can be tested with at most $u + 1$ TCs, and the fault coverage with the tradeoff is given by:

$$\frac{|G_s|}{n} \approx \sum_{0 \le q \le u} f_q \tag{3}$$

which shows that different u can lead to different tradeoff between the test time and the fault coverage.
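Equations (2) and (3) can be checked numerically with a short sketch. The function names are hypothetical, and the values $n = 1508$, $z = 7.64$ and $u = 18$ are taken from the s5378 row of Table 1; the sketch assumes the independent-edge model stated above.

```python
import math

def binomial_pmf(n, z, q):
    """Exact left-hand side of Eq. (2) with f = z / (n - 1)."""
    f = z / (n - 1)
    return math.comb(n - 1, q) * f**q * (1 - f)**(n - 1 - q)

def poisson_pmf(z, q):
    """Poisson approximation on the right-hand side of Eq. (2)."""
    return z**q * math.exp(-z) / math.factorial(q)

def theoretical_coverage(z, u):
    """Right-hand side of Eq. (3): share of vertices with degree <= u."""
    return sum(poisson_pmf(z, q) for q in range(u + 1))

n, z, u = 1508, 7.64, 18  # s5378 in Table 1
for q in (0, 7, 18):
    print(q, binomial_pmf(n, z, q), poisson_pmf(z, q))
print(theoretical_coverage(z, u))
```

For these values the Poisson sum in Eq. (3) comes to roughly 0.9996, which is in line with the $FC_T$ entry for s5378 in Table 1. Lowering $u$ reduces the bound $u + 1$ on the number of TCs at the cost of a smaller sum.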

| circuit | $k$  | $N_{IP}$        | $N_{AL}$        | $n$    | $\lvert E \rvert$ | $z$   | $N_{TC}$        | FC     | $u$ | $N'_{TC}$ | FC'    | $FC_T$ |
|---------|------|-----------------|-----------------|--------|--------|-------|-----------------|--------|----|-----------|--------|--------|
| s5378   | 101  | 0               | 1508            | 1508   | 5762   | 7.64  | 7               | 100%   | 18 | 6         | 97.87% | 99.96% |
| s9234   | 76   | 0               | 1238            | 1238   | 4987   | 7.96  | 7               | 100%   | 18 | 6         | 98.06% | 99.93% |
| s13207  | 306  | 0               | 3351            | 3351   | 12464  | 7.44  | 7               | 100%   | 18 | 7         | 97.38% | 99.97% |
| s15850  | 253  | 0               | 4116            | 4116   | 16611  | 8.07  | 8               | 100%   | 18 | 7         | 97.42% | 99.92% |
| s35932  | 679  | 0               | 11684           | 11684  | 38537  | 6.59  | 7               | 100%   | 14 | 5         | 99.79% | 98.23% |
| s38417  | 701  | 0               | 10696           | 10696  | 41771  | 7.81  | 8               | 100%   | 18 | 6         | 98.10% | 99.95% |
| s38584  | 796  | 0               | 14145           | 14145  | 63248  | 8.94  | 9               | 100%   | 18 | 7         | 95.38% | 99.77% |
| b17     | 1482 | 0               | 29368           | 29368  | 148194 | 10.09 | 10              | 100%   | 18 | 8         | 96.19% | 99.21% |
| b18     | 3591 | 7               | 72106           | 71687  | 361242 | 10.08 | 10              | 99.42% | 20 | 8         | 95.55% | 99.82% |
| b19     | 6003 | 14              | 143231          | 142240 | 735439 | 10.34 | 9               | 99.31% | 18 | 8         | 95.18% | 99.01% |
| b20     | 748  | 0               | 17293           | 17293  | 91837  | 10.62 | 9               | 100%   | 18 | 8         | 96.96% | 98.71% |
| b21     | 834  | 0               | 18048           | 18048  | 94015  | 10.42 | 9               | 100%   | 18 | 7         | 97.52% | 98.92% |
| b22     | 1198 | 0               | 26311           | 26311  | 139892 | 10.63 | 9               | 100%   | 18 | 7         | 95.65% | 98.70% |

Table 1 Experimental results of ISCAS89 and ITC99 benchmarks on xc4vlx60 FPGA.



Fig. 3 The degrees' distributions in graph G for s5378 and b22.

# 4. Experimental Results

The experiments are carried out for the largest ISCAS89 and ITC99 benchmarks on the Xilinx Virtex-4 xc4vlx60 FPGA. The CLB architecture of Virtex-4 FPGAs is analyzed to construct the CLB configuration library set $T$. Then, based on the presented scheme, we write a tool in C and Perl to automatically generate the TCs for the ACs. Table 1 gives the experimental data, in which $N_{IP}$ ($N_{AL}$) is the number of used IPs (all line branches), $|E|$ is the number of edges in $G$, $N_{TC}$ ($N'_{TC}$) is the number of generated TCs without (with) a tradeoff, and FC (FC') is the obtained fault coverage without (with) a tradeoff. $FC_T$ is the theoretical coverage with the upper degree limit $u$ chosen as in Table 1. For all experimental benchmarks, $N_{TC}$ is less than 11 and FC is higher than 99%. The average degrees $z$ are small compared with $n$, and vary little with the benchmarks' sizes. Although the values of $N_{TC}$ are already as small as $z$ for all experimental benchmarks, we still perform the tradeoff experiments, and the results show that FC' conforms well to $FC_T$. Figure 3 shows that the real degree distributions in graph $G$ for s5378 and b22 are close to the ideal Poisson distributions, which further verifies our theoretical analysis.

# 5. Conclusion

A novel application-dependent interconnect testing scheme of Xilinx FPGAs is presented. This scheme doesn't need to insert scan chains in FPGAs' ACs, and it also doesn't limit the types of the used logic resources. Experimental results show that it can obtain high stuck-at fault coverage in only a few TCs. Therefore, this scheme is effective for Xilinx FPGAs.

# References

- [1] M. Renovell, P. Faure, J.M. Portal, J. Figueras, and Y. Zorian, "TOF: A tool for test pattern generation optimization of an FPGA application oriented test," Proc. Asian Test Symp., pp.323–328, 2000.
- [2] M. Renovell, P. Faure, J.M. Portal, J. Figueras, and Y. Zorian, "IS-FPGA: A new symmetric FPGA architecture with implicit scan," Proc. IEEE Int. Test Conf., pp.924–931, 2001.
- [3] D. Das and N.A. Touba, "A low cost approach for detecting, locating, and avoiding interconnect faults in FPGA-based reconfigurable systems," Proc. IEEE Int. Conf. on VLSI Design, pp.266–269, 1999.
- [4] M.B. Tahoori, E.J. McCluskey, M. Renovell, and P. Faure, "A multiconfiguration strategy for an application dependent testing of FPGAs," Proc. IEEE VLSI Test Symp., pp.154–159, 2004.
- [5] M.B. Tahoori, "Application-dependent testing of FPGAs," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol.14, no.9, pp.1024–1033, 2006.
- [6] http://www.xilinx.com
- [7] W. Klotz, Graph Coloring Algorithms, Mathematics Report, Technical University Clausthal, 2002.
- [8] D. West, Introduction to Graph Theory, Prentice Hall, 1996.