# A Unified Framework for Equivalence Verification of Datapath Oriented Applications 

Bijan ALIZADEH ${ }^{\dagger \mathrm{a})}$, Member and Masahiro FUJITA ${ }^{\dagger}$, Nonmember


#### Abstract

SUMMARY In this paper, we introduce a unified framework based on a canonical decision diagram called the Horner Expansion Diagram (HED) [1] for equivalence checking of datapath-oriented hardware designs at various design stages, from an algorithmic description to the gate-level implementation. The HED can not only represent and manipulate algorithmic specifications in terms of polynomial expressions with modulo equivalence, but also express the bit-level adder (BLA) description of gate-level implementations. Our HED supports modular arithmetic operations over integer rings of the form $Z_{2^{n}}$. The proposed techniques have been successfully applied to equivalence checking on industrial benchmarks. Experimental results on different applications show significant advantages over existing bit-level and word-level equivalence checking techniques.


key words: equivalence verification, canonical form, RTL model, gate-level implementation, decision diagram

## 1. Introduction

As system-on-a-chip (SoC) designs continue to increase in size and complexity, more attention has been paid to design descriptions at higher levels of abstraction, which allow faster design changes and higher simulation speed. In such cases, a C-based high-level (algorithmic-level) specification is described and then refined to a Register Transfer Level (RTL) description by adding more and more implementation details at different steps. These refinement/optimization steps are performed manually or by using various automated tools such as [2]. Subsequently, gate-level implementations are synthesized using RTL synthesis tools. If traditional dynamic techniques such as simulation are used, the verification effort needed to obtain a functionally correct description at each step increases significantly. This has led to a trend away from dynamic approaches, and formal equivalence checking methods have become very important for reducing time-to-market. Although contemporary verification approaches have proposed different modeling and manipulation methods for each stage of this complete design flow [3]-[5], there is no complete and uniform solution for equivalence checking of different models in a unified representation.

Figure 1 depicts a complete design flow where algorithmic specifications usually implement sequences of arithmetic polynomial computations with integer variables of infinite bit-widths, which can be directly represented by HED [1]. After refining algorithmic specifications to RTL descriptions, RTL models are implemented with fixed-bit-width datapath architectures, so that polynomial computations are carried out over $n$-bit integers where the sizes of the entire datapaths are kept constant by way of signal truncation. In this paper we provide an extension to the HED for supporting modular arithmetic polynomials, which can directly manipulate these RTL design descriptions.

Fig. 1 HED-based equivalence checking with a complete design flow for different applications.

The final issue explored in this paper is the representation of gate-level implementations. After synthesizing the RTL models into gate-level implementations, arithmetic bit-level netlists extracted from the gate-level circuits are also represented in HED, which gives efficient ways for equivalence checking of designs in various design stages. We use an adder-extraction technique proposed in [3] for equivalence checking of arithmetic circuits based on a Bit-Level Adder (BLA) representation in order to efficiently represent gate-level circuits in HED. This technique benefits from a more efficient reverse engineering process compared to conventional methods such as the arithmetic bit-level (ABL) description in [4]. The BLA directly maps a gate-level description to a word-level addition network, while the ABL is an intermediate representation between gate-level and word-level descriptions, and a further reverse-engineering process is necessary to extract the word-level description from the ABL representation. Moreover, the HED is expressive enough to be used as a formal model for different levels of abstraction during refinements from high-level specification to RTL as well as gate-level implementation models.

The paper is organized as follows: Section 2 provides a brief review of related work. In Sect. 3 we briefly describe the HED package as a canonical decision diagram. Section 4 presents how to implement modular polynomials as well as the arithmetic bit level in HED. In Sect. 5 we discuss experimental results, and finally concluding remarks and future work are given in Sect. 6.

## 2. Related Work

In the literature of graph-based canonical representation, various extensions over the classical Binary Decision Diagram (BDD) introduced in [6] have been derived to reduce the size of the graph or to speed up the construction process. Although BDD and their variants have found wide application in formal verification methods, they suffer from memory explosion problems when the designs grow in size and complexity with arithmetic operations.

To alleviate this issue, Word Level Decision Diagrams (WLDDs) have been proposed. WLDDs are graph-based representations for functions with a Boolean domain and an integer range [5]. Furthermore, Binary Moment Diagrams (BMD), Multiplicative BMD (*BMD) and Kronecker *BMD ($\mathrm{K}{*}\mathrm{BMD}$) provide representations of integer-valued functions defined over bit-vectors and attempt to make the decomposition more efficient in terms of the graph size [7]. However, they fail to represent Boolean functions that can easily be represented using BDD. They still suffer from memory explosion when dealing with wide arithmetic operations, because functions are defined over binary variables as a bit vector rather than over integer variables.

In these approaches, BDD or WLDDs are utilized to represent symbolic expressions. However, system-level specifications such as those for digital signal processing contain a lot of arithmetic operations that must be encoded into bit level operations. Thus these techniques are not able to handle these designs due to the large number of Boolean variables. The latest word-level representation in the literature, i.e., Taylor Expansion Diagram (TED), supports multiplication and is able to represent functions with an integer domain and range [8]. It uses the Taylor series expansion as its decomposition method to represent a multivariate polynomial expression. However, TED does not model modulo arithmetic and therefore is not able to prove computational equivalence of polynomials over finite integer rings.

Verification approaches for bit-vector arithmetic such as term-rewriting, arithmetic decision procedures, polynomial decision diagrams and word-level ATPG have been studied in [9]-[11]. The authors in [11] have used integer arithmetic in constraint satisfaction for ILP-based simulation vector generation. However, these methods have tried to solve linear congruence using modulo arithmetic concepts and therefore are not applicable to prove polynomial equivalence modulo $2^{n}$.

In [12], [13], the properties of polynomials over finite integer rings have been used to analyze and verify polynomial expressions modulo $2^{n}$. These are useful techniques for word-level reasoning. Their implementations are, however, based on manipulations of symbolic expressions and they are not applicable to bit-level equivalence checking problems.

In [4], the authors have proposed a normalization technique for verifying arithmetic circuits in a bounded model checking environment. Their technique operates on the arithmetic bit level (ABL) description of a given circuit, which contains three objects: partial product generator, addition network and comparator. These objects can always be decomposed into a netlist of half-adders. After extracting an ABL representation from the gate netlist, they generate the reduced normal form by applying a complex process to keep the intermediate size of the ABL description as small as possible. In this paper, however, we utilize a new technique presented in [3] which gives a scalable implementation for large industrial benchmarks, as will be discussed in Sect. 4.2.

## 3. Horner Expansion Diagram (HED)

The goal of this section is to introduce a graph-based representation called HED [1] for functions with a mixed Boolean and integer domain and an integer range to represent arithmetic operations at a high level of abstraction, while other proposed Word Level Decision Diagrams (WLDDs) [5], [7] are graph-based representations for functions with a Boolean domain and an integer range. In HED, functions to be represented are maintained as a single graph in canonical form. We assume that the set of variables is totally ordered and that all of the vertices constructed obey this ordering. Maintaining a canonical form requires obeying a set of conventions for vertex creation as well as weight manipulation.

HED is a binary graph-based representation which supports polynomial functions by factorizing variables recursively as shown in Eq. (1), where const is a term which is independent of variable $X$, while linear is another term which serves as the coefficient of variable $X$.

$$
\begin{align*}
F(X, \ldots) & =F(X=0, \ldots)+X *\left[F^{\prime}(X=0, \ldots)+\cdots\right] \\
& =\text { const }+X * \text { linear } \tag{1}
\end{align*}
$$

In order to normalize the weights, any common factor is extracted by taking the greatest common divisor (gcd) of the argument weights. Once the weights have been normalized, the hash table is searched for an existing vertex, and a new vertex is created only if none is found. Similar to BDDs, each entry in the hash table is indexed by a key formed from the variable and the two children, i.e., the const and linear parts. As long as all vertices are created in this way, the graph will remain in canonical form.

Example. Figure 2 illustrates how $f(X, Y, Z)=24-8 Z+12 Y Z-6 X^{2}-6 X^{2} Z$ is represented by HED. Let the ordering of variables be $X, Y, Z$. First the decomposition w.r.t. variable $X$ is taken into account. As shown in Fig. 2 (a), after rewriting $f(X, Y, Z)=(24-8 Z+12 Y Z)+X *(-6 X-6 X Z)$ based on Eq. (1), the const and linear parts will be $24-8 Z+12 Y Z$ and $-6 X-6 X Z$ respectively. The linear part is decomposed w.r.t. variable $X$ again due to the $X^{2}$ term. After that, the decomposition is performed w.r.t. variable $Y$ and then $Z$, as shown in Fig. 2 (b). In order to reduce the size of an HED, redundant nodes are removed and isomorphic sub-graphs are merged. For this purpose the greatest common divisor of the argument weights is taken to identify isomorphic sub-graphs as well as redundant nodes. In Fig. 2 (b), $24-8 Z$, $12 Z$ and $-6-6 Z$ are rewritten as $8[3+Z *(-1)]$, $12[0+Z *(1)]$ and $-6[1+Z *(1)]$ respectively. In order to normalize the weights, $\operatorname{gcd}(8,12)=4$ and $\operatorname{gcd}(0,-6)=-6$ are taken to extract common factors. Finally, Fig. 2 (c) shows the normalized graph where $\operatorname{gcd}(4,-6)=2$ is taken to extract the common factor between the out-going edges of the $X$ node. In this representation, dashed and solid lines indicate const and linear parts respectively. Note that, in order to have a simpler graph, paths to the 0-terminal have not been drawn in Fig. 2 (c).

Fig. 2 HED representation of $24-8 Z+12 Y Z-6 X^{2}-6 X^{2} Z$.
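The decomposition of Eq. (1) and the gcd-based weight extraction used in the example can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the HED package itself: a polynomial is stored as a dictionary from exponent tuples to integer coefficients, the helper names `horner_split` and `extract_gcd` are hypothetical, and the extracted factor is always taken as positive (the example in the text also uses negative factors such as $-6$).

```python
from functools import reduce
from math import gcd

# f(X, Y, Z) = 24 - 8Z + 12YZ - 6X^2 - 6X^2 Z, keyed by (deg X, deg Y, deg Z).
f = {(0, 0, 0): 24, (0, 0, 1): -8, (0, 1, 1): 12, (2, 0, 0): -6, (2, 0, 1): -6}

def horner_split(poly, var):
    """Return (const, linear) such that poly = const + X_var * linear, as in Eq. (1)."""
    const, linear = {}, {}
    for exps, coeff in poly.items():
        if exps[var] == 0:
            const[exps] = coeff          # term independent of the variable
        else:
            lowered = exps[:var] + (exps[var] - 1,) + exps[var + 1:]
            linear[lowered] = coeff      # one power of the variable factored out
    return const, linear

def extract_gcd(poly):
    """Factor out the (positive) gcd of all coefficients; assumes poly is non-empty."""
    g = reduce(gcd, (abs(c) for c in poly.values()), 0)
    return g, {exps: c // g for exps, c in poly.items()}

const, linear = horner_split(f, 0)        # const = 24 - 8Z + 12YZ, linear = -6X - 6XZ
_, linear2 = horner_split(linear, 0)      # second split on X gives -6 - 6Z
weight, normalized = extract_gcd({(0, 0, 0): 24, (0, 0, 1): -8})  # 8 * (3 - Z)
```

Applying `horner_split` repeatedly in the variable order $X, Y, Z$ yields the tree of Fig. 2 (b); sharing identical sub-dictionaries would then correspond to merging isomorphic sub-graphs.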

## 4. Extensions to HED

### 4.1 Modular Arithmetic in HED

In this section, we present how to implement modular multi-variate polynomials in HED by using ideas presented in [12], [13]. In order to keep the formulas short, we use the following multi-index notation. For $\mathbf{k}=\left\langle k_{1}, k_{2}, \ldots, k_{d}\right\rangle$ as the degrees corresponding to the $d$ variables $\mathbf{x}:=$ $\left\langle x_{1}, x_{2}, \ldots, x_{d}\right\rangle$, let

$$
\mathbf{x}^{\mathbf{k}}:=\prod_{i=1}^{d} x_{i}^{k_{i}} \quad \mathbf{k}!:=\prod_{i=1}^{d} k_{i}! \quad \binom{\mathbf{x}}{\mathbf{k}}:=\prod_{i=1}^{d}\binom{x_{i}}{k_{i}}
$$
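As a quick illustration of the multi-index notation, the three products can be evaluated directly for concrete integer tuples. The function names below are hypothetical and chosen only for this sketch.

```python
from math import comb, factorial, prod

def multi_power(x, k):
    """x^k := prod_i x_i^(k_i) for tuples x and k of equal length."""
    return prod(xi ** ki for xi, ki in zip(x, k))

def multi_factorial(k):
    """k! := prod_i (k_i)!"""
    return prod(factorial(ki) for ki in k)

def multi_binom(x, k):
    """binom(x, k) := prod_i binom(x_i, k_i)"""
    return prod(comb(xi, ki) for xi, ki in zip(x, k))

degrees = (3, 2, 1)
k_factorial = multi_factorial(degrees)   # 3! * 2! * 1!
```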

The Smarandache function $S(m)$ in number theory is defined for a given positive integer $m$ as the smallest positive integer such that its factorial $S(m)!$ is divisible by $m$. For example, the number 8 does not divide $1!$, $2!$, $3!$, but does divide $4!$, so that $S(8)=4$. Generally, in the ring of interest, $Z_{2^{n}}$, let $S\left(2^{n}\right)=k$, such that $k$ is the smallest number satisfying $2^{n} \mid k!$. For example, $S\left(8=2^{3}\right)=4$ as 8 divides $4!=4 * 3 * 2 * 1=2^{3} * 3$. This property can be used to derive reduced polynomials from the original one.

Theorem 1: The polynomial $g(x)=\prod_{i=1}^{S(m)}(x+i)$ is equivalent to 0 in $Z_{m}$ and is called a vanishing polynomial. Here $S(m)$ denotes the Smarandache function. The proof is available in [14].
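The definition of $S(m)$ and the statement of Theorem 1 can be checked numerically for small moduli. The following is a brute-force sketch with hypothetical function names; it is meant as a sanity check, not as an efficient implementation.

```python
from math import factorial, prod

def smarandache(m):
    """Smallest positive k such that m divides k!."""
    k = 1
    while factorial(k) % m != 0:
        k += 1
    return k

def vanishing_value(x, m):
    """Value of g(x) = prod_{i=1}^{S(m)} (x + i) reduced modulo m."""
    return prod(x + i for i in range(1, smarandache(m) + 1)) % m

s8 = smarandache(8)
# Theorem 1 predicts that g(x) is 0 for every residue x in Z_8.
vanishes_mod_8 = all(vanishing_value(x, 8) == 0 for x in range(8))
```

The check works because a product of $S(m)$ consecutive integers is always divisible by $S(m)!$, and hence by $m$.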

This theorem indicates that if we can factorize a polynomial function $g(x)$ into a product of $S(m)$ consecutive numbers, then $g(x)$ can be reduced to 0 in $Z_{m}$. For example, consider the polynomial $f(x)=x^{6}+21 x^{5}+175 x^{4}+735 x^{3}+1624 x^{2}+1764 x+720$ over $Z_{2^{4}}$, where $S\left(2^{4}\right)=6$. Since $f(x)$ can be described as a product of 6 consecutive numbers, i.e. $(x+6)(x+5)(x+4)(x+3)(x+2)(x+1)$, we have $f(x) \bmod 2^{4} \equiv 0$.

Lemma 1: If $2^{n} \mid a \mathbf{k}!$, then $a \mathbf{x}^{\mathbf{k}}$ is reducible modulo $2^{n}$. The reduced polynomial is $a \mathbf{x}^{\mathbf{k}}-a * \prod_{j=1}^{d} \prod_{i=0}^{k_{j}-1}\left(x_{j}+i\right)$, where $d$ is the number of variables and $k_{j}$ denotes the degree of the $j^{\text{th}}$ variable in the monomial $\mathbf{x}^{\mathbf{k}}$.

For example, consider the monomial $f\left(x_{1}, x_{2}, x_{3}\right)=2 x_{1}^{3} x_{2}^{2} x_{3}$ in $Z_{2^{3}}$. Here $a=2$ and $\mathbf{k}!=k_{1}! \cdot k_{2}! \cdot k_{3}!=3! \cdot 2! \cdot 1!=12$. Since $2^{3} \mid 2 * 12$, this monomial is reducible. On the other hand, based on a generalization of Theorem 1 to multi-variate polynomials, we know that $V\left(x_{1}, x_{2}, x_{3}\right)=\prod_{j=1}^{3} \prod_{i=0}^{k_{j}-1}\left(x_{j}+i\right)$, multiplied by $a$, vanishes in $Z_{2^{3}}$. Therefore, we can rewrite $f_{\text{REDUCED}}\left(x_{1}, x_{2}, x_{3}\right)=f\left(x_{1}, x_{2}, x_{3}\right)-a * V\left(x_{1}, x_{2}, x_{3}\right)=2 x_{1}^{3} x_{2}^{2} x_{3}-2\left(x_{1}\right)\left(x_{1}+1\right)\left(x_{1}+2\right)\left(x_{2}\right)\left(x_{2}+1\right)\left(x_{3}\right)$ to obtain a reduced polynomial function where the monomial $2 x_{1}^{3} x_{2}^{2} x_{3}$ has been removed.
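The claim that the monomial and its reduced form agree in $Z_{2^{3}}$ can be confirmed by exhaustive evaluation over all $8^{3}$ input combinations. The sketch below uses hypothetical function names and the expanded product exactly as written in the example.

```python
from itertools import product

N_BITS = 3
MODULUS = 2 ** N_BITS

def f(x1, x2, x3):
    """Original monomial 2 * x1^3 * x2^2 * x3."""
    return 2 * x1**3 * x2**2 * x3

def v(x1, x2, x3):
    """Product x1(x1+1)(x1+2) * x2(x2+1) * x3 used in the example."""
    return x1 * (x1 + 1) * (x1 + 2) * x2 * (x2 + 1) * x3

def f_reduced(x1, x2, x3):
    """f - a*V with a = 2; the degree-6 term cancels symbolically."""
    return f(x1, x2, x3) - 2 * v(x1, x2, x3)

# Exhaustively compare f and f_reduced modulo 2^3.
agree = all(
    (f(*point) - f_reduced(*point)) % MODULUS == 0
    for point in product(range(MODULUS), repeat=3)
)
```

The two functions differ by $2 V$, and $V$ is divisible by $3! \cdot 2! \cdot 1! = 12$ at every integer point, so the difference is always a multiple of 24 and therefore of 8.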

If $a \mathbf{k}!$ is not divisible by $2^{n}$, it is not possible to reduce the degrees of monomials, but it might be possible to reduce the coefficient of the term of maximal degree. The following theorem helps us to do so; its proof is provided in [13].

Theorem 2: Every polynomial $f$ in $d$ variables over $Z_{2^{n}}$ has a unique representation of the form $f(\mathbf{x})=\sum_{\mathbf{k} \in N^{d}} \alpha_{\mathbf{k}} \cdot \mathbf{x}^{\mathbf{k}}$, where $\alpha_{\mathbf{k}} \in\left\{0,1, \ldots, 2^{n-\nu_{2}(\mathbf{k}!)}-1\right\}$ and $\nu_{2}(\mathbf{k}!)<n$. Here $\nu_{2}(\mathbf{k}!)$ is defined as the maximum exponent $x$ such that $2^{x}$ divides $\mathbf{k}!$. In other words, $\nu_{2}(\mathbf{k}!)$ gives the number of factors 2 in $\mathbf{k}!$.

Now, let us consider those monomials in the function where $a \mathbf{k}!$ is not divisible by $2^{n}$. In this case, if $a>2^{n-\nu_{2}(\mathbf{k}!)}-1$, then $a$ lies outside the range allowed for $\alpha_{\mathbf{k}}$ by Theorem 2 and therefore the coefficient $a$ is reducible. We write the coefficient as $a=q \cdot 2^{n-\nu_{2}(\mathbf{k}!)}+r$, where $q$ is the quotient and $r$ is the remainder. Therefore $a \mathbf{x}^{\mathbf{k}}=q \cdot 2^{n-\nu_{2}(\mathbf{k}!)} \cdot \mathbf{x}^{\mathbf{k}}+r \cdot \mathbf{x}^{\mathbf{k}}$. It should be noted that the term $q \cdot 2^{n-\nu_{2}(\mathbf{k}!)} \cdot \mathbf{x}^{\mathbf{k}}$ is again reducible by Lemma 1. The second term, i.e. $r \cdot \mathbf{x}^{\mathbf{k}}$, is already in reduced form since $r<2^{n-\nu_{2}(\mathbf{k}!)}$. For example, consider the monomial $3 x_{1}^{3} x_{2}^{2} x_{3}$ in $Z_{2^{3}}$, where $a=3$ and $\nu_{2}(\mathbf{k}!)=2$ (the highest power of 2 dividing $3!\,2!\,1!=12$ is $2^{2}$). Since $2^{3}$ does not divide $3 \cdot(3!\,2!\,1!)$, and $3>2^{3-2}-1$ (i.e. $a>2^{n-\nu_{2}(\mathbf{k}!)}-1$), we represent $3 x_{1}^{3} x_{2}^{2} x_{3}=q \cdot 2^{3-2} \cdot x_{1}^{3} x_{2}^{2} x_{3}+r \cdot x_{1}^{3} x_{2}^{2} x_{3}$, where $q=1$ and $r=1$. The monomial $2 \cdot x_{1}^{3} x_{2}^{2} x_{3}$ can be reduced to a lower total degree, as shown before, but $x_{1}^{3} x_{2}^{2} x_{3}$ is already in reduced form and further reduction is not possible.
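The two reducibility tests (Lemma 1 for the degree, Theorem 2 for the coefficient) can be expressed as a small classification routine. This is a minimal sketch assuming a positive coefficient $a$; the names `nu2` and `classify` are hypothetical.

```python
from math import factorial, prod

def nu2(m):
    """Number of factors 2 in the positive integer m."""
    count = 0
    while m % 2 == 0:
        m //= 2
        count += 1
    return count

def classify(a, k, n):
    """Classify the monomial a * x^k over Z_{2^n}.

    Returns (kind, q, r); q and r are only set for coefficient-reducible
    monomials, where a = q * 2^(n - nu2(k!)) + r.
    """
    k_fact = prod(factorial(ki) for ki in k)
    if (a * k_fact) % 2**n == 0:
        return ("degree-reducible", None, None)
    bound = 2 ** (n - nu2(k_fact))        # equals alpha_k + 1
    if a >= bound:
        q, r = divmod(a, bound)
        return ("coefficient-reducible", q, r)
    return ("irreducible", None, None)

first = classify(2, (3, 2, 1), 3)    # the monomial 2 x1^3 x2^2 x3 in Z_8
second = classify(3, (3, 2, 1), 3)   # the monomial 3 x1^3 x2^2 x3 in Z_8
```

Note that the exponent `n - nu2(k_fact)` is always positive in the second branch, because $\nu_{2}(\mathbf{k}!) \geq n$ would already make the monomial degree-reducible.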

Figure 3 depicts the algorithm for reducing a given polynomial into a unique form based on HED manipulations. If $2^{n}$ divides $a \times \bar{k}!$, the monomial is reduced as shown in line 11. Otherwise, if $a>\alpha_{\bar{k}}$, the monomial $a \bar{x}^{\bar{k}}$ is written as $q \times\left(\alpha_{\bar{k}}+1\right) \bar{x}^{\bar{k}}+r \times \bar{x}^{\bar{k}}$, where $q \times\left(\alpha_{\bar{k}}+1\right) \bar{x}^{\bar{k}}$ is reducible by Lemma 1 (line 14), while $r \times \bar{x}^{\bar{k}}$ is not reducible any further since $r \leq \alpha_{\bar{k}}$. Hence we keep the latter term as a part of the final result in another HED (result in line 15). If $a \leq \alpha_{\bar{k}}$, the monomial is neither degree-reducible nor coefficient-reducible and it is added to the result (line 16). Finally the polynomial is updated with the reduced mon (line 18). This concept of monomial reduction is iteratively applied to a given polynomial, in order to reduce it to a unique form in HED according to Theorem 2. In the worst case, in each iteration, the mon computation requires $O(d * m)$ multiplications, where $m$ is the maximum degree of each variable and $d$ is the number of variables. Therefore, the worst case complexity of the algorithm is $O\left(d * m^{d+1}\right)$.

```
Modular_HED (HED poly, int n)
 1  poly = multi-variate polynomial in HED before reduction;
 2  n = the number of bits (variable's bit-width); result = 0;
 3  for each monomial mon (with the largest degree) in poly do
 4      a = Coefficient(mon);
 5      for each variable x_i do
 6          k_i = Degree(x_i) in mon;
 7      end for
 8      k! = prod_i k_i!;  v2(k!) = sum_i v2(k_i!);
 9      alpha_k = 2^(n - v2(k!)) - 1;
10      if (2^n | a * k!) then                  // mon is degree-reducible
11          reduced_Q = mon - a * prod_{j=1..d} prod_{i=0..k_j-1} (x_j + i);
12      else if (a > alpha_k) then              // mon is coefficient-reducible
13          q = a / (alpha_k + 1);  r = a % (alpha_k + 1);
14          reduced_Q = q * (alpha_k + 1) * (prod_{j=1..d} x_j^(k_j) - prod_{j=1..d} prod_{i=0..k_j-1} (x_j + i));
15          result = result + r * prod_{j=1..d} x_j^(k_j);
16      else reduced_Q = 0;  result = result + mon;
17      end if
18      updated_poly = poly - mon + reduced_Q;
19      if (updated_poly = 0) then return result;
20      else poly = updated_poly; end if end for
21  return result;
```

Fig. 3 Modular arithmetic in HED.

Example: Let us consider the following polynomial, which has 2 variables and 9 monomials, and suppose the variables' bit-width is 4 ($n=4$):

$$
\begin{aligned}
\text { poly }= & 4 x^{3} y^{3}+17 x^{3} y^{2}+12 x^{3} y+14 x^{2} y^{3}+48 x^{2} y^{2} \\
& +36 x^{2} y+13 x y^{3}+32 x y^{2}+24 x y
\end{aligned}
$$

Iteration 1. mon $=4 x^{3} y^{3}$ is taken into account since its variables have the largest degree w.r.t. those of other monomials and $a=4 ; k_{1}=3 ; k_{2}=3 ; \mathbf{k}!=k_{1}!* k_{2}!=3!* 3!=36$. Since $a * \mathbf{k}!/ 2^{n}=4 * 36 / 2^{4}=9$, the monomial is degree-reducible which means this monomial can be removed. The reduced monomial is computed as follows (line 11 in Fig. 3):

$$
\begin{aligned}
\text{reduced } Q = {} & 4 x^{3} y^{3}-4(x)(x+1)(x+2)(y)(y+1)(y+2) \\
= {} & -12 x^{3} y^{2}-8 x^{3} y-12 x^{2} y^{3}-36 x^{2} y^{2} \\
& -24 x^{2} y-8 x y^{3}-24 x y^{2}-16 x y
\end{aligned}
$$

$$
\begin{aligned}
\text{updated poly} = {} & \text{poly}-\text{mon}+\text{reduced } Q \\
= {} & 5 x^{3} y^{2}+4 x^{3} y+2 x^{2} y^{3}+12 x^{2} y^{2}+12 x^{2} y \\
& +5 x y^{3}+8 x y^{2}+8 x y
\end{aligned}
$$

Iteration 2. poly $=$ updated poly and mon $=5 x^{3} y^{2}$ are taken into account, where $a=5 ; k_{1}=3 ; k_{2}=2 ; \mathbf{k}!=$ $k_{1}!* k_{2}!=3!* 2!=12 ; v 2(\mathbf{k}!)=2 ; \alpha_{\mathbf{k}}=2^{n-v 2(\mathbf{k}!)}-1=$ $2^{4-2}-1=3$. Since $a * \mathbf{k}!/ 2^{n}=5 * 12 / 2^{4}=15 / 4$, the monomial is not degree-reducible, however it is coefficient reducible because $a>\alpha_{\mathbf{k}}(5>3)$. Therefore we rewrite $5 x^{3} y^{2}=q *\left(\alpha_{\mathbf{k}}+1\right) * x^{3} y^{2}+r * x^{3} y^{2}$, where $q=r=1$. The second term $r * x^{3} y^{2}=x^{3} y^{2}$ is not reducible, while the first term $q *\left(\alpha_{\mathbf{k}}+1\right) * x^{3} y^{2}=4 x^{3} y^{2}$ can be removed as follows (lines 13-15 in Fig. 3):

$$
\begin{aligned}
\text { reduced } Q & =4\left[x^{3} y^{2}-x(x+1)(x+2)(y)(y+1)\right] \\
& =-4 x^{3} y-12 x^{2} y^{2}-12 x^{2} y-8 x y^{2}-8 x y
\end{aligned}
$$

$$
\begin{aligned}
& \text{result}=\text{result}+r * x^{3} y^{2}=0+x^{3} y^{2}=x^{3} y^{2} \\
& \text{updated poly}=\text{poly}-\text{mon}+\text{reduced } Q=2 x^{2} y^{3}+5 x y^{3}
\end{aligned}
$$

Iteration 3. poly $=$ updated poly and mon $=2 x^{2} y^{3}$ are taken into account, where $a=2 ; k_{1}=2 ; k_{2}=3 ; \mathbf{k}!=k_{1}! * k_{2}!=2! * 3!=12 ; \nu_{2}(\mathbf{k}!)=2 ; \alpha_{\mathbf{k}}=2^{4-2}-1=3$. Since $a * \mathbf{k}! / 2^{n}=2 * 12 / 2^{4}=3 / 2$ and $a<\alpha_{\mathbf{k}}$ $(2<3)$, the monomial is neither degree-reducible nor coefficient-reducible. Therefore this monomial cannot be reduced (line 16 in Fig. 3):

$$
\begin{aligned}
& \text { reduced } Q=0 \\
& \text { result }=\text { result }+ \text { mon }=x^{3} y^{2}+2 x^{2} y^{3} \\
& \text { updated poly }=\text { poly }- \text { mon }+ \text { reduced } Q=5 x y^{3}
\end{aligned}
$$

Iteration 4. poly $=$ updated poly and mon $=5 x y^{3}$ are taken into account, where $a=5 ; k_{1}=1 ; k_{2}=3 ; \mathbf{k}!=k_{1}! * k_{2}!=1! * 3!=6 ; \nu_{2}(\mathbf{k}!)=1 ; \alpha_{\mathbf{k}}=2^{4-1}-1=7$. Since $a * \mathbf{k}! / 2^{n}=5 * 6 / 2^{4}=15 / 8$ and $a<\alpha_{\mathbf{k}}$ $(5<7)$, the monomial is neither degree-reducible nor coefficient-reducible. Therefore the final polynomial is as follows:

$$
\text { result }=\text { result }+ \text { mon }=x^{3} y^{2}+2 x^{2} y^{3}+5 x y^{3}
$$
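The four iterations above can be reproduced with a short Python sketch of the procedure in Fig. 3. It is not the HED implementation: polynomials are stored as plain dictionaries from exponent tuples to coefficients, all helper names are hypothetical, the monomial "with the largest degree" is assumed to mean largest total degree with ties broken by the exponent tuple, and negative coefficients are handled by floor division, which the text does not discuss.

```python
from math import factorial, prod

def poly_add(p, q, scale=1):
    """Return p + scale*q, dropping zero coefficients."""
    out = dict(p)
    for exps, c in q.items():
        out[exps] = out.get(exps, 0) + scale * c
    return {e: c for e, c in out.items() if c != 0}

def poly_mul(p, q):
    out = {}
    for e1, c1 in p.items():
        for e2, c2 in q.items():
            e = tuple(i + j for i, j in zip(e1, e2))
            out[e] = out.get(e, 0) + c1 * c2
    return {e: c for e, c in out.items() if c != 0}

def vanishing_part(k):
    """Expand prod_j prod_{i=0}^{k_j-1} (x_j + i)."""
    d = len(k)
    zero = (0,) * d
    result = {zero: 1}
    for j, kj in enumerate(k):
        x_j = tuple(1 if t == j else 0 for t in range(d))
        for i in range(kj):
            factor = {x_j: 1, zero: i} if i else {x_j: 1}
            result = poly_mul(result, factor)
    return result

def nu2(m):
    count = 0
    while m % 2 == 0:
        m //= 2
        count += 1
    return count

def modular_reduce(poly, n):
    """Reduce poly over Z_{2^n}, following the structure of Fig. 3."""
    poly = {e: c for e, c in poly.items() if c != 0}
    result = {}
    while poly:
        k = max(poly, key=lambda e: (sum(e), e))   # leading monomial
        a = poly[k]
        mon = {k: a}
        k_fact = prod(factorial(ki) for ki in k)
        if (a * k_fact) % 2**n == 0:               # degree-reducible
            reduced_q = poly_add(mon, vanishing_part(k), scale=-a)
        else:
            bound = 2 ** (n - nu2(k_fact))         # alpha_k + 1
            q, r = divmod(a, bound)
            if q != 0:                             # coefficient-reducible
                head = q * bound
                reduced_q = poly_add({k: head}, vanishing_part(k), scale=-head)
                result = poly_add(result, {k: r})
            else:                                  # already reduced
                reduced_q = {}
                result = poly_add(result, mon)
        poly = poly_add(poly_add(poly, mon, scale=-1), reduced_q)
    return result

def evaluate(poly, point):
    return sum(c * prod(x**e for x, e in zip(point, exps)) for exps, c in poly.items())

# The example polynomial, keyed by (degree of x, degree of y), with n = 4.
example = {(3, 3): 4, (3, 2): 17, (3, 1): 12, (2, 3): 14, (2, 2): 48,
           (2, 1): 36, (1, 3): 13, (1, 2): 32, (1, 1): 24}
reduced = modular_reduce(example, 4)
```

The loop terminates because every term introduced by `reduced_q` has a strictly smaller total degree than the monomial being removed. Evaluating `example` and `reduced` at every point of $Z_{2^{4}} \times Z_{2^{4}}$ gives a direct check that the reduction preserves the function modulo $2^{4}$.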

### 4.2 Bit-Level Adder Extraction

Arithmetic functions in digital circuits are almost always implemented using addition as the base function. For instance, multiplication is based on addition where two stages are required. In the first stage, the partial products are generated which are the inputs to the second stage that is a collection of addition circuits. The question is which pairs of vectors have been added with each other in the addition network. The proposed method consists of an initialization phase, which will be followed by a stepwise word-level adder-extraction process. The adder-extraction approach maps the logic-optimized gate net-list to a global Bit-Level Adder (BLA) representation.

### 4.2.1 Definitions

In this section, we present a compact representation that covers all possible addition processes for a multiplier without exhaustively checking all of them. For this purpose, the following definitions are used.

Definition 1 (ADD_SET): $ADD\_SET$ is the set of word-level results obtained from extracted adders. In the initialization, this set consists of the initial partial product vectors. Then, after extracting each word-level adder, this set is updated by merging the two added partial product vectors.

Definition 2 (LSB_POS(X)): The function $LSB\_POS(X)$ gives the Least-Significant Bit (LSB) position of $X$, which is a member of $ADD\_SET$. In the initialization phase of an $n$-bit multiplier, where the members of $ADD\_SET$ are partial product vectors ($P_{i}$), the $LSB\_POS$ for a $P_{i}$ can easily be calculated. If the partial product vectors are initially placed in $ADD\_SET$ in ascending order in terms of their LSB positions, then the LSB position of $P_{i}$ will be equal to $i$:

$$
\begin{equation*}
LSB\_POS\left(P_{i}\right)=i \tag{2}
\end{equation*}
$$

Definition 3 (MSB_POS(X)): The same kind of function can be defined to determine the position of the Most-Significant Bit (MSB) of a member of $ADD\_SET$. In the initialization phase of an $n$-bit multiplier, if the partial product vectors are sorted in ascending order in terms of their LSB positions, we have:

$$
\begin{equation*}
MSB\_POS\left(P_{i}\right)=i+n-1 \tag{3}
\end{equation*}
$$

Definition 2 and Definition 3 determine the range of bits in a vector, each of which can be either "1" or "0". Calculating the $MSB\_POS$ and $LSB\_POS$ functions for all the $ADD\_SET$ members makes it possible to evaluate the ABL representation of all the possible adder-extractions from the $ADD\_SET$ members.

### 4.2.2 ADD_SET Initialization

For each multiplier, there are two possible $ADD\_SET$s in the initialization. Exchanging the input vectors $A$ and $B$ results in two different partial product vector initializations. Suppose we have all the bit-level partial products of $A$ and $B$ as inputs: $A_{i} B_{j}$ ($i=1,2, \ldots, n$ and $j=1,2, \ldots, n$). This means that for an $n$-bit multiplier, there are $n^{2}$ available bit-level partial products. The initial word-level partial products can be arranged in two ways, leading to two separate initial $ADD\_SET$s. The two possible initial $ADD\_SET$s are listed below, where a vector $P_{i}$ is expressed by a $1 \times n$ matrix.

$$
\begin{gather*}
ADD\_SET\#1=\left\{\hat{P}_{i} \mid \hat{P}_{i}=\left[A_{i} B_{n}, A_{i} B_{n-1}, \ldots, A_{i} B_{1}\right]\right\} \tag{4}\\
ADD\_SET\#2=\left\{P_{i} \mid P_{i}=\left[A_{n} B_{i}, A_{n-1} B_{i}, \ldots, A_{1} B_{i}\right]\right\} \tag{5}\\
(i=1,2, \ldots, n)
\end{gather*}
$$

Our algorithm starts with the above two possible initial $ADD\_SET$s. Then it searches for word-level adders using an efficient bit-level representation of adders. As soon as the first adder is extracted, the $ADD\_SET$ that does not match the extracted adder is eliminated and the algorithm carries on with the other $ADD\_SET$.
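The two initializations of Eqs. (4) and (5) can be illustrated on concrete operand values. The sketch below is an assumption-laden illustration: it works on integer values rather than on a gate net-list, all names are hypothetical, and it assumes (following Eq. (2)) that $P_{i}$ carries the weight $2^{i-1}$ in the addition network.

```python
def bits(value, n):
    """Bits of an unsigned n-bit value, LSB first: index 0 holds A_1."""
    return [(value >> i) & 1 for i in range(n)]

def initial_add_sets(a, b, n):
    """Build both candidate ADD_SETs; each vector is a list of bits, MSB first."""
    A, B = bits(a, n), bits(b, n)
    # Eq. (4): P^_i = [A_i B_n, ..., A_i B_1]
    set1 = [[A[i] & B[j] for j in reversed(range(n))] for i in range(n)]
    # Eq. (5): P_i = [A_n B_i, ..., A_1 B_i]
    set2 = [[A[j] & B[i] for j in reversed(range(n))] for i in range(n)]
    return set1, set2

def vector_value(vec):
    """Interpret an MSB-first bit list as an unsigned integer."""
    value = 0
    for bit in vec:
        value = (value << 1) | bit
    return value

def weighted_sum(add_set):
    """Sum of all vectors, with the i-th vector (1-based) shifted to LSB position i."""
    return sum(vector_value(p) << shift for shift, p in enumerate(add_set))

set1, set2 = initial_add_sets(a=11, b=6, n=4)
```

Both arrangements describe the same product, which is why the algorithm can start from either one and discard the set that does not match the first extracted adder.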

### 4.2.3 Example

Consider $F=A \times B+3 \times C=P_{1}+P_{2}+P_{3}+P_{4}+C+2 \times C$, where $A, B$ and $C$ are 4-bit unsigned integer vectors and $P_{i}$ is a partial product vector, i.e., $P_{i}=A_{i} * B_{i}$. The algorithm starts with a set of vectors for the addition network, called $ADD\_SET$. In this example the initial $ADD\_SET$ is $\left\{P_{1}, P_{2}, P_{3}, P_{4}, C, 2 C\right\}$. Then a first-level XOR search is executed to extract all the XOR terms whose input signals come from the $ADD\_SET$. Each extracted XOR term may refer to a word-level adder with respect to its input signals. In this way the extracted XOR terms are categorized into groups, in which each group refers to a specific word-level adder. Generally we can represent each word-level addition in an $ADD\_SET$ by the Bit-Level Adder (BLA) schematic in Fig. 4. If $X_{1}$ and $X_{2}$ are unsigned integers, we have:

$$
\begin{align*}
& k=LSB\_POS\left(X_{2}\right)-LSB\_POS\left(X_{1}\right) \tag{6}\\
& FA\_NUM=MSB\_POS\left(X_{1}\right)-LSB\_POS\left(X_{2}\right) \tag{7}\\
& HA\_NUM=MSB\_POS\left(X_{2}\right)-MSB\_POS\left(X_{1}\right) \tag{8}
\end{align*}
$$

where $HA\_NUM$ and $FA\_NUM$ are the numbers of half-adders and full-adders in the BLA representation of $X_{1}+X_{2}$, and $LSB\_POS\left(X_{i}\right)$ and $MSB\_POS\left(X_{i}\right)$ give the least- and most-significant bit positions of $X_{i}$. Also we have:

$$
\begin{align*}
& LSB\_POS\left(X_{1}+X_{2}\right) \\
& \quad=\min \left\{LSB\_POS\left(X_{1}\right), LSB\_POS\left(X_{2}\right)\right\} \tag{9}\\
& MSB\_POS\left(X_{1}+X_{2}\right) \\
& \quad=\max \left\{MSB\_POS\left(X_{1}\right), MSB\_POS\left(X_{2}\right)\right\}+\varepsilon \tag{10}
\end{align*}
$$

If adding $X_{1}$ and $X_{2}$ results in a carry overflow, $\varepsilon$ in (10) is equal to "1"; otherwise it is zero.
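Equations (2), (3) and (6) to (10) amount to simple bookkeeping on bit positions, which can be sketched as follows. The `Vec` record and the function names are hypothetical, and whether a carry overflow occurs is passed in as a flag rather than derived from the circuit.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Vec:
    """Bit range of an ADD_SET member: LSB and MSB positions (1-based)."""
    lsb: int
    msb: int

def partial_product(i, n):
    """Eqs. (2) and (3): P_i of an n-bit multiplier spans bits i .. i+n-1."""
    return Vec(lsb=i, msb=i + n - 1)

def bla_params(x1, x2):
    """Eqs. (6)-(8) for the BLA representation of x1 + x2."""
    return {
        "k": x2.lsb - x1.lsb,
        "FA_NUM": x1.msb - x2.lsb,
        "HA_NUM": x2.msb - x1.msb,
    }

def add(x1, x2, carry_overflow):
    """Eqs. (9) and (10): bit range of the sum; epsilon is 1 on carry overflow."""
    epsilon = 1 if carry_overflow else 0
    return Vec(lsb=min(x1.lsb, x2.lsb), msb=max(x1.msb, x2.msb) + epsilon)

n = 4
p1, p2, p3 = (partial_product(i, n) for i in (1, 2, 3))
params_p1_p2 = bla_params(p1, p2)
merged = add(p2, p3, carry_overflow=True)
```

For the first two partial product vectors of a 4-bit multiplier this gives $k=1$, $FA\_NUM=2$ and $HA\_NUM=1$.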

After the first-level XOR terms have been extracted and categorized based on their reference adder representation in Fig. 4, the number of XOR terms in each category must be equal to $FA\_NUM+1$. Therefore, if this condition is true, the equivalence checking of next-level XOR terms and carry signals is executed to map the net-list to a dynamically built reference adder. If this process fails, the whole net-list is not equivalent to the high-level description. This property provides fast convergence for the proposed method. Otherwise, if the XOR category does not include $FA\_NUM+1$ XOR terms, the category is rejected and its terms are merged with the newly extracted XOR terms in the next adder-extraction iterations to complete the new XOR categories. The verification process succeeds if the whole net-list is mapped to the high-level description.

Fig. 4 BLA representation.

As an example, consider the addition network of $P_{1}+P_{2}+P_{3}+P_{4}$ in Fig. 5, which refers to a 4-bit unsigned integer multiplier. In the first iteration five first-level XOR gates are extracted. These XOR terms are labeled $X_{i, j}$, which refers to the $j^{\text{th}}$ first-level XOR gate extracted in the $i^{\text{th}}$ iteration. The non-labeled XOR gates in the net-list are the next-level XORs. The other parameter $P_{i}(j)$ is the $j^{\text{th}}$ MSB of the $i^{\text{th}}$ partial product vector in $ADD\_SET$. The XOR terms can be divided into three categories. The algorithm starts evaluating each category by assigning it a reference adder. The first category is $\left\{X_{1,1}\right\}$, which refers to the word-level addition of $P_{1}+2 P_{2}$. The evaluation of this category easily results in its rejection, as $FA\_NUM+1=3 \neq 1$ based on (7) above.

Fig. 5 Mapping the word-level adders to a 4-bit unsigned multiplier.

Then the algorithm evaluates the second XOR category, $\left\{X_{1,2}, X_{1,3}, X_{1,4}\right\}$, which refers to $P_{2}+2 P_{3}$ and satisfies the condition, i.e. $FA\_NUM+1=3$. Therefore, the next-level XORs are extracted after performing the equivalence checking between each carry signal in the BLA model and the original circuit from LSB to MSB. After mapping the reference adder to the original circuit, $ADD\_SET$ is updated to $\left\{P_{1}, P_{2\_3}, P_{4}\right\}$, in which we have:

$$
\begin{aligned}
& P_{2\_3}=P_{2}+P_{3}=\left[P_{2\_3}(6), \ldots, P_{2\_3}(1)\right] \\
& MSB\_POS\left(P_{2\_3}\right)=\max \{n+1, n+2\}+1=7 \\
& LSB\_POS\left(P_{2\_3}\right)=\min \{2,3\}=2, \quad k\left(P_{2\_3}\right)=3-2=1
\end{aligned}
$$

The same process is applied to the third category, $\left\{X_{1,5}\right\}$, and the second adder is extracted. The second extracted adder calculates $P_{1}+8 P_{4}$. As a result the next updated $ADD\_SET$ is equal to $\left\{P_{1\_4}, P_{2\_3}\right\}$. After evaluating the first iteration's XOR terms, the second iteration begins and extracts five first-level XOR terms, which are also shown in Fig. 5. All these XOR terms are mapped to one category, which refers to $P_{1\_4}+2 P_{2\_3}$. Note that the value of $FA\_NUM+1$ for the equivalent adder of this category is 6, and the previously rejected XOR term $X_{1,1}$ completes the category. After mapping the carry signals, the final adder is extracted from the net-list and the final $ADD\_SET$, equal to $\left\{P_{1\_2\_3\_4}\right\}$, is obtained. Note that for an $n$-bit Booth-encoded multiplier, i.e., $A \times B$, we have the following initial product vectors for the addition network:

$$
\begin{aligned}
ADD\_SET= & \left\{P_{m} \mid P_{m}=\left(-2 A_{i}+A_{i-1}+A_{i-2}\right) \times B\right\} \\
& (i=2 m+1 \leq n),\left(A_{-1}=0, m=0,1,2, \ldots\right)
\end{aligned}
$$

With the formula above, we can extract and map adders in a similar way as shown above.
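As a sanity check on the product-vector formula, the following sketch enumerates the digits $-2A_{i}+A_{i-1}+A_{i-2}$ for a small word and confirms that the weighted partial products add up to the full product. Several points are assumptions rather than statements from the text: $n$ is taken to be even, $P_{m}$ is weighted by $4^{m}$ (the usual radix-4 Booth alignment), and $A$ is interpreted as a two's-complement number. The function names are hypothetical.

```python
def booth_digits(a: int, n: int):
    """Digits -2*A_i + A_(i-1) + A_(i-2) for i = 2m+1, with A_(-1) = 0."""
    assert n % 2 == 0, "sketch assumes an even bit-width"

    def bit(i: int) -> int:
        return ((a >> i) & 1) if i >= 0 else 0

    return [-2 * bit(2 * m + 1) + bit(2 * m) + bit(2 * m - 1)
            for m in range(n // 2)]


def signed_value(a: int, n: int) -> int:
    """Two's-complement value of the n-bit pattern a."""
    return a - (1 << n) if (a >> (n - 1)) & 1 else a


def booth_product(a: int, b: int, n: int) -> int:
    """Sum of the partial products P_m = digit_m * B, each weighted by 4**m."""
    return sum(d * b * 4 ** m for m, d in enumerate(booth_digits(a, n)))


# Exhaustive check for all 4-bit patterns of A and small values of B.
n = 4
all_match = all(booth_product(a, b, n) == signed_value(a, n) * b
                for a in range(1 << n) for b in range(1 << n))
print("all 4-bit cases match:", all_match)
```

Under these assumptions every digit lies in $\{-2,-1,0,1,2\}$, so each $P_{m}$ is a shifted or negated copy of $B$, which is what allows the adders to be extracted and mapped in the same way as for the unsigned multiplier.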

## 5. Experimental Results

In this section, we report two types of experiments that show the superiority of the HED compared to other approaches. In the first experiment, we ran Modular HED on some benchmarks in order to compare the results with those of the method proposed in [12] as well as BDD, *BMD, SAT-miniSat [15] and MILP [11]. In another experiment, we represented BLA descriptions in HED. We implemented the HED package in $\mathrm{C}++$ and carried out the experiments on an Intel 1.8 GHz Core 2 machine with 1 GByte of main memory running Windows XP.

### 5.1 Comparison with Other Techniques

In this experiment, we compare our approach with CUDD-BDD, *BMD, SAT-miniSat and MILP on the following benchmarks [12]: phase-shift keying (PSK) used in digital communication, anti-aliasing functions (AAF), a digital image rejection unit (DIRU), a Degree-4 Filter (D4F), a Savitzky-Golay (SG) filter, a MIBENCH (MI) polynomial used in automotive applications, a Chebyshev (CHEB) polynomial and Quartic Spline (QS). For each test-case, we are given two descriptions and have to verify whether or not they are equivalent. Some of these designs were available as RTL code, while the others were available as high-level descriptions in C code. The related RTL codes for these high-level descriptions were automatically generated using high-level synthesis tools. After that, the two descriptions are represented in HED and the reduction procedure is applied to both simultaneously in order to find out whether or not they are equivalent.

Table 1 Modular HED versus contemporary approaches.

| Benchmark | Specs | Modular-HED | Technique [12] | CUDD-BDD | $*$ BMD | miniSat [15] | MILP [11] |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
|  | Var/Deg/n | Nodes/Time (s) | Time (s) ${ }^{*}$ | Nodes/Time (s) | Nodes/Time (s) | Vars/Clauses/Time (s) | Time (s) |
| AAF | 1/6/16 | 8 / 0.016 | 6.81 | 1.1M / 32.2 | NA / >500 | 3.9K / 107K / >500 | >500 |
| D4F | 1/4/16 | 6 / 0.031 | 4.95 | 27M / 20.3 | NA / >1000 | 25K / 76K / >1000 | >1000 |
| CHEB | 1/5/16 | 7 / 0.01 | 5.95 | 1M / 26.9 | NA / >500 | 3.5K / 86K / >500 | >500 |
| PSK | 2/4/16 | 16 / 0.032 | 13.48 | NA / >500 | NA / >500 | 52K / 142K / >500 | >500 |
| DIRU | 2/4/16 | 9 / 0.016 | 14.4 | NA / >1000 | NA / >1000 | 10K / 30K / >1000 | >1000 |
| MI | 2/9/16 | 26 / 0.2 | 17.5 | 23M / 39.4 | NA / >1000 | 24K / 69K / >1000 | >1000 |
| SG | 5/3/16 | 35 / 0.24 | 6.1 | NA / >1000 | NA / >1000 | 64K / 190K / >1000 | >1000 |
| QS | 7/4/16 | 19 / 0.09 | 32.4 | NA / >1000 | NA / >1000 | 76K / 211K / >1000 | >1000 |

NA: Not Applicable; K: Thousand; M: Million

${ }^{*}$ From [12], adjusted to a CPU speed of 1.8 GHz.

It was found that BDD could solve the problem for the AAF, D4F, MI and CHEB benchmarks, but it failed for the PSK, DIRU, SG and QS test-cases. For all benchmarks in Table 1, miniSat, *BMD and MILP could not solve the problem within the time-limit of 500 or 1000 seconds. The *BMD is a word-level decision diagram which is able to represent functions with a Boolean domain and an integer range, while HED is capable of representing functions with hybrid Boolean/integer domains and integer ranges. Therefore, when the degree $(k)$ of a polynomial increases, *BMDs are not satisfactory because their size grows as $O\left(n^{k}\right)$, where $n$ is the bit-vector size. In our benchmarks the bit-vector size is 16. For instance, consider $F=A * B$ where $A$ and $B$ are 16 bits wide. To construct the *BMD, we have to take into account $A=2^{15} * a_{15}+\cdots+a_{0}$ and $B=2^{15} * b_{15}+\cdots+b_{0}$ and then compute $A * B$ w.r.t. the bit-level variables ($a_{i}$ and $b_{i}$ for $i=0$ to 15). It is clear that the number of nodes increases rapidly due to the bit-level variables. However, in HED we just need to represent $A * B$, instead of $\left(2^{n-1} * a_{n-1}+\cdots+a_{0}\right) *\left(2^{n-1} * b_{n-1}+\cdots+b_{0}\right)$, where $A$ and $B$ are treated as word-level variables. Consequently, increasing the number of bits (bit-width) only increases the amount of computation needed to reduce polynomials, as mentioned in Sect. 4.1.
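To make the growth argument concrete, the sketch below expands a product of word-level variables into bit-level monomials and counts them. It is only an illustration of the flat expansion $\left(2^{n-1} a_{n-1}+\cdots+a_{0}\right)\left(2^{n-1} b_{n-1}+\cdots+b_{0}\right)$; it does not build a *BMD or an HED, and the monomial count should not be read as a node count of either diagram. The function name is hypothetical.

```python
def bit_level_expansion(num_words: int, n: int) -> dict:
    """Expand W_0 * W_1 * ... with W_j = sum_i 2**i * w_(j,i).

    Returns a map from monomials, stored as tuples of (word, bit) indices,
    to their integer coefficients.
    """
    terms = {(): 1}
    for j in range(num_words):
        expanded = {}
        for monomial, coeff in terms.items():
            for i in range(n):
                key = monomial + ((j, i),)
                expanded[key] = expanded.get(key, 0) + coeff * (1 << i)
        terms = expanded
    return terms


n = 16
product_terms = bit_level_expansion(2, n)          # F = A * B
print("bit-level monomials for A*B:", len(product_terms))
print("word-level monomials for A*B:", 1)
```

For $k$ distinct $n$-bit words the expansion has $n^{k}$ monomials, whereas the word-level view keeps the single term $A * B$ regardless of the bit-width.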

Moreover, our approach is not only faster than the BDD, *BMD, miniSat and MILP based equivalence checking techniques, but also performs better than the method in [12], which is based on symbolic algebra tools such as MAPLE. For instance, consider the DIRU benchmark. In Modular-HED, the CPU time required to check the equivalence of the given designs is 0.016 s, which is much less than the 14.4 s of the method proposed in [12]. Obviously, the performance differences are multiple orders of magnitude due to the differences between bit-wise analysis and word-wise analysis. Also, the method proposed in [12] needs to call MAPLE for each computation, while in our approach all computations are carried out internally. In addition, after reducing the polynomials in Modular-HED, equivalent polynomials are automatically detected, with a computation time of $O(1)$. If two polynomials $F_{1}$ and $F_{2}$ are not equivalent, $F_{1}-F_{2}$ in HED symbolically represents the difference between the two functions, which is directly related to the source of bugs. Finding bugs may not be as easy when symbolic algebra tools are employed. In order to prove this claim we need to give more experimental results. However, we leave it as future work to extend our approach to arithmetic circuit debugging.
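The role of the difference $F_{1}-F_{2}$ can be illustrated with a brute-force check over $Z_{2^{n}}$. This sketch is not how HED works (HED manipulates the polynomials symbolically); it simply enumerates all inputs of a small univariate example, and the helper names and the example polynomials are assumptions chosen for illustration.

```python
def difference(f1, f2, n: int):
    """Return the function x -> (f1(x) - f2(x)) mod 2**n."""
    mod = 1 << n
    return lambda x: (f1(x) - f2(x)) % mod


def counterexamples(f1, f2, n: int, limit: int = 5):
    """Inputs on which f1 and f2 differ modulo 2**n, with the difference value."""
    diff = difference(f1, f2, n)
    found = []
    for x in range(1 << n):
        value = diff(x)
        if value:
            found.append((x, value))
            if len(found) == limit:
                break
    return found


n = 8
# 128*x^2 and 128*x agree modulo 2^8 because x*(x-1) is always even.
equivalent = counterexamples(lambda x: 128 * x * x, lambda x: 128 * x, n)
# 64*x^2 and 64*x do not: the difference is non-zero for some inputs.
buggy = counterexamples(lambda x: 64 * x * x, lambda x: 64 * x, n, limit=1)
print("equivalent pair, counterexamples:", equivalent)
print("non-equivalent pair, first counterexample:", buggy)
```

The first pair shows why equivalence has to be decided modulo $2^{n}$ rather than coefficient by coefficient: the two polynomials are syntactically different yet compute the same function on 8-bit words.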

### 5.2 Scalability of Our Approach

In the second experiment, we have generated various polynomials which differ from each other in terms of the maximum degrees and the number of variables. We are given a description according to Eq. (12), where $d$ is the number of variables, which varies from 2 to 8, while $Deg$ indicates the degree of the different polynomials. Similar to the first experiment, for all benchmarks the bit-length $n$ is 16. For example, if $d=2$ and $Deg=4$ we will have $\left(x_{1}\right)\left(x_{1}+1\right)\left(x_{1}+2\right)\left(x_{1}+3\right)\left(x_{2}\right)\left(x_{2}+1\right)\left(x_{2}+2\right)\left(x_{2}+3\right)$.

$$
\begin{equation*}
\text { polynomial }=\prod_{j=1}^{d} \prod_{i=0}^{\text {Deg-1 }}\left(x_{j}+i\right) \tag{12}
\end{equation*}
$$
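A small sketch of how the benchmark family of Eq. (12) can be generated is given below. It evaluates the polynomial modulo $2^{n}$ and expands the per-variable factor $x(x+1)\cdots(x+Deg-1)$ into coefficients; the function names are hypothetical and the code is independent of the HED package.

```python
def rising_factorial_coeffs(deg: int, n: int):
    """Coefficients (lowest degree first) of x(x+1)...(x+deg-1) modulo 2**n."""
    mod = 1 << n
    coeffs = [1]
    for i in range(deg):
        shifted = [0] + coeffs                   # x * p(x)
        scaled = [i * c for c in coeffs] + [0]   # i * p(x)
        coeffs = [(s + t) % mod for s, t in zip(shifted, scaled)]
    return coeffs


def eval_benchmark(xs, deg: int, n: int) -> int:
    """Value of prod_j prod_i (x_j + i) modulo 2**n at the point xs."""
    mod = 1 << n
    result = 1
    for x in xs:
        for i in range(deg):
            result = result * (x + i) % mod
    return result


print(rising_factorial_coeffs(4, 16))    # factor for one variable, Deg = 4
print(eval_benchmark([1, 1], 4, 16))     # d = 2, Deg = 4 at x1 = x2 = 1
```

For $Deg=4$ the per-variable factor expands to $x^{4}+6x^{3}+11x^{2}+6x$, so the $d=2$ example in the text is the product of two such factors in $x_{1}$ and $x_{2}$.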

After representing each polynomial in HED and then applying Modular_HED (Fig. 3), the experimental results are summarized in Table 2 in comparison with those of symbolic algebra tools like MAPLE. For six of the sixteen configurations in Table 2, the method presented in [12] could not solve the problem within the time-limit of 1200 seconds. In this table, the CPU time (Time) in seconds, memory usage (Mem) in MBytes and the number of HED nodes (Nodes) are reported for different polynomials, where $d$ and $Deg$ give the number of variables and the degree of the polynomials respectively. It should be noted that the number of nodes is not applicable (NA) for symbolic algebra tools, and we could not report their memory usage. For instance, consider the last two rows in this table, where Time, Mem and Nodes for a polynomial of degree 20 have been provided. In this case, $\prod_{j=1}^{d} \prod_{i=0}^{Deg-1}\left(x_{j}+i\right)$ is given where the number of variables $(d)$ varies from 2 to 8, while $Deg$ is 20. The results show that the symbolic algebra approach [12] is only capable of solving the problem when the number of variables $(d)$ is 2.

Table 2 Scalability of Modular-HED for different number of variables and degrees in comparison with symbolic algebra techniques. Each entry is Time (s) / Mem (MB) / Nodes.

| Deg | Method | $d=2$ | $d=4$ | $d=6$ | $d=8$ |
| :---: | :---: | :---: | :---: | :---: | :---: |
| 8 | Symbolic Algebra in [12] | 5.7 / NA / NA | 101 / NA / NA | 579 / NA / NA | 853 / NA / NA |
|  | HED | 0.5 / 0.7 / 72 | 8.1 / 6.2 / 732 | 35 / 20 / 2788 | 61 / 33 / 4992 |
| 12 | Symbolic Algebra in [12] | 12.4 / NA / NA | 251 / NA / NA | 791 / NA / NA | >1200 / NA / NA |
|  | HED | 1.7 / 1.9 / 156 | 21 / 13 / 1530 | 82 / 41 / 6890 | 126 / 59 / 10045 |
| 16 | Symbolic Algebra in [12] | 34.9 / NA / NA | 644 / NA / NA | >1200 / NA / NA | >1200 / NA / NA |
|  | HED | 3.8 / 2.4 / 272 | 53 / 28 / 4468 | 184 / 77 / 13592 | 287 / 97 / 21974 |
| 20 | Symbolic Algebra in [12] | 70.7 / NA / NA | >1200 / NA / NA | >1200 / NA / NA | >1200 / NA / NA |
|  | HED | 7.3 / 4.9 / 420 | 138 / 59 / 10028 | 328 / 129 / 27012 | 467 / 157 / 48504 |

In our approach, equivalent polynomials are automatically detected due to the canonical representation of HED. Symbolic algebra tools such as MAPLE, however, need to check the equivalence of polynomials through additional computations, which take considerable time and are therefore inefficient. Furthermore, the HED package can be encapsulated into any design flow in order to check the equivalence between different levels of abstraction, specifically when a refinement-based design flow is followed.
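The link between a canonical representation and constant-time equivalence detection can be sketched with a tiny hash-consed Horner form. This is an illustration only, not the HED data structure: it handles a single variable, the names `Node`, `mk` and `horner` are hypothetical, and it normalizes coefficients modulo $2^{n}$ without applying the vanishing-polynomial reduction that Modular-HED performs.

```python
class Node:
    """Represents const + var * linear; instances are shared via a unique table."""
    __slots__ = ("var", "const", "linear")

    def __init__(self, var, const, linear):
        self.var, self.const, self.linear = var, const, linear


_unique_table = {}


def mk(var, const, linear):
    """Return the single shared node for const + var * linear."""
    if linear == 0:                 # no dependence on var: node is redundant
        return const
    key = (var, const, linear)      # ints hash by value, nodes by identity
    node = _unique_table.get(key)
    if node is None:
        node = _unique_table[key] = Node(var, const, linear)
    return node


def horner(coeffs, n, var="x"):
    """Build c0 + x*(c1 + x*(c2 + ...)) with coefficients reduced modulo 2**n."""
    mod = 1 << n
    node = 0
    for c in reversed(coeffs):
        node = mk(var, c % mod, node)
    return node


f1 = horner([1, 2], 16)                 # 1 + 2x
f2 = horner([1 + 65536, 2, 0], 16)      # same polynomial, written differently
f3 = horner([1, 3], 16)                 # 1 + 3x
print(f1 is f2, f1 is f3)
```

Because structurally identical sub-graphs are shared, two polynomials with the same normal form end up as the same object, and the equivalence test reduces to one identity comparison.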

### 5.3 HED for Gate-Level Implementation

In order to have a practical comparison between the proposed algorithm and the method in [4], we have synthesized the speed-optimized multipliers using the Xilinx ISE-9 synthesis tool on the XC95288XV CPLD device. The Xilinx tool applies a CLA-adder structure to speed-optimized multipliers. On the other hand, the area-optimization process with ripple-carry adders has been applied to c6288 from the ISCAS-85 benchmark using the Synopsys Design Compiler tool. Table 3 tabulates the experimental results, where $m$-bit-ripple is the $m$-bit area-optimized multiplier, while $m$-bit-CLA is the $m$-bit speed-optimized multiplier. The method in [4] performs the XOR search within all the internal nodes and therefore it has to search the XOR gates within the carry-signal logic blocks as well as equivalent XOR blocks. This process can become very time-consuming in large multipliers, due to the carry look-ahead logic block expansions. The proposed method, in contrast, performs the first-level XOR search and does not proceed into the carry look-ahead logic. This means that the XOR search process in the proposed algorithm runs at almost the same speed for CLA and ripple-carry adders. The "\#evaluated gates" column in Table 3 addresses the above issue.

The method in [4] evaluates all the logic gates in the net-list during the XOR search and as a result "\#evaluated gates" for this method is equal to the total number of logic gates in the multiplier net-list. However, the proposed algorithm evaluates a much lower number of gates during the XOR search, specifically in large multipliers. Therefore, the XOR search process is much faster in the proposed method. Another factor which makes the proposed algorithm much faster than its counterparts is the process of categorizing XOR gates as explained in Sect. 4.2. If the algorithm fails to extract a word-level adder from an accepted XOR category, a bug must be present; the algorithm therefore stops the arithmetic verification process and reports the location of the bug. Consequently, the proposed algorithm converges very quickly, even if it cannot extract the equivalent arithmetic circuit from the gate net-list.

Table 3 Our approach versus ABL in [4]. The columns \#HC, \#NFAC and \#ORMiss report carry-signal issues in [4].

| Multiplier | \#Evaluated gates, ABL [4] | \#Evaluated gates, our method | \#HC | \#NFAC | \#ORMiss |
| :--- | :---: | :---: | :---: | :---: | :---: |
| 8-bit-ripple | 357 | 312 | 15 | 7 | 3 |
| 8-bit-CLA | 577 | 335 | 15 | 7 | 3 |
| 16-bit-ripple | 2037 | 1392 | 40 | 15 | 7 |
| 16-bit-CLA | 2559 | 1736 | 40 | 15 | 7 |
| 32-bit-ripple | 7747 | 5856 | 100 | 31 | 15 |
| 32-bit-CLA | 10751 | 6802 | 100 | 31 | 15 |

The equivalence checking of carry signals takes place in both methods. However, the method in [4] has to exhaustively check all the possible non-full-adder carries, hidden carries and full-adder carry signals for each extracted XOR gate. Our method can determine whether two cascaded XOR terms should be treated as two half-adders or whether a single full-adder needs to be extracted. The other inefficiency of the ABL method in [4] is the unmapped OR gates. In the BLA, we represent each product vector by $LSB$ and $MSB$ functions and therefore each addition process can be evaluated as to whether or not it leads to a carry overflow. In this way, for each adder, while evaluating the final half-adder stage in the net-list, we know whether or not an OR gate, instead of an XOR, is possible for the realization. Table 3 also addresses these problems in three columns: the number of hidden-carry signals (\#HC), the number of non-full-adder carry misses (\#NFAC) and the number of unmapped OR gates (\#ORMiss). As the multiplier becomes larger, these issues become more troublesome and the method in [4] has to exhaustively check different types of carry signals. The number of unmapped OR gates also increases in larger multipliers.

Table 4 Multipliers in HED versus ABL [4].

| \#Bits |  | HED \#NN | HED Time (s) | HED Mem (MB) | ABL [4] Time (s) | ABL [4] \#Gate |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| 8 | WithoutEC | 16 | 0.07 | 0.11 | NA | NA |
|  | With EC | 29 | 0.37 | 0.3 | 1.92 | 577 |
| 16 | WithoutEC | 32 | 0.34 | 0.21 | NA | NA |
|  | With EC | 61 | 1.9 | 0.98 | 6.56 | 2559 |
| 32 | WithoutEC | 64 | 1.38 | 0.7 | NA | NA |
|  | With EC | 619 | 11.6 | 4.45 | 28.2 | 10751 |
| 64 | WithoutEC | 128 | 7.1 | 3.05 | NA | NA |
|  | With EC | 2799 | 41.2 | 18.9 | 124.2 | 35936 |

NA: Not Applicable

Furthermore, we have utilized the HED in order to verify the gate-level implementation of an $n * n$ multiplier, where $n$ varies from 8 to 64. The experimental results of equivalence checking between the gate-level description and a high level specification are summarized in Table 4. In this table, the number of bits, the number of HED nodes, CPU time and memory usage (while processing the gate level net-list) are reported in columns \#Bits, \#NN, Time (in seconds) and Mem (in MBytes) respectively. The row WithoutEC tabulates this information just for representing the high level specification $\mathrm{A} * \mathrm{~B}$ in HED, whereas the row With EC reports it for the equivalence verification of the two descriptions. The Time column under ABL [4] reports the CPU time required to run the ABL method, which supports our claim that this method is time-consuming in large multipliers. Column \#Gate reports the number of gates to be processed in the ABL method [4], which is similar to the \#of Evaluated Gates column in Table 3 when $m$-bit CLA structures are considered. For instance, consider the last two rows in Table 4, where a 64-bit multiplier was verified. The ABL method requires 124.2 s (With EC row) to check the equivalence between the two descriptions. The HED package consumes 3.05 MB of memory and 7.1 s of run time to represent the $\mathrm{A} * \mathrm{~B}$ high level description, where $A=\sum_{i=0}^{63} 2^{i} * a_{i}$ and $B=\sum_{i=0}^{63} 2^{i} * b_{i}$. It requires 18.9 MB of memory and 41.2 s of CPU time to check the equivalence between the bit level adder description, extracted from the gate level net-list using our proposed method, and the high level description, i.e., $\mathrm{A} * \mathrm{~B}$. The HED package thus spends some time identifying isomorphic sub-graphs and redundant nodes, which amounts to $41.2-7.1=34.1 \mathrm{~s}$ for the 64-bit multiplier.
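The arithmetic behind these comparisons can be reproduced directly from the timing values in Table 4. The snippet below is a convenience sketch: it assumes the four row pairs of the table correspond to 8, 16, 32 and 64 bits, and the names `TABLE4`, `checking_overhead` and `speedup_over_abl` are introduced here for illustration.

```python
# (HED time without EC, HED time with EC, ABL time), in seconds, from Table 4.
TABLE4 = {
    8: (0.07, 0.37, 1.92),
    16: (0.34, 1.9, 6.56),
    32: (1.38, 11.6, 28.2),
    64: (7.1, 41.2, 124.2),
}


def checking_overhead(bits: int) -> float:
    """Time spent beyond merely representing A*B, i.e. With EC minus WithoutEC."""
    without_ec, with_ec, _ = TABLE4[bits]
    return with_ec - without_ec


def speedup_over_abl(bits: int) -> float:
    """Ratio of the ABL run time to the HED With EC run time."""
    _, with_ec, abl = TABLE4[bits]
    return abl / with_ec


for bits in TABLE4:
    print(f"{bits:2d}-bit: overhead {checking_overhead(bits):.2f} s, "
          f"ABL/HED time ratio {speedup_over_abl(bits):.2f}")
```

On these figures the HED-based check is between roughly 2.4 and 5.2 times faster than ABL, with a ratio of about 3 for the 64-bit multiplier.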

## 6. Conclusion and Future Work

In this paper we have proposed a unified framework to verify the equivalence between different levels of abstraction, from a high level specification to a gate level implementation. We introduced the HED as a canonical decision diagram that not only supports modular polynomial computations over the ring $Z_{2^{n}}$, but is also able to express the BLA description of gate level implementations. We are interested in applying HED to the verification of arithmetic datapath computations over bit-vectors that have different bit-widths. Another direction for future work is to diagnose and locate the source of bugs when the equivalence checking fails.

## References

[1] B. Alizadeh and M. Fujita, "A canonical and compact hybrid word-Boolean representation as a formal model for hardware/software codesigns," Fourth Workshop on Constraints in Formal Verification (CFV), pp.15-29, 2007.
[2] I.A. Groute and K. Keane, "M(VH)DL: A MATLAB to VHDL conversion toolbox for digital control," IFAC Symp. on Computer-Aided Control System Design, Sept. 2000.
[3] O. Sarbishei, B. Alizadeh, and M. Fujita, "Arithmetic circuit verification without looking for internal equivalences," IEEE International Conference on Formal Methods and Models for Co-Design (MEMOCODE), June 2008.
[4] M. Wedler, D. Stoffel, R. Brinkmann, and W. Kunz, "A normalization method for arithmetic datapath verification," IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol.26, no.11, pp.1909-1922, Nov. 2007.
[5] S. Horeth and R. Drechsler, "Formal verification of word-level specifications," Proc. Design Automation and Test in Europe (DATE), pp.52-58, 1999.
[6] R. Bryant, "Graph-based algorithms for Boolean function manipulation," IEEE Trans. Comput., vol.35, no.8, pp.677-691, 1986.
[7] R. Drechsler, B. Becker, and S. Ruppertz, "The K*BMD: A verification data structure," IEEE Des. Test Comput., vol.14, pp.51-59, 1997.
[8] M. Ciesielski, P. Kalla, and S. Askar, "Taylor expansion diagrams: A canonical representation for verification of data flow designs," IEEE Trans. Comput., vol.55, no.9, pp.1188-1201, 2006.
[9] B. Alizadeh and M. Fujita, "Automatic merge-point detection for sequential equivalence checking of system-level and RTL descriptions," International Symposium on Automated Technology for Verification and Analysis (ATVA), LNCS 4762, pp.129-144, 2007.
[10] C.-Y. Huang and K.-T. Cheng, "Using word-level ATPG and modular arithmetic constraint solving techniques for assertion property checking," IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol.20, no.3, pp.381-391, 2001.
[11] R. Brinkmann and R. Drechsler, "RTL-datapath verification using integer linear programming," Proc. ASP-DAC, pp.741-747, 2002.
[12] N. Shekhar, P. Kalla, and F. Enescu, "Equivalence verification of polynomial datapaths using ideal membership testing," IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol.26, no.7, pp.1320-1330, 2007.
[13] N. Hungerbuhler and E. Specker, "A generalization of the Smarandache function to several variables," Electronic Journal of Combinatorial Number Theory, vol.6, pp.A23, 1-11, 2006.
[14] L. Halbeisen, N. Hungerbuhler, and H. Lauchli, "Powers and polynomials in $\mathrm{Z}_{\mathrm{m}}$," Elem. Math., vol.54, pp.118-129, 1999.
[15] http://minisat.se/


Bijan Alizadeh received his B.S., M.S. and Ph.D. degrees in computer engineering from the University of Tehran, Iran in 1995, 1998 and 2004, respectively. From 2004 to 2006 he was an adjunct Professor in the Electrical Engineering Department at the Sharif University of Technology. In Fall 2006, he was a postdoctoral fellow at the VLSI Design and Education Center (VDEC) at the University of Tokyo in Japan. His research interests are in fundamental CAD techniques for verification, synthesis, optimization as well as test generation of digital VLSI circuits and systems.


Masahiro Fujita received his Ph.D. degree in Engineering from the University of Tokyo in 1985 and shortly after joined Fujitsu Laboratories Ltd. From 1993 to 2000 he was assigned to Fujitsu's US research office and directed the CAD research group. In March 2000, he joined the Department of Electronic Engineering at the University of Tokyo as a professor, and is now a professor at the VLSI Design \& Education Center at the University of Tokyo. He has been involved in many research projects on various aspects of formal verification.


[^0]:    Manuscript received July 24, 2008.
    Manuscript revised October 13, 2008.
    ${ }^{\dagger}$ The authors are with VLSI Design and Education Center (VDEC), The University of Tokyo and JST CREST, Tokyo, 113-0032 Japan.
    a) E-mail: alizadeh@cad.t.u-tokyo.ac.jp

    DOI: 10.1587/transinf.E92.D.985

