Elsevier

Discrete Applied Mathematics

Volume 310, 31 March 2022, Pages 10-31

Constructing depth-optimum circuits for adders and And-Or paths

https://doi.org/10.1016/j.dam.2021.12.007

Abstract

We examine the fundamental problem of constructing depth-optimum circuits for binary addition. More precisely, as in literature, we consider the following problem: Given auxiliary inputs $t_0, \dots, t_{m-1}$, the so-called generate and propagate signals, construct a depth-optimum circuit over the basis {And2, Or2} computing all $n$ carry bits of an $n$-bit adder, where $m = 2n - 1$. In fact, carry bits are And-Or paths, i.e., Boolean functions of the form $t_0 \lor (t_1 \land (t_2 \lor (\dots t_{m-1}) \dots))$. Classical approaches construct so-called prefix circuits which do not achieve a competitive depth. For instance, the popular construction by Kogge and Stone (1973) is only a 2-approximation. A lower bound on the depth of any prefix circuit is $1.44 \log_2 m + \text{const}$, while recent non-prefix circuits have a depth of $\log_2 m + \log_2 \log_2 m + \text{const}$. However, it is unknown whether any of these polynomial-time approaches achieves the optimum depth for all $m \in \mathbb{N}$.

We present a new exponential-time algorithm solving the problem optimally. The previously best exact algorithm by Hegerfeld (2018) with a running time of $O(2.45^m)$ is viable only for $m \le 29$. Our algorithm is significantly faster: We achieve a theoretical running time of $O(2.02^m)$ and apply sophisticated pruning strategies to improve practical running times dramatically. This allows us to compute optimum circuits for all $m \le 64$. Combining these computational results with new theoretical insights, we derive the optimum depths for the computation of all carry bits of $2^k$-bit adder circuits for all $k \le 13$, previously known only for $k \le 4$.

In fact, we solve a more general problem, namely delay optimization of generalized And-Or paths, which originates from late-stage logic optimization in VLSI design. Delay is a natural extension of circuit depth to prescribed input arrival times; and generalized And-Or paths are a generalization of And-Or paths where And and Or do not necessarily alternate. Our algorithm arises from our new structure theorem which characterizes delay-optimum generalized And-Or path circuits.

Introduction

In this work, we construct fast circuits for binary addition and for related Boolean functions, so-called And-Or paths. An And-Or path is a function of the form $t_0 \lor (t_1 \land (t_2 \lor (\dots t_{m-1}) \dots))$ for some $m \in \mathbb{N}$; and a circuit for a Boolean function is a graph-based model for the computation of the function via elementary building blocks (called gates) on a computer chip. Here, we use And2 and Or2 as elementary building blocks, i.e., logical And and Or with two inputs each. Motivated by VLSI design, our objective function is circuit delay, a generalization of circuit depth to prescribed input arrival times $a(t_i) \in \mathbb{N}$ for each input $t_i$. The delay of a circuit is the maximum delay of any input $t_i$, i.e., $a(t_i)$ plus the maximum length of any directed path in the circuit starting in $t_i$. In particular, when $a \equiv 0$, circuit delay is actually circuit depth, i.e., the maximum length of any directed path in the circuit. Given a specific And-Or path with input arrival times, we want to find a delay-optimum circuit for this Boolean function using only And2 and Or2 gates. Important secondary objective functions include circuit size (i.e., number of gates) and fanout (i.e., number of successors of a gate).
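To make the delay definition concrete, here is a minimal Python sketch that evaluates the delay of a formula. The nested-tuple encoding, the function name `delay`, and the example arrival times are assumptions made for illustration and do not come from the paper.

```python
def delay(node, arrival):
    """Delay of a formula given as nested tuples.

    A leaf is an input index i and has delay arrival[i].
    A gate is a tuple (op, left, right) and adds 1 to the later of its inputs.
    """
    if isinstance(node, int):
        return arrival[node]
    _op, left, right = node
    return 1 + max(delay(left, arrival), delay(right, arrival))


# Naive chain realization of t0 OR (t1 AND (t2 OR t3)).
chain = ("or", 0, ("and", 1, ("or", 2, 3)))

depth = delay(chain, [0, 0, 0, 0])    # all arrival times 0: delay equals depth
late_t0 = delay(chain, [5, 0, 0, 0])  # t0 arrives late but is adjacent to the output
```

With all arrival times equal to 0, the computed value is the depth of the chain. When only `t0` arrives late, the chain is already a good shape, because the late input passes through a single gate.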

And-Or paths occur as carry bits in the computation of adder circuits: Assume we compute the sum of two $n$-bit binary numbers $\sum_{i=0}^{n-1} a_i 2^i$ and $\sum_{i=0}^{n-1} b_i 2^i$. A circuit for this task can be constructed via carry bits which are defined recursively by $c_0 = 0$ and $c_{i+1} = g_i \lor (p_i \land c_i)$ for $0 \le i \le n-1$, where $g_i = a_i \land b_i$ and $p_i = a_i \oplus b_i$, see, e.g., Weinberger and Smith [35] or Knowles [21]. Using the carry bits, the sum $\sum_{i=0}^{n} s_i 2^i$ can be computed via $s_i = c_i \oplus p_i$ for $i \in \{0, \dots, n-1\}$ and $s_n = c_n$.
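The carry recursion can be checked directly in software. The following Python sketch adds two numbers using only the generate, propagate, and carry signals described above; the function name `add_via_carries` and the bit-list representation are choices of this example.

```python
def add_via_carries(a, b, n):
    """Add two n-bit numbers via generate/propagate signals and carry bits."""
    a_bits = [(a >> i) & 1 for i in range(n)]
    b_bits = [(b >> i) & 1 for i in range(n)]
    g = [x & y for x, y in zip(a_bits, b_bits)]  # generate: both bits set
    p = [x ^ y for x, y in zip(a_bits, b_bits)]  # propagate: exactly one bit set
    c = [0]                                      # c_0 = 0
    for i in range(n):
        c.append(g[i] | (p[i] & c[i]))           # c_{i+1} = g_i OR (p_i AND c_i)
    s = [c[i] ^ p[i] for i in range(n)] + [c[n]]  # sum bits, s_n = c_n
    return sum(bit << i for i, bit in enumerate(s))
```

The loop computes the carries one after another, which corresponds to the linear-delay ripple-carry scheme; the constructions discussed in the paper compute the same carry bits with much smaller depth.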

The computation of all $g_i$ and $p_i$ as well as the computation of the sum from the carry bits only requires a constant depth and a linear number of gates. Therefore, a circuit computing all the And-Or paths

$$c_{i+1} = g_i \lor (p_i \land c_i) = g_i \lor (p_i \land (g_{i-1} \lor (p_{i-1} \land c_{i-1}))) = g_i \lor (p_i \land (g_{i-1} \lor (p_{i-1} \land (g_{i-2} \lor (p_{i-2} \land \dots (p_1 \land g_0) \dots))))) \quad (1)$$

for $0 \le i \le n-1$ can be used to construct an adder circuit with almost the same depth and delay. We call such a circuit computing all carry bits $c_{i+1}$ from the signals $g_0, p_1, g_1, \dots, p_i, g_i$ a carry-propagate adder circuit. Note that a naive implementation of a carry-propagate adder circuit following the formula above yields a ripple-carry adder circuit with linear delay, cf. Fig. 1(a). A logically equivalent implementation is given in Fig. 1(b).
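As a sanity check of the expansion of the carry recursion into an And-Or path, the following Python sketch evaluates both forms. The helper names `and_or_path`, `carries`, and `carry_inputs` are hypothetical and only serve this illustration.

```python
def and_or_path(t):
    """Evaluate t[0] OR (t[1] AND (t[2] OR (...))) for a list of bits t."""
    val = t[-1]
    for i in range(len(t) - 2, -1, -1):
        # Even positions feed an Or gate, odd positions an And gate.
        val = (t[i] | val) if i % 2 == 0 else (t[i] & val)
    return val


def carries(g, p):
    """Carry bits c_0, ..., c_n from the recursion c_{i+1} = g_i OR (p_i AND c_i)."""
    c = [0]
    for gi, pi in zip(g, p):
        c.append(gi | (pi & c[-1]))
    return c


def carry_inputs(g, p, i):
    """Inputs g_i, p_i, g_{i-1}, ..., p_1, g_0 of the And-Or path for c_{i+1}."""
    t = []
    for j in range(i, 0, -1):
        t += [g[j], p[j]]
    return t + [g[0]]
```

For every choice of the signals, `and_or_path(carry_inputs(g, p, i))` should agree with `carries(g, p)[i + 1]`; note that the path for the carry bit with index i + 1 has 2i + 1 inputs and does not use `p[0]`.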

Now, we consider the reverse reduction: Given input bits $g_0, p_1, g_1, \dots, p_{n-1}, g_{n-1}$, using constant additional depth, one can construct signals $a_0, \dots, a_{n-1}$ and $b_0, \dots, b_{n-1}$ such that $c_n$ – i.e., the output signal of an And-Or path on $g_0, p_1, g_1, \dots, p_{n-1}, g_{n-1}$ – equals the most significant bit $s_n$ of the sum of $a$ and $b$. Thus, any adder circuit could also be used to compute the And-Or path on $g_0, p_1, g_1, \dots, p_{n-1}, g_{n-1}$ with almost the same depth.
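The text does not spell out how the adder inputs are built from the given signals. One choice that appears to work with a single Or gate per bit is to set a_i = g_i OR p_i and b_i = g_i (and a_0 = b_0 = g_0); this particular construction is an assumption of the sketch below, not necessarily the one the authors have in mind. The Python code checks it against a direct evaluation of the And-Or path.

```python
def and_or_path(t):
    """Evaluate t[0] OR (t[1] AND (t[2] OR (...))) for a list of bits t."""
    val = t[-1]
    for i in range(len(t) - 2, -1, -1):
        val = (t[i] | val) if i % 2 == 0 else (t[i] & val)
    return val


def adder_inputs(g, p):
    """Assumed construction: a_i = g_i OR p_i, b_i = g_i; p[0] is ignored."""
    n = len(g)
    a_bits = [g[0]] + [g[i] | p[i] for i in range(1, n)]
    b_bits = list(g)
    a = sum(bit << i for i, bit in enumerate(a_bits))
    b = sum(bit << i for i, bit in enumerate(b_bits))
    return a, b


def top_sum_bit(g, p):
    """Most significant bit s_n of a + b for the constructed n-bit numbers."""
    a, b = adder_inputs(g, p)
    return ((a + b) >> len(g)) & 1
```

With this choice, the generate signal of the constructed adder at position i is g_i, and its propagate signal is p_i AND NOT g_i, which yields the same carry as p_i whenever g_i is 0 and is irrelevant whenever g_i is 1.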

Note that for And-Or paths, we restrict ourselves to the basis {And2, Or2}, while for general adders, non-monotone gates are at least required to compute the signals $g_i$ and $p_i$. It is unknown whether better adder depths can be obtained by not using the reduction to monotone And-Or paths above, i.e., not using carry-propagate adders, and instead exploiting non-monotone gates in a better way. However, the reverse reduction above shows that this could only be the case if the usage of non-monotone gates led to faster And-Or path circuits, and, until now, no approach can exploit non-monotone gates for And-Or path circuits. In fact, all adder circuits known from literature are based on carry-propagate circuits, i.e., they explicitly compute all carry bits. For more details, see also Section 1.1.

In this work, we will construct depth-optimum circuits for And-Or paths over the basis {And2, Or2}. In fact, we consider generalized And-Or paths, i.e., a generalization of And-Or paths where And and Or do not necessarily alternate. We will see that this more general problem has a rich structure which we will exploit for our new results. To the best of our knowledge, we are the first to directly consider this generalized problem. Delay optimization of generalized And-Or paths can be applied to optimize the delay of critical combinatorial paths in VLSI design, but existing approaches (see, e.g., Werber et al. [36]) use a simple reduction to And-Or paths which leads to sub-optimal solutions.

We now review previous results on adder and And-Or path optimization over the basis {And2, Or2}. In this section, following a widely used convention (see e.g. Held and Spirkl [12]), we call any carry-propagate adder circuit – which computes the carry bits $c_{i+1}$ from the signals $g_0, p_1, g_1, \dots, p_i, g_i$ as in Eq. (1) – an adder circuit. Recall from Eq. (1) that an $n$-bit adder can be obtained from And-Or paths on $1, 3, \dots, 2n-1$ inputs, so the optimum depth of an $n$-bit adder equals the optimum depth of an $m$-input And-Or path with $m = 2n - 1$, i.e., $n = \frac{m+1}{2}$. Note that in the following logarithmic bounds, applying the substitution $m = 2n - 1$ only leads to a constant additive term, and hence in all bounds without explicit additive constants, $n$ and $m$ can be freely interchanged. Depth bounds for classical adder constructions are given in terms of $n$ instead of $m$.

Some of the following results only apply to depth optimization, some also to delay optimization for general arrival times. For general arrival times, a lower bound on the delay of a Boolean circuit on inputs $t_0, \dots, t_{m-1}$ is given by $\log_2 W$ due to the Kraft inequality [23], where $W := \sum_{i=0}^{m-1} 2^{a(t_i)}$. Note that for $a \equiv 0$, we have $W = m$. In the subsequent delay bounds, $W$ can also be replaced by $m$ to obtain the corresponding depth bounds.
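The Kraft-type lower bound is easy to evaluate for a given arrival time profile. The Python sketch below sums the powers of two of the arrival times and rounds the logarithm up to the next integer, which is valid here because delays are integers when arrival times are natural numbers; the function name `kraft_lower_bound` and the example profiles are assumptions of this illustration.

```python
def kraft_lower_bound(arrival):
    """Smallest integer d with 2**d >= W, where W is the sum of 2**a(t_i)."""
    w = sum(1 << a for a in arrival)
    # (w - 1).bit_length() equals the rounded-up base-2 logarithm of w for w >= 1.
    return (w - 1).bit_length()


uniform = kraft_lower_bound([0, 0, 0, 0, 0])  # depth case with m = 5
skewed = kraft_lower_bound([3, 0, 0])         # one late input dominates
```

Using integer arithmetic avoids floating-point rounding issues that could occur with a direct logarithm call for very large arrival times.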

Depth optimization of adder circuits is a classical and well-studied problem. Many researchers construct adder circuits via so-called prefix gates, e.g. Sklansky [31], Kogge and Stone [22], Ladner and Fischer [24], or Roy et al. [28], [29]. Although, for $n$ bits, these circuits have an optimum depth of $\log_2 n$ in terms of prefix gates, they have a depth of $2 \log_2 n$ over the basis {And2, Or2}, since each prefix gate has to be realized by a circuit of depth 2. Based on Rautenbach et al. [26], [27], Held and Spirkl [13] directly optimize the And-Or delay of their prefix adders and obtain a delay guarantee of $\mu \log_2 W + \text{const}$ over {And2, Or2}, where $1.44 < \mu < 1.441$. However, Held and Spirkl [13] also proved a lower bound of $\mu \log_2 n - 1$ on the logic gate depth of any prefix-based adder circuit. Thus, further progress is only possible with adders that are not based on prefix gates. We also refer to Paterson et al. [25] for general lower bounds on the delay of circuits that repeatedly use the same gadget gate type to combine inputs, e.g. prefix gates.
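The factor of two for prefix adders can be seen in a small simulation. The following Python sketch follows the standard Kogge-Stone scheme as commonly described (not code from the paper): in each round, every position combines its (generate, propagate) pair with the pair a fixed distance below it, and each such prefix gate costs one And2 followed by one Or2. The function name and the depth bookkeeping are choices of this example.

```python
def kogge_stone_carries(g, p):
    """Return ([c_1, ..., c_n], depth over And2/Or2) for a Kogge-Stone prefix adder."""
    n = len(g)
    G, P = list(g), list(p)
    depth, d = 0, 1
    while d < n:
        G_new, P_new = list(G), list(P)
        for i in range(d, n):
            # Prefix gate: combine block ending at i with block ending at i - d.
            G_new[i] = G[i] | (P[i] & G[i - d])  # And2 followed by Or2: depth 2
            P_new[i] = P[i] & P[i - d]
        G, P = G_new, P_new
        depth += 2   # one level of prefix gates costs two levels of And2/Or2
        d *= 2
    return G, depth
```

For a power of two n, the loop runs log2(n) times, so the reported depth is twice that number, matching the depth stated above for prefix adders over And2 and Or2.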

Using non-prefix circuits, Brent [3], Khrapchenko [17], and Held and Spirkl [12] achieve a guaranteed depth of the form

for And-Or paths and adders. Spirkl [33] also considered non-uniform arrival times and constructed circuits with a delay bound of the form
. For And-Or paths with arbitrary arrival times, the best known delay guarantee of $\log_2 W + \log_2 \log_2 m + \log_2 \log_2 \log_2 m + 4.3$ was first obtained by Brenner and Hermann [1]. For depth optimization, the best known depth bound of $\log_2 m + \log_2 \log_2 m + O(1)$ is due to Grinchuk [9]; the additive constant was improved to 1.58 by Hermann [14].

On the other hand, both for adders and And-Or paths, there is a constant $c$ such that for depth optimization, a lower bound of $\log_2 m + \log_2 \log_2 m + c$ holds for sufficiently large $m$, which was shown by Commentz-Walter [4]. Hitzschke [15] showed that if $c$ is chosen as $c = -5.02$, then the lower bound holds for m2218. Commentz-Walter [4] obtains the lower bound via structural insights on And-Or paths that are the basis for our structure theorem, see Section 3. For the non-monotone case where not only And2 and Or2, but also INV gates are allowed to be used, Commentz-Walter and Sattler [5] proved that for each $\alpha \in (0,1)$, there is some $M_\alpha \in \mathbb{N}$ such that a lower bound of $\log_2 m + \alpha \log_2 \log_2 \log_2 m$ holds for all $m \ge M_\alpha$. From this, Khrapchenko [19] derives that for any m2232, the lower bound log2m+0.15log2log2log2m21 holds. However, so far there is no approach that can exploit non-monotone gates to obtain better depths. It is an interesting open question whether such an approach exists or stronger lower bounds (e.g. similar to the monotone case) can be shown for the non-monotone case.

Compared to the monotone lower bound, Grinchuk [9] and Hermann [14] construct depth-optimum circuits up to an additive constant, and the delay-optimization algorithm by Brenner and Hermann [1] is, applied for the special case of depth optimization, optimum up to an additive term of $O(\log_2 \log_2 \log_2 m)$. However, the difference of these upper bounds to the lower bound is still substantial, leading to sub-optimal results, in particular on small instances as they occur in practice in VLSI design.

In order to obtain good empirical results, the algorithm by Brenner and Hermann [2] combines the ideas from these theoretical constructions with practical improvements and generalizations. Although no better worst-case bound can be shown, the obtained circuits mostly have better – often optimal – delay. However, there are instances with only 6 inputs for which the algorithm does not find an optimum solution, see Section 6.3 in Hermann [14]. Still, regarding depth optimization, all instances with known optimum depth that can be solved by this algorithm are indeed solved optimally, and it is open whether this construction is always optimum in the depth case. There is also a depth-optimization heuristic by Grinchuk [8] which might actually be an exact algorithm (cf. the depths displayed in the table in Section 5 of [8]).

There are three previously known exact algorithms for depth optimization of And-Or paths.

Apart from the aforementioned heuristic, Grinchuk [8] also provides an exact algorithm for depth optimization of And-Or paths with a running time of $\Omega(4^m)$. No explicit empirical results are given, but it is mentioned that the algorithm can only be used for up to 20 or 30 inputs. The idea of this algorithm is to compute the optimum achievable depth for all monotone Boolean functions on $m$ inputs in a bottom-up dynamic programming fashion. Each Boolean function is identified by its truth table, and circuits of larger depth are obtained by pairwise combinations of existing circuits with And or Or gates. Naively, the dynamic-programming table thus would have $2^{2^m}$ entries. Grinchuk’s main contribution is the observation that a truth table of size $m$ – called a passport in [8] – suffices to identify a monotone And-Or path circuit. This way, the size of the dynamic-programming table is reduced to $2^m$, which implies a running time of $\Omega(4^m)$ to compute all table entries and hence a depth-optimum And-Or path circuit.
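The naive bottom-up idea, without the passport reduction, can be sketched in a few lines of Python for very small m. The sketch stores every function as a full truth table encoded in an integer and repeatedly combines all reachable functions with And and Or; it is only an illustration of the enumeration principle, not Grinchuk's algorithm, and the function names are hypothetical. It becomes impractical beyond roughly m = 4.

```python
def and_or_path_table(m):
    """Truth table of t0 OR (t1 AND (t2 OR (...))), one bit per input assignment."""
    table = 0
    for x in range(2 ** m):
        bits = [(x >> i) & 1 for i in range(m)]
        val = bits[m - 1]
        for i in range(m - 2, -1, -1):
            val = (bits[i] | val) if i % 2 == 0 else (bits[i] & val)
        if val:
            table |= 1 << x
    return table


def optimum_depth(m, max_depth=8):
    """Optimum depth over And2/Or2 by exhaustive closure over truth tables."""
    target = and_or_path_table(m)
    # Depth 0: the input variables themselves.
    reachable = set()
    for i in range(m):
        reachable.add(sum(1 << x for x in range(2 ** m) if (x >> i) & 1))
    depth = 0
    while target not in reachable:
        if depth == max_depth:
            raise ValueError("target not reached within max_depth")
        current = list(reachable)
        for a in current:
            for b in current:
                reachable.add(a & b)  # And2 of two functions of smaller depth
                reachable.add(a | b)  # Or2 of two functions of smaller depth
        depth += 1
    return depth
```

Since bitwise operations on the integer truth tables apply a gate to all input assignments at once, each combination step is cheap; the cost lies in the number of reachable functions, which is what the passport representation addresses.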

Hegerfeld [11] proposes two enumeration algorithms constructing depth-optimum And-Or path circuits.

In a first algorithm, Hegerfeld constructs all circuits that are size-optimum among all depth-optimum And-Or path circuits. This algorithm is based on tree enumeration and is viable for up to 19 inputs. The algorithm can also be used to enumerate And-Or path circuits with non-optimum depth with an increase in running time, which leads to optimum solutions with respect to delay for certain arrival time profiles.

Hegerfeld’s second algorithm is much faster, but is restricted to depth-optimum formula circuits (i.e., circuits where each gate has fanout 1) with a certain size guarantee (cf. Section 7). It has a provable running time of $O(2.45^m)$ and can be applied for up to 29 inputs. Hegerfeld does not enumerate formula circuits for And-Or paths directly, but so-called rectangle-good protocol trees for Karchmer–Wigderson games (see Karchmer and Wigderson [16]) for And-Or paths, which come from the area of communication complexity. From these, Hegerfeld derives the optimum formula circuits. This is the fastest previously known exact algorithm for depth optimization of And-Or paths.

In this work, we present a new exact algorithm for constructing delay-optimum generalized And-Or path circuits. In the most prominent special case of depth optimization of And-Or path circuits (and thereby carry-propagate adders), our algorithm is significantly faster than previous approaches, both in theory and in practice. For the general problem, which occurs in late-stage timing optimization in VLSI design, we obtain the first known non-trivial exact algorithm.

We prove a new structure theorem which characterizes the structure of specific delay-optimum circuits for generalized And-Or paths. More precisely, we show how optimum solutions for generalized And-Or paths can be obtained by combining optimum circuits for certain smaller generalized And-Or paths in a recursive fashion, directly motivating a dynamic programming algorithm. We stress that an analogous statement does not hold for non-generalized And-Or paths, that is, generalized And-Or paths also occur as sub-functions of delay-optimum circuits for non-generalized And-Or paths.

In the general case, the running time of our new algorithm is at most $O(3^m)$. For And-Or paths, the bound is improved to $O(2.45^m)$.

For the special case of depth optimization of And-Or paths, our algorithm has a running time of $O(2.02^m)$, significantly improving upon the previously best running time of $O(2.45^m)$ of the formula enumeration algorithm by Hegerfeld [11]. Hegerfeld computes a depth-optimum formula circuit with a certain size guarantee (cf. Section 7). Our algorithm can either compute such a circuit or an arbitrary delay-optimum circuit, which can be done much faster. In contrast to Hegerfeld, in practice, we apply very efficient pruning techniques that drastically reduce empirical running times. The largest instance solved by Hegerfeld has 29 inputs, while our algorithm with size optimization can solve instances with up to 42 inputs; without size optimization even up to 64 inputs. Our running times on 26 inputs are 2.1 s with size optimization and 0.007 s without size optimization, while Hegerfeld’s running time is 17 hours. Our largest running time without size optimization on any of these instances is roughly 2.7 h.

From our structure theorem, our computations and the results computed by the heuristic And-Or path optimization algorithm by Grinchuk [8], we deduce the optimum depths of carry-propagate adder circuits – i.e., circuits computing all the carry bits from the $g_i$- and $p_i$-signals as in Eq. (1) – over the basis {And2, Or2} on $2^k$ bits, where $k \le 13$. To the best of our knowledge, we are the first to obtain such a result.

The rest of the paper is organized as follows: In Section 2, we formally introduce the problem and basic concepts. In Section 3, we present and prove our structure theorem. From this, in Section 4, we derive our exact algorithm, which is refined for the special case of depth optimization of And-Or paths in Section 5. Practical speedups are presented in Section 6. In Section 7, we show computational results, i.e., our practical running times and the computed optimum adder depths.

Section snippets

Boolean functions and circuits

Our notation regarding Boolean functions and circuits is based on Savage [30]. We denote the set of natural numbers including 0 by $\mathbb{N}$. For an $n$-tuple $x = (x_0, \dots, x_{n-1})$ and an index $i \in \{0, \dots, n-1\}$, we use the standard notation $(x_0, \dots, \widehat{x_i}, \dots, x_{n-1})$ to denote the $(n-1)$-tuple arising from $x$ by deleting the entry $x_i$.

For us, a Boolean function is a function $f\colon \{0,1\}^m \to \{0,1\}$ for some $m \in \mathbb{N}$. We often write $t = (t_0, \dots, t_{m-1})$ for the input variables, short inputs, of $f$. For an input variable $t_i$ of $f$ and a value $\alpha \in \{0,1\}$, the restriction

Structure theorem

Our structure theorem and our algorithm presented in Section 4 both reduce the problem of optimizing a given generalized And-Or path to smaller instances of a specific form.

Definition 3.1

Consider a generalized And-Or path $h(t;\Gamma)$ with $m \ge 1$ inputs. Given indices $0 \le i_0 < \dots < i_{k-1} \le m-1$, the generalized And-Or path $t_{i_0} \circ_{i_0} (t_{i_1} \circ_{i_1} (\dots \circ_{i_{k-3}} (t_{i_{k-2}} \circ_{i_{k-2}} t_{i_{k-1}}) \dots))$ is called a sub-path of $h(t;\Gamma)$. Now, let a gate type And,Or and a set S1 with S1S be given, and let i be maximum with tiS1. Then, the sub-path of h(t;Γ) containing

General algorithm

The structure theorem from the previous section motivates an exact algorithm for the DELAY OPTIMIZATION PROBLEM FOR GENERALIZED AND-OR PATHS: Consider a generalized And-Or path h(t;Γ) with prescribed input arrival times. Assume that we know a delay-optimum formula circuit for all strict sub-paths of h(t;Γ). Then, by Theorem 3.4, there are And,Or and a partition

such that a delay-optimum circuit C for h(t;Γ) can be obtained from delay-optimum circuits C1 for h(t;Γ)S1 and C2 for h(t;Γ)S2

Improved Algorithm for Depth Optimization of AND-OR Paths

In this section, we speed up Algorithm 4.1 for the special case of depth optimization of And-Or paths. For this, we partition all sub-paths considered during the algorithm into so-called sp-equivalence classes, where two sub-paths with segment partitions

and
are considered as sp-equivalent if and only if $c = c'$ and $|P_b| = |P'_b|$ for all $b \in \{0, \dots, c\}$. Then, up to renaming of the input variables, any two sp-equivalent And-Or paths are either logically equivalent or dual to each other, i.e., the delays

Practical implementation

We implemented Algorithm 4.1 (Section 4) in a C++ program, using 64-bit bit sets to encode the sub-paths via the bijection $\kappa$ to subsets of $\{t_0, \dots, t_{m-1}\}$. In order to obtain good practical running times, we implemented several speedup techniques. On most instances, these in particular imply that we compute the delay for only a fraction of the sub-paths from our dynamic-programming table, see also Table 4 (Section 7.2). Hence, we store the table in a hash set, which violates the worst-case running

Computational results

In Section 7.1, we analyze results for delay optimization of And-Or paths and generalized And-Or paths. Then, in Section 7.2, we consider the DEPTH OPTIMIZATION PROBLEM FOR AND-OR PATHS. In particular, here we analyze all speedup techniques in detail, including their individual impact on the empirical running time. These speedups allow us to solve all instances of the DEPTH OPTIMIZATION PROBLEM FOR AND-OR PATHS with up to 64 inputs. For this problem, we also compare our running times with those

Conclusions

We presented a new exact algorithm for constructing depth- and delay-optimum And-Or path and carry-propagate adder circuits over the basis {And2, Or2}. Our algorithm is much faster than previous approaches – both empirically and regarding provable worst-case running time – and hence can solve significantly larger instances. For all And-Or path instances with up to 64 inputs, the optimum depth was computed in reasonable time.

Using these empirical computations and new theoretical results, we derived

References

  • Rautenbach D. et al., Delay optimization of linear depth boolean circuits with prescribed input arrival times, J. Discrete Algorithms (2006)
  • Rautenbach D. et al., The delay of circuits whose inputs have specified arrival times, Discrete Appl. Math. (2007)
  • Brenner U. et al., Faster carry bit computation for adder circuits with prescribed arrival times, ACM Trans. Algorithms (2019)
  • Brenner U. et al., Delay optimization of combinational logic by and-or path restructuring (2020)
  • Brent R., On the addition of binary numbers, Trans. Comput. (1970)
  • Commentz-Walter B., Size-depth tradeoff in monotone Boolean formulae, Acta Inform. (1979)
  • Commentz-Walter B. et al., Size-depth tradeoff in non-monotone Boolean formulae, Acta Inform. (1980)
  • Conradie W. et al., Logic and Discrete Mathematics: A Concise Introduction (2015)
  • Crama Y. et al., Boolean Functions: Theory, Algorithms, and Applications (2011)
  • Grinchuk M.I., Low depth circuit design, US patent 8499264...
  • Grinchuk M.I., Sharpening an upper bound on the adder and comparator depths, Disk. Anal. I Issledovanie Oper. (2008)
  • Grinchuk M.I., Sharpening an upper bound on the adder and comparator depths, J. Appl. Ind. Math. (2009)
  • Hegerfeld F., Optimal monotone realizations of And-Or-paths (2018)
  • Held S. et al., Binary adder circuits of asymptotically minimum depth, linear size, and fan-out two, ACM Trans. Algorithms (2017)
  • Held S. et al., Fast prefix adders for non-uniform input arrival times, Algorithmica (2017)
  • Hermann A., Faster circuits for and-or paths and binary addition (2020)
  • Hitzschke J.M., Untere Schranken für Tiefe und Delay von AND-OR-Pfaden (2018)
  • Karchmer M., Wigderson A., Monotone circuits for connectivity require super-logarithmic depth, 3(2) (1990)...

    1

    Now with Synopsys GmbH, Germany.

    2

    Now with Greenplan GmbH, Germany.
