# Low Power Sequential Circuit Design by Using Priority Encoding and Clock Gating[^0]

Xunwei Wu<br>Institute of Circuits and Systems<br>Ningbo University<br>Ningbo, Zhejiang 315211, CHINA<br>xunweiwu@mail.hz.zj.cn

Massoud Pedram<br>Dept. of Elec. Eng.-Systems<br>University of Southern California<br>Los Angeles, CA 90089, USA<br>massoud@zugros.usc.edu


#### Abstract

This paper presents a state assignment technique called priority encoding, which uses multi-code assignment plus clock gating to reduce power dissipation in sequential circuits. The basic idea is to assign multiple codes to states so as to enable more effective clock gating in the sequential circuit. Practical design examples are studied and simulated by PSPICE. Experimental results demonstrate that the priority encoding technique can result in sizable power saving.


## I. Introduction

Synthesis of sequential circuits for low power ${ }^{[1 \sim 5]}$ is an area of research that promises to result in large power savings. The sequential circuit design process can be divided into the following steps: (1) State reduction (to minimize the number of state variables); (2) State assignment (to determine the relationship between states); (3) Choice of flip-flops (to determine the state transition diagram by using state variable values); (4) Design of the combinational circuit part (to produce the outputs and next states). State assignment plays an important role in determining the structure and complexity of the resulting finite state machine in terms of the number of nodes required to implement the output and next-state logic. State assignment also affects the switching activity of the state variables and hence of the internal signals in the circuit.
During the low-power design of combinational circuits, it has been found that blocking redundant signals and shutting off redundant parts of the circuit is an effective method to lower the energy dissipation ${ }^{[6]}$. If some part of the circuit has no effect on the circuit functionality during some time period, then this part is functionally redundant in that period. If the part is made inactive (by cutting off the power supply or by fixing its input signals), then power can be saved. This technique of exploiting redundancy can be applied to the combinational logic part of a finite state machine. A sequential circuit, however, differs from a combinational one in a number of important aspects:
(i) A sequential circuit has flip-flops, which store state signals.
(ii) A sequential circuit receives a special signal called clock, which is used to synchronously trigger the flip-flops.
(iii) States are assigned by encoding the state variables.

We next describe three restraining techniques with respect to each of these aspects.
(i) Traditional flip-flops are single-edge triggered flip-flops (SETFF), which are sensitive to either the rising or the falling edge of the clock. Half of the clock transitions therefore have no impact on the circuit and create redundant activity, which in turn wastes power in the flip-flops. For this reason, a double-edge triggered flip-flop (DETFF) can be used, which utilizes both transition edges of the clock and thereby saves power. ${ }^{[6-8]}$
(ii) The function of the clock is to force all flip-flops to synchronously change their state (from present state to next state). During this switching process, if the next state of a certain flip-flop is the same as its present state, then this flip-flop will be in a holding mode. The clock's triggering action for this flip-flop becomes redundant and can be masked. Therefore, the clock gating technique can be used to lower the power dissipation. ${ }^{[9-13]}$
(iii) During state assignment, $k$ state variables are used to express up to $2^{k}$ different states. However, if the number of functional states $l$ is less than $2^{k}$, i.e., $l < 2^{k}$, then there will exist $\left(2^{k}-l\right)$ redundant states. These redundant states can be beneficial in reducing the complexity of the combinational circuit, but the reliability of the circuit may be adversely affected. We must consider the system behavior if it enters one of these redundant states and make the system self-corrective. Theoretically, there should be a technique for avoiding the redundant states that also reduces power dissipation. We have, however, not found any work on this subject.
This paper proposes a priority encoding technique to eliminate any unused state code. The result is that some states do not require binary assignment of all state variables. When the system is in such a state, the unused state variables become redundant. Because the corresponding flip-flop outputs are not used, these flip-flops can be isolated from the clock to reduce their power dissipation.
The paper is organized as follows. In the next section we present the design principle of multi-code state assignment by using priority encoding. A practical example of a ring counter is presented to show that restraining redundant states can lead to power saving. In Section III, we take two commonly used sequential circuits as examples to describe the effect of the number of redundant states and the state probabilities on the multi-code state assignment. Section IV presents the conclusions.

## II. Priority encoding by using redundant states

In combinational circuit design, the existence of redundant states is helpful in generating large prime implicants during Boolean function minimization. If an implicant contains $2^{m}$ minterms, a maximum of $m$ variables may be eliminated from the product term. If we use $k$ state variables to express $l$ different states $\left(l \leq 2^{k}\right)$, there will be $\left(2^{k}-l\right)$ redundant states. These redundant states may be utilized to make some states multi-coded, e.g., two-coded, four-coded, etc. These state assignments give rise to large implicants during combinational circuit optimization, thus reducing the complexity of the combinational circuit implementation.
An example of a sequential circuit with a lot of redundant states is the one-hot ring counter, where each state corresponds to a state variable. Take the four-state $\left(S_{1}, S_{2}, S_{3}, S_{4}\right)$ counter as an example. Fig. 1(a) shows the state assignment Karnaugh map and the state assignment table of the four states in terms of the state variables $\left(Q_{1}, Q_{2}, Q_{3}, Q_{4}\right)$. Notice that each state is encoded by a single minterm of the state variables. The result is a set of twelve $\left(12=2^{4}-4\right)$ redundant states, which are depicted by empty cells in Fig. 1(a). Although these redundant states can be used to simplify the excitation functions ($D_{1}=Q_{4}$, $D_{2}=Q_{1}$, $D_{3}=Q_{2}$, $D_{4}=Q_{3}$), the problem is that the design will not be self-corrective. We must change the $D_{1}$ function to $D_{1}=\bar{Q}_{1} \cdot \bar{Q}_{2} \cdot \bar{Q}_{3}$ to meet this requirement. The complete state diagram of the revised circuit is shown in Fig. 1(b). One can easily verify that if the circuit enters one of the invalid states, it will return to the valid working cycle within three clock cycles.


|  | $Q_{1}$ | $Q_{2}$ | $Q_{3}$ | $Q_{4}$ |
| :---: | :---: | :---: | :---: | :---: |
| $S_{1}$ | 1 | 0 | 0 | 0 |
| $S_{2}$ | 0 | 1 | 0 | 0 |
| $S_{3}$ | 0 | 0 | 1 | 0 |
| $S_{4}$ | 0 | 0 | 0 | 1 |



Fig. 1 Design of a one-hot ring counter
(a) Karnaugh map and state assignment table
(b) Complete state diagram of the self-corrective design
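
The self-correction claim for the revised one-hot counter can be checked by brute force. The following is a minimal sketch, assuming ideal D flip-flops that all load on every clock; the function names `next_state` and `cycles_to_recover` are illustrative and not from the paper. It applies $D_{1}=\bar{Q}_{1} \cdot \bar{Q}_{2} \cdot \bar{Q}_{3}$, $D_{2}=Q_{1}$, $D_{3}=Q_{2}$, $D_{4}=Q_{3}$ to all 16 codes and reports the worst-case recovery time.

```python
from itertools import product

# The four valid one-hot codes (Q1, Q2, Q3, Q4) of Fig. 1(a).
VALID = {(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)}

def next_state(q):
    """One clock step of the self-corrective one-hot ring counter."""
    q1, q2, q3, q4 = q
    d1 = int(not (q1 or q2 or q3))  # D1 = ~Q1 . ~Q2 . ~Q3
    return (d1, q1, q2, q3)         # D2 = Q1, D3 = Q2, D4 = Q3

def cycles_to_recover(q):
    """Clock cycles needed until the counter holds a valid one-hot code."""
    n = 0
    while q not in VALID:
        q = next_state(q)
        n += 1
    return n

worst = max(cycles_to_recover(q) for q in product((0, 1), repeat=4))
print(worst)  # 3
```

Under these assumptions the worst case is reached from codes such as 1111, which pass through 0111 and 0011 before settling in 0001.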

We can use these twelve redundant states to realize the multi-code state assignment, as shown in Fig. 2(a). Note that the encodings for $\left(S_{1}, S_{2}, S_{3}, S_{4}\right)$ are $Q_{1}$, $\bar{Q}_{1} Q_{2}$, $\bar{Q}_{1} \bar{Q}_{2} Q_{3}$ and $\bar{Q}_{1} \bar{Q}_{2} \bar{Q}_{3}$; hence state variable $Q_{4}$ is immaterial and can be omitted. Because state $S_{4}$ is encoded by three zero-valued state variables, the ring counter has evolved from a one-hot type to a one-zero-hot type. ${ }^{[14]}$ Because we use D flip-flops, $Q_{i}^{\prime}=D_{i}$, where $Q_{i}^{\prime}$ denotes the next state of flip-flop $i$; hence the next state equations and the excitation functions of the three flip-flops can be derived from the state table shown in Fig. 2(b):

$$
D_{1}=\overline{Q_{1}+Q_{2}+Q_{3}}, \quad D_{2}=Q_{1}, \quad D_{3}=Q_{2}
$$

These equations may be used to realize a ring counter with correct functionality, but without any clock gating. Our goal, however, is to find a low-power (i.e., clock-gated) ring counter circuit. To do so, we need to derive the clock-gating functions, as discussed next.
In Fig. 2(b), $S_{1}$ is four-coded and $S_{2}$ is two-coded, whereas $S_{3}$ and $S_{4}$ are uni-coded. We find that $Q_{1}$ has the highest priority, which means that when $Q_{1}=1$, $Q_{2}$ and $Q_{3}$ are don't cares. Similarly, $Q_{2}$ has the second highest priority because when $Q_{1}=0$ and $Q_{2}=1$, $Q_{3}$ is a don't care. Therefore, in the corresponding circuit, $Q_{1}=1$ can be used to restrain the switching of $Q_{2}$ and $Q_{3}$, whereas $Q_{2}=1$ can be used to restrain the switching of $Q_{3}$. The restraining functions can be realized by a clock-gating technique, as shown in Fig. 2(c). The signal for gating the clock to the second flip-flop must be $D_{1}$, and not $Q_{1}$. The reason is that when the clock to the second flip-flop arrives, $D_{1}=1$ will force $Q_{1}=1$, which will subsequently block the clock to the second and third flip-flops immediately. Similarly, we use $D_{1}=1$ and $D_{2}=1$ to mask the clock to the third flip-flop, as opposed to using $Q_{1}$ and $Q_{2}$. If the delays of the two NOR gates that produce the gated clock signals $\mathit{clk}_{2}$ and $\mathit{clk}_{3}$ are the same as that of the inverter which produces $\mathit{clk}_{1}$, then the three flip-flops will all work synchronously. Notice that the omitted fourth flip-flop is replaced by the NOR gate that produces $Q_{4}$.
The circuit of Fig. 2(c) has been simulated by PSPICE. Fig. 3(a) shows waveforms of the clock and the output signals. The three derived clocks $\mathit{clk}_{1}$, $\mathit{clk}_{2}$, $\mathit{clk}_{3}$ are quasi-synchronous. The output waveforms show that the circuit functionality is correct.


|  | $Q_{1}$ | $Q_{2}$ | $Q_{3}$ | $Q_{4}$ |
| :---: | :---: | :---: | :---: | :---: |
| $S_{1}$ | 1 | $\phi$ | $\phi$ | $\phi$ |
| $S_{2}$ | 0 | 1 | $\phi$ | $\phi$ |
| $S_{3}$ | 0 | 0 | 1 | $\phi$ |
| $S_{4}$ | 0 | 0 | 0 | $\phi$ |


|  | $Q_{1}$ | $Q_{2}$ | $Q_{3}$ | $Q_{1}^{\prime}$ | $Q_{2}^{\prime}$ | $Q_{3}^{\prime}$ |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| $\left(S_{1}\right)$ | 1 | $\phi$ | $\phi$ | 0 | 1 | $\phi$ |
| $\left(S_{2}\right)$ | 0 | 1 | $\phi$ | 0 | 0 | 1 |
| $\left(S_{3}\right)$ | 0 | 0 | 1 | 0 | 0 | 0 |
| $\left(S_{4}\right)$ | 0 | 0 | 0 | 1 | $\phi$ | $\phi$ |




Fig. 2 Design of a one-zero-hot ring counter
(a) Karnaugh map and state assignment table
(b) State table and state diagram
(c) Gated-clock design
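
The gated design can also be exercised at the behavioral level. The sketch below is an idealized model, not the PSPICE netlist: it assumes zero gate delay, that flip-flop 2 is clocked only when $D_{1}=0$, and that flip-flop 3 is clocked only when $D_{1}=0$ and $D_{2}=0$, as described in the text. The names `decode`, `step` and `SUCCESSOR` are illustrative.

```python
from itertools import product

SUCCESSOR = {"S1": "S2", "S2": "S3", "S3": "S4", "S4": "S1"}

def decode(q):
    """Priority decoding of (Q1, Q2, Q3): Q1 outranks Q2, which outranks Q3."""
    q1, q2, q3 = q
    if q1:
        return "S1"
    if q2:
        return "S2"
    return "S3" if q3 else "S4"

def step(q):
    """One clock edge; returns the new code and which flip-flops were clocked."""
    q1, q2, q3 = q
    d1 = int(not (q1 or q2 or q3))  # D1 = NOR(Q1, Q2, Q3)
    d2, d3 = q1, q2                 # D2 = Q1, D3 = Q2
    clk2 = d1 == 0                  # D1 = 1 masks the clock of flip-flop 2
    clk3 = d1 == 0 and d2 == 0      # D1 = 1 or D2 = 1 masks flip-flop 3
    new = (d1, d2 if clk2 else q2, d3 if clk3 else q3)
    return new, (True, clk2, clk3)

# Every one of the 8 codes decodes to a state and moves to its successor.
ok = all(decode(step(q)[0]) == SUCCESSOR[decode(q)]
         for q in product((0, 1), repeat=3))
print(ok)  # True
```

In this model no code is unused: whatever values the masked flip-flops happen to hold, the counter follows the cycle $S_{1} \rightarrow S_{2} \rightarrow S_{3} \rightarrow S_{4} \rightarrow S_{1}$.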

Now let us discuss the power dissipation of the new design. The state assignment table in Fig. 1(a) shows that the four flip-flops receive 16 triggering actions from the clock in one counting cycle. However, the state table in Fig. 2(b) shows that the three flip-flops receive only 9 triggering actions from the clock in one cycle. Consequently, the maximum power saving due to the removal of one flip-flop and clock gating is $(16-9) / 16=43.75 \%$. The energy dissipation curves of the two circuits in Fig. 3(b) show that the power saving is in fact $33 \%$. The disparity is caused by the energy dissipation in the NOR gates used for gating the clock.
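
The trigger counts above can be reproduced by counting the specified (non-$\phi$) entries of the two assignment tables. This is a rough sketch under the assumption that every clocked flip-flop costs the same energy per clock event and that gating logic is free; `PHI` and `triggering_actions` are hypothetical names.

```python
from fractions import Fraction

PHI = None  # don't-care entry: the flip-flop is not clocked in this state

ONE_HOT = {"S1": (1, 0, 0, 0), "S2": (0, 1, 0, 0),
           "S3": (0, 0, 1, 0), "S4": (0, 0, 0, 1)}
PRIORITY = {"S1": (1, PHI, PHI), "S2": (0, 1, PHI),
            "S3": (0, 0, 1), "S4": (0, 0, 0)}

def triggering_actions(table):
    """Clock events per counting cycle: one per specified bit per state."""
    return sum(bit is not PHI for code in table.values() for bit in code)

before = triggering_actions(ONE_HOT)   # 4 flip-flops x 4 states
after = triggering_actions(PRIORITY)   # 1 + 2 + 3 + 3
upper_bound = Fraction(before - after, before)
print(before, after, float(upper_bound))  # 16 9 0.4375
```

The result is only an upper bound on the saving, since the model ignores the energy of the gating NOR gates that the simulation accounts for.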



Fig. 3 Simulation of the one-zero-hot ring counter
(a) Clock and signal waveforms
(b) Energy dissipation curves

Finally, it should be pointed out that the design of Fig. 2(c) not only simplifies the circuit realization and saves energy dissipation, but also improves the circuit reliability because it eliminates the unused states. The complete state diagram in Fig. 2(b) shows this advantage.

## III. Multi-code state assignment with clock gating

The uni-code state assignment corresponds to a minterm of the state variable space. In contrast, the multi-code state assignment contains $2^{m}$ minterms of the state variable space, thereby eliminating $2^{m}-1$ redundant states. In general, we can decompose the set of redundant states into groups of $2^{i}-1$ states and determine the corresponding multi-code state assignments. In the previous section, there were 12 redundant states when using four state variables $\left(Q_{1}, Q_{2}, Q_{3}, Q_{4}\right)$. The non-redundant state assignment in Fig. 2(a) was obtained according to the grouping $7+3+1+1=12$. If we used three state variables $\left(Q_{1}, Q_{2}, Q_{3}\right)$ instead, then the number of redundant states would be reduced to 4. Since $4=3+1$, the non-redundant state assignment in Fig. 2(b) would be achieved. Obviously, the inclusion of redundant states increases the complexity of the state assignment procedure. We consider two popular sequential circuits as examples to discuss the state assignment algorithm.
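
The groupings $7+3+1+1$ and $3+1$ can be read directly off the assignment tables. The following sketch, with the hypothetical helpers `expand`, `eliminated_per_state` and `is_complete_and_disjoint`, expands each code with don't-cares into the minterms it covers and checks that the codes partition the whole state space.

```python
from itertools import product

PHI = None  # don't-care entry

FIG_2A = {"S1": (1, PHI, PHI, PHI), "S2": (0, 1, PHI, PHI),
          "S3": (0, 0, 1, PHI), "S4": (0, 0, 0, PHI)}
FIG_2B = {"S1": (1, PHI, PHI), "S2": (0, 1, PHI),
          "S3": (0, 0, 1), "S4": (0, 0, 0)}

def expand(code):
    """All fully specified codes matched by a code with don't-cares."""
    choices = [(0, 1) if bit is PHI else (bit,) for bit in code]
    return set(product(*choices))

def eliminated_per_state(table):
    """A state covering 2**m minterms absorbs 2**m - 1 redundant codes."""
    return [len(expand(code)) - 1 for code in table.values()]

def is_complete_and_disjoint(table):
    """True if the codes cover all 2**k minterms without overlapping."""
    k = len(next(iter(table.values())))
    covered = [c for code in table.values() for c in expand(code)]
    return len(covered) == len(set(covered)) == 2 ** k

print(eliminated_per_state(FIG_2A))  # [7, 3, 1, 1]
print(eliminated_per_state(FIG_2B))  # [3, 1, 0, 0]
```

A check such as `is_complete_and_disjoint` is one simple way to confirm that a candidate priority encoding leaves no unused code and assigns no code to two states.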

## Example 1 Decimal up-counter

In this counter, the counting states $(0,1, \ldots, 9)$ are encoded with the conventional 8421 BCD encoding, as shown in the left half of Table 1. Notice that there are 6 redundant states: $(1010,1011,1100,1101,1110,1111)$.

Table 1: Two encodings of a decimal up-counter

| Digit | 8421 BCD encoding |  |  |  | Priority encoding |  |  |  |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
|  | $D$ | $C$ | $B$ | $A$ | $D$ | $C$ | $B$ | $A$ |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |
| 2 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
| 3 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 |
| 4 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 |
| 5 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 |
| 6 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 |
| 7 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 1 |
| 8 | 1 | 0 | 0 | 0 | 1 | $\phi$ | $\phi$ | 0 |
| 9 | 1 | 0 | 0 | 1 | 1 | $\phi$ | $\phi$ | 1 |

From the above table, the excitation functions for the four flip-flops under the 8421 BCD encoding are derived as:

$$
\begin{align*}
& D_{D}=C B A+D \bar{A} \\
& D_{C}=C \bar{B}+C \bar{A}+\bar{C} B A  \tag{1}\\
& D_{B}=\bar{D} \cdot \bar{B} A+B \bar{A} \\
& D_{A}=\bar{A}
\end{align*}
$$

We now discuss the priority encoding using the six redundant states. Since $6=3+3$, two states among the ten digits can be four-coded. Analyzing the original 8421 BCD state encoding table, we see that states 8 and 9 can be four-coded, as shown on the right side of Table 1. This new scheme maintains the characteristics of the original circuit. When $D=1$, the state variables $C$ and $B$ become don't cares; therefore the priority of $D$ is higher than that of $C$ and $B$. In the circuit realization, when the input of flip-flop $D$ is 1, this input can be used to isolate the clock that triggers flip-flops $C$ and $B$ so as to reduce the corresponding energy dissipation. The new excitation functions for the four flip-flops are:

$$
\begin{align*}
& D_{D}=\bar{D} C B A+D \bar{A} \\
& D_{C}=\bar{D} C+\bar{D} B A  \tag{2}\\
& D_{B}=\bar{D} \cdot \bar{B} A+B \bar{A} \\
& D_{A}=\bar{A}
\end{align*}
$$

Comparison between eqn. 1 and eqn. 2 shows that $D_{B}$ and $D_{A}$ are the same in both sets. However, in the latter design, a literal $\bar{D}$ is added to $D_{D}$ and the form of $D_{C}$ is simplified. The result is that the combinational circuit part is simpler and the power dissipation is lower. As for the energy dissipation of the flip-flops, the occurrence probability of each state in any cycle is $10 \%$. From the right part of Table 1, flip-flops $C$ and $B$ are don't cares in states 8 and 9. During a complete $0-9$ count, we therefore save $20 \%$ of the power dissipation of each of flip-flops $B$ and $C$. Over the same counting cycle, this reduces the power dissipation of all four flip-flops taken together by $2 \times 20 \% / 4=10 \%$. Note, however, that the extra clock-gating NOR gate itself dissipates some energy.
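
The priority-encoded counter and its $10 \%$ figure can be checked with a short behavioral simulation. This is a sketch under stated assumptions: ideal flip-flops, flip-flops $C$ and $B$ hold their values whenever $D_{D}=1$ (the gating condition described above), and every clock event costs the same energy. The helper names `step` and `digit` are illustrative.

```python
from fractions import Fraction

def step(state):
    """One clock edge of the priority-encoded decimal counter (eqn. 2)."""
    d, c, b, a = state
    dd = int((not d and c and b and a) or (d and not a))
    dc = int((not d and c) or (not d and b and a))
    db = int((not d and not b and a) or (b and not a))
    da = int(not a)
    gated = dd == 1  # assumption: D_D = 1 masks the clocks of C and B
    new = (dd, c if gated else dc, b if gated else db, da)
    return new, (2 if gated else 4)  # number of flip-flops clocked

def digit(state):
    """Decode with priority: when D = 1, C and B are don't cares."""
    d, c, b, a = state
    return 8 + a if d else 4 * c + 2 * b + a

state, digits, clock_events = (0, 0, 0, 0), [], 0
for _ in range(10):
    digits.append(digit(state))
    state, clocked = step(state)
    clock_events += clocked

saving = Fraction(4 * 10 - clock_events, 4 * 10)
print(digits)        # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(clock_events)  # 36
print(saving)        # 1/10
```

In this model the clocks of $C$ and $B$ are masked on the transitions into states 8 and 9, so both flip-flops keep the value 1 through those states and are cleared on the transition from 9 back to 0.
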
In the above example, the occurrence probability of each state was the same. In a general sequential circuit, however, steady state probabilities are not the same. To obtain the maximum power saving, we should therefore choose states with higher occurrence probability and multi-encode them first.

## Example 2 8421 BCD detector

An 8421 BCD detector receives a serial input $T$ in groups of four bits (the least significant bit arrives first). When the detector receives a group of bits that is not a valid 8421 BCD code, the output is $R=1$. The state table for the BCD detector after state reduction is shown in Table 2.

Table 2: State table of the BCD detector

| Present state | Next state ($T=0$) | Next state ($T=1$) | Output $R$ ($T=0$) | Output $R$ ($T=1$) |
| :--- | :--- | :--- | :--- | :--- |
| $A$ | $B$ | $B$ | 0 | 0 |
| $B$ | $C$ | $D$ | 0 | 0 |
| $C$ | $E$ | $F$ | 0 | 0 |
| $D$ | $F$ | $F$ | 0 | 0 |
| $E$ | $A$ | $A$ | 0 | 0 |
| $F$ | $A$ | $A$ | 0 | 1 |
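
The state table can be exercised on all sixteen four-bit groups to confirm that it flags exactly the non-BCD values. The sketch below transcribes Table 2 into a dictionary; the names `TABLE` and `detect` are illustrative.

```python
# TABLE[state][T] = (next state, output R), transcribed from Table 2.
TABLE = {
    "A": (("B", 0), ("B", 0)),
    "B": (("C", 0), ("D", 0)),
    "C": (("E", 0), ("F", 0)),
    "D": (("F", 0), ("F", 0)),
    "E": (("A", 0), ("A", 0)),
    "F": (("A", 0), ("A", 1)),
}

def detect(value):
    """Feed the four bits of `value`, least significant bit first.

    Returns the output R produced while the fourth bit is received.
    """
    state, r = "A", 0
    for i in range(4):
        t = (value >> i) & 1
        state, r = TABLE[state][t]
    return r

flags = [detect(v) for v in range(16)]
print(flags)  # [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
```

State $F$ collects every group whose second or third bit is 1, so a fourth bit of 1 received in $F$ corresponds to a value of 10 or more.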

A conventional state assignment using state variables $(x, y, z)$ is shown in the conventional encoding columns of Table 3. Since the states are all uni-coded, there are two redundant states. From this state assignment, the excitation functions of the three flip-flops and the output function are:

Table 3: State assignment

| State | Conventional encoding |  |  | Priority encoding |  |  |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- |
|  | $x$ | $y$ | $z$ | $x$ | $y$ | $z$ |
| $A$ | 0 | 0 | 0 | $\phi$ | 0 | 0 |
| $B$ | 0 | 0 | 1 | $\phi$ | 0 | 1 |
| $C$ | 1 | 1 | 1 | 1 | 1 | 1 |
| $D$ | 0 | 1 | 1 | 0 | 1 | 1 |
| $E$ | 1 | 1 | 0 | 1 | 1 | 0 |
| $F$ | 0 | 1 | 0 | 0 | 1 | 0 |

$$
\begin{align*}
& D_{x}=\bar{T} x z+\bar{T} \bar{y} z \\
& D_{y}=z  \tag{3}\\
& D_{z}=\bar{y} \\
& R=T \bar{x} y \bar{z}
\end{align*}
$$

We now discuss the priority encoding by using the two redundant states. Since $2=1+1$, two-code assignments for two of the states are possible. To obtain the maximal power saving, the occurrence probabilities of these two states should be as high as possible. Suppose $a, b, c, d, e, f$ are the occurrence probabilities of the six states $A, B, C, D, E, F$, and $\tau$ is the probability of input variable $T=1$. In line with the relationship between the present states and the next states, we obtain the following probability relations:

$$
\begin{align*}
& a=e+f \\
& b=a \\
& c=b \cdot(1-\tau)  \tag{4}\\
& d=b \cdot \tau \\
& e=c \cdot(1-\tau) \\
& f=c \cdot \tau+d
\end{align*}
$$

Take the first equality in eqn. 4 as an example. From Table 2, we find that the next state will be $A$ if and only if the present state is $E$ or $F$, thus $a=e+f$ is obtained. The other equalities in eqn. 4 are obtained similarly. Adding the third and fourth equalities gives $c+d=b$, and adding the last two gives $e+f=c+d$. From eqn. 4, we therefore have

$$
\begin{equation*}
a=b=c+d=e+f \tag{5}
\end{equation*}
$$

Furthermore, according to the normalization of probability values, we have:

$$
\begin{equation*}
a+b+c+d+e+f=1 \tag{6}
\end{equation*}
$$
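
Equations (4) to (6) can be evaluated for any input probability $\tau$. The sketch below is one way to do so: it propagates an unnormalized value through eqn. 4, then applies the normalization of eqn. 6. The function name `state_probabilities` is illustrative, and $\tau$ is passed as an exact `Fraction`.

```python
from fractions import Fraction

def state_probabilities(tau):
    """Occurrence probabilities of states A..F for P(T = 1) = tau."""
    a = Fraction(1)        # unnormalized; eqn. 6 is applied at the end
    b = a                  # b = a
    c = b * (1 - tau)      # c = b(1 - tau)
    d = b * tau            # d = b tau
    e = c * (1 - tau)      # e = c(1 - tau)
    f = c * tau + d        # f = c tau + d
    raw = {"A": a, "B": b, "C": c, "D": d, "E": e, "F": f}
    total = sum(raw.values())
    return {state: p / total for state, p in raw.items()}

p = state_probabilities(Fraction(1, 2))
print({s: str(v) for s, v in p.items()})
# {'A': '1/4', 'B': '1/4', 'C': '1/8', 'D': '1/8', 'E': '1/16', 'F': '3/16'}
```

The first relation of eqn. 4, $a=e+f$, is not used in the computation and can serve as a consistency check on the result.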

Substituting eqn. 5 into eqn. 6 gives $4a=1$, so $a=b=0.25$ are the maximal state probabilities; $c$, $d$, $e$, $f$ cannot exceed this value and depend on the probability of input $T$. Therefore we choose states $A$ and $B$ for the two-coded assignments. The priority encoding scheme is shown on the right side of Table 3. The state variable $y$ has priority over $x$; that is, $x$ is redundant and can be suppressed when $y=0$. The excitation functions of the flip-flops and the output function under the priority encoding are as follows:

$$
\begin{align*}
& D_{x}=\bar{T} x+\bar{T} \bar{y} \\
& D_{y}=z \\
& D_{z}=\bar{y} \\
& R=T \bar{x} y \bar{z}
\end{align*}
$$

Comparing the above equations with eqn. 3, $D_{y}$, $D_{z}$ and $R$ are the same, while $D_{x}$ is clearly simplified. This reduces the area of the combinational circuit as well as its power dissipation. Above all, when the system is in state $A$ or state $B$, $y=0$ can be used to isolate the clock that triggers flip-flop $x$, which saves $50 \%$ of the energy dissipation of flip-flop $x$ because $a+b=0.5$. Taken over all three flip-flops, the total energy saving is about $16.7 \%$.
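
Both sets of excitation functions can be checked against Table 2 by enumeration. The sketch below deliberately leaves clock gating out and only tests the next-state logic: for every code that matches a present state, the computed next code must match the encoding of the required next state, with $\phi$ matching either value. All names are illustrative.

```python
from itertools import product

PHI = None  # don't-care entry
TABLE = {"A": ("B", "B"), "B": ("C", "D"), "C": ("E", "F"),
         "D": ("F", "F"), "E": ("A", "A"), "F": ("A", "A")}
CONVENTIONAL = {"A": (0, 0, 0), "B": (0, 0, 1), "C": (1, 1, 1),
                "D": (0, 1, 1), "E": (1, 1, 0), "F": (0, 1, 0)}
PRIORITY = {**CONVENTIONAL, "A": (PHI, 0, 0), "B": (PHI, 0, 1)}

def matches(bits, code):
    """True if fully specified `bits` agree with `code` outside the don't-cares."""
    return all(c is PHI or c == b for b, c in zip(bits, code))

def next_conventional(t, x, y, z):
    dx = int((not t and x and z) or (not t and not y and z))
    return dx, z, int(not y)   # (D_x, D_y, D_z) of eqn. 3

def next_priority(t, x, y, z):
    dx = int((not t and x) or (not t and not y))
    return dx, z, int(not y)   # (D_x, D_y, D_z) of the priority encoding

def realizes_table(next_fn, encoding):
    """Check every matching code and input value against Table 2."""
    for state, code in encoding.items():
        for bits in product((0, 1), repeat=3):
            if not matches(bits, code):
                continue
            for t in (0, 1):
                target = encoding[TABLE[state][t]]
                if not matches(next_fn(t, *bits), target):
                    return False
    return True

print(realizes_table(next_conventional, CONVENTIONAL))  # True
print(realizes_table(next_priority, PRIORITY))          # True
print(realizes_table(next_priority, CONVENTIONAL))      # False
```

The last line illustrates that the simplified $D_{x}$ is valid only because $x$ is a don't care in states $A$ and $B$; under the uni-coded assignment the same function drives state $A$ into the unused code 101.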

## IV. Conclusions

State assignment of sequential circuits influences the complexity of their combinational circuit realization, to which designers attach a lot of importance. The state assignment also influences the switching behavior of the state variables, and hence the power dissipation in these circuits. Previous research has resulted in low-power state assignments that assign codes with minimum Hamming distances to states with high transition probabilities. This paper, in contrast, proposed a priority-based state assignment technique that exploits the redundant state codes to mask the clock to some of the flip-flops. Priority encoding thus not only eliminates the redundant state codes, but also improves the reliability of the finite state machine. Three practical design examples were presented to show that this technique is feasible and can significantly reduce the energy dissipation.

## References

[1] K. Roy and S. Prasad, 'Circuit activity based logic synthesis for low power reliable operations', IEEE Trans. VLSI Systems, 1(4): 503-513, 1993.
[2] G. D. Hachtel, E. Macii, A. Pardo and F. Somenzi, 'Symbolic algorithms steady state probabilities of a finite state machine', in Proc. of Design Test Conf., 214-218, Feb. 1994.
[3] E. Olson and S. Kang, 'State assignment for low-power synthesis using genetic logic local search', in IEEE Proc. Custom Integrated Circuit Conf., 140-143, May, 1994.
[4] G. Hachtel, et al. 'Re-encoding sequential circuits to reduce power dissipation', in Int. Workshop of Low Power Design, Napa, 69-73, Apr. 1994.
[5] L. Benini and G. D. Micheli, 'State assignment for low power dissipation', IEEE J. of Solid-State Circuits, 30(3): 258-268, 1995.
[6] E. Macii, M. Pedram and F. Somenzi, 'High level power modeling, estimation and optimization', IEEE Trans. on Computer Aided Design, 17(11):1061-1079, 1998.
[7] S. H. Unger, 'Double-edge-triggered flip-flops', IEEE Trans. on Computers, 30(6): 447-451, 1981.
[8] R. Hossain, L. D. Wronski and A. Albicki, 'Low power design using double edge triggered flip-flops', IEEE Trans. on VLSI Systems, 2(2): 261-265, 1994.
[9] M. Pedram, Q. Wu and X. Wu, 'A new design of double edge triggered flip-flops', in Proc. of ASP-DAC, Yokohama, 417-421, Feb. 1998.
[10] G. E. Tellez, A. Farrah and M. Sarrafzadeh, 'Activity-driven clock design for low power circuits', in IEEE Proc. ICCAD, San Jose, 62-65, 1995.
[11] L. Benini and G. D. Micheli, 'Transformation and synthesis of FSMs for low power gated clock implementation', in Proc. Int. Symp. on Low Power Design, Dana Point, 21-26, Apr. 1995.
[12] Q. Wu, M. Pedram and X. Wu, 'Clock-Gating and Its Application to Low Power Design of Sequential Circuits', in IEEE Proc. of CICC, Santa Clara, 479-482, May 1997.
[13] X. Wu, J. Wei and M. Pedram, 'Low-power design of sequential circuits using a quasi-synchronous derived clock', in Proc. of ASP-DAC, Pacifico Yokohama, Jan. 2000.
[14] F. Prosser and X. Wu, 'Design of the one-zero-hot controller', International Journal of Electronics, 64(3): 399-407, 1988.


[^0]:    * This work was sponsored in part by the NNSF of China (Grant No.69773034) and the NSF of USA (Grant No.9901193).

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
    ISLPED '00, Rapallo, Italy.
    Copyright 2000 ACM 1-58113-190-9/00/0007...\$5.00.

