© 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

## Citation:

D. Challagundla, M. Galib, I. Bezzam and R. Islam, "Power and Skew Reduction Using Resonant Energy Recycling in 14-nm FinFET Clocks," 2022 IEEE International Symposium on Circuits and Systems (ISCAS), Austin, TX, USA, 2022, pp. 268-272, doi: 10.1109/ISCAS48785.2022.9937771.

## DOI:

https://doi.org/10.1109/ISCAS48785.2022.9937771

Access to this work was provided by the University of Maryland, Baltimore County (UMBC) ScholarWorks@UMBC digital repository on the Maryland Shared Open Access (MD-SOAR) platform.


# Power and Skew Reduction Using Resonant Energy Recycling in 14-nm FinFET Clocks

Dhandeep Challagundla, Mehedi Galib
Computer Science and Electrical Engineering
University of Maryland Baltimore County
Baltimore, Maryland, USA
{vd58139, mgalib1}@umbc.edu

Ignatius Bezzam
IC Design Group
Rezonent Inc.
Milpitas, USA
i@rezonent.us

Riadul Islam

Computer Science and Electrical Engineering
University of Maryland Baltimore County
Baltimore, Maryland, USA
riaduli@umbc.edu

Abstract—As the demand for high-performance microprocessors grows, circuit complexity and data transfer rates increase, resulting in higher power consumption. We propose a clocking architecture that uses series LC resonance and an inductor matching technique to address this bottleneck. By employing pulsed resonance, the dissipated switching power is recycled. The inductor matching technique reduces skew, increasing the robustness of the clock network. Compared with a conventional primary-secondary flip-flop-based CMOS architecture, the new resonant architecture reduces power by over 43% and skew by 91% over a clock range of 1-5 GHz.

*Index Terms*—Clock skew, LC resonance, clock tree architecture, pulsed flip-flops, power consumption.

## I. INTRODUCTION

Power consumption is one of the major problems faced by the high-performance microprocessor industry [1]–[3]. The demand for higher performance has pushed operating frequencies up, increasing the complexity of microprocessor designs [2], [4], [5]. The resulting rise in power has driven designers to develop innovative techniques that reduce power while meeting all the design constraints that affect performance [6]–[9]. A significant portion of the dynamic power consumed in a high-frequency design is due to switching activity in the clock network [10]. To address this, low-power techniques such as dynamic voltage and frequency scaling (DVFS) [11], clock gating [12], LC resonant clocking [13]–[17], and current-mode clocking [18]–[20] are commonly used. Among them, inductor-based LC resonant clocking techniques have great potential to save switching power due to their constant phase and magnitude. There are several LC resonant techniques, such as parallel resonance [13] (Fig. 1(a)), intermittent resonance [15], and series resonance [14], [21]. However, most LC resonant techniques are limited to a narrow frequency band and suffer from high skew. Moreover, most industry-standard electronic design automation (EDA) tools do not explicitly support integrating LC resonance into the clock tree architecture. Additionally, designing a resonant clock architecture requires expertise in multiple domains due to the non-linear behavior of inductors.

This work was supported in part by Rezonent Inc. under Grant CORP-0061 and the UMBC Startup grant.



Fig. 1: LC resonance topologies for reducing dynamic power consumption: (a) series resonance topology, which can address a wide frequency band; (b) parallel resonance topology, which can address only a very narrow frequency band.

To overcome the narrow-band limitation, researchers use series resonance techniques [14], as shown in Fig. 1(b). This approach places an inductor in the discharge path to store the otherwise dissipated energy in the form of a magnetic field. This energy is recycled on the next rising clock edge.

To enable a resonant clock architecture, synchronous circuits need resonant flip-flops (FFs). However, FFs occupy substantial chip area and consume significant power. Many low-power FFs have been proposed [22]–[24]; however, they are not suitable for resonant operation. In this research, we propose several conventional register-based pulsed FFs that are suitable for series resonance and reduce the overall power consumption of the clock network.

Besides power, skew plays a critical role in enabling high-frequency operation. To reduce the skew generated by clock trees, we introduce the first inductor tuning technique that matches the series resonance inductor to the load capacitance of the clock tree, generating pulses with equal resonant frequencies. Because the resonant frequency depends only on the series inductor and the load capacitance, we obtain a constant pulse width over a wideband (WB) range of input clock frequencies. Therefore, calibrating the inductors once for a clock tree architecture enables WB frequency operation.

This paper proposes a clocking architecture that recycles the power dissipated by the synchronous elements using series resonance technology and reduces clock skew by adopting the inductor tuning technique. In particular, the main contributions of this work are:

- An architecture to recycle the power dissipated by synchronous elements using series resonance.
- A novel pulse generator with dual-rail booster using series resonance.
- A set of pulsed register-based FFs that exploit the behavior of pulsed series resonance.
- An inductor tuning technique to compensate for the skew in the clock tree architecture.

## II. BACKGROUND

In a traditional clocking scheme, half of the switching energy is used to charge a capacitive node when the clock transitions from 0 to 1, and the other half is dissipated in the discharge cycle when the clock transitions from 1 to 0. The pulsed series resonance (PSR) technique recycles this dissipated energy by placing an inductor in the discharge path. LC resonance is the most widely used of several energy recycling techniques because it precisely replicates conventional CMOS clocking. However, while it offers large savings in dynamic power, it suffers from degraded slew [14], [25].
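To make the energy split above concrete, here is a minimal sketch using the textbook CV² model of a single clock node with activity factor 1. The load of 1 pF, the 0.8 V supply, and the 1 GHz frequency are hypothetical values chosen for illustration, not values from this work. The `discharge_loss` term is the energy that PSR aims to recycle.

```python
def switching_energy_split(c_load: float, vdd: float) -> dict:
    """Energy bookkeeping for one full 0->1->0 cycle of a conventional CMOS clock node."""
    e_supply = c_load * vdd ** 2          # drawn from VDD during the 0-to-1 transition
    e_stored = 0.5 * c_load * vdd ** 2    # left on the node capacitance after charging
    e_charge_loss = e_supply - e_stored   # dissipated in the pull-up network while charging
    e_discharge_loss = e_stored           # dissipated in the pull-down network on 1-to-0
    return {
        "supply": e_supply,
        "stored": e_stored,
        "charge_loss": e_charge_loss,
        "discharge_loss": e_discharge_loss,
    }


def clock_power(c_load: float, vdd: float, f_clk: float) -> float:
    """Dynamic clock power with activity factor 1: P = C * VDD^2 * f."""
    return c_load * vdd ** 2 * f_clk


# Hypothetical node: 1 pF load, 0.8 V supply, 1 GHz clock
energies = switching_energy_split(1e-12, 0.8)
p_clk = clock_power(1e-12, 0.8, 1e9)  # about 0.64 mW
```

In this idealized model, the stored half of the supply energy is exactly what is lost on the falling edge, which is why an inductor in the discharge path can target it.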

Due to resonance, the free swing obtained by recycling energy is the difference between the resonant high output $(V_{OH})$ and the resonant low output $(V_{OL})$ [14], which can be expressed as

$$V_{OH} - V_{OL} = \frac{V_{DD}}{2} \left(1 + e^{-\pi/Q}\right) - \frac{V_{DD}}{2} \left(1 - e^{-\pi/(2Q)}\right) \tag{1}$$

where Q is the quality factor of the inductor, given by $Q=\sqrt{L/(CR^2)}$, with L, C, and R the inductance, load capacitance, and series resistance. We use an external power source to pull the output from $V_{OH}$ to $V_{DD}$. The resonant frequency at which the inductor resonates with the load capacitance is $f_{RES}=\frac{1}{2\pi\sqrt{LC}}$.
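The three quantities defined above can be evaluated with a short sketch. `free_swing` transcribes Eq. (1) term by term as printed; the component values (L = 1 nH, C = 1 pF, R = 2 Ω) are hypothetical and chosen only for illustration.

```python
import math


def resonant_frequency(l: float, c: float) -> float:
    """f_RES = 1 / (2 * pi * sqrt(L * C)), in Hz."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l * c))


def quality_factor(l: float, c: float, r: float) -> float:
    """Q = sqrt(L / (C * R^2))."""
    return math.sqrt(l / (c * r ** 2))


def free_swing(vdd: float, q: float) -> float:
    """V_OH - V_OL, transcribed term by term from Eq. (1)."""
    v_oh = vdd / 2.0 * (1.0 + math.exp(-math.pi / q))
    v_ol = vdd / 2.0 * (1.0 - math.exp(-math.pi / (2.0 * q)))
    return v_oh - v_ol


# Hypothetical component values
f_res = resonant_frequency(1e-9, 1e-12)  # about 5.03 GHz
q = quality_factor(1e-9, 1e-12, 2.0)     # sqrt(250), about 15.8
swing = free_swing(0.8, q)
```

The free swing approaches $V_{DD}$ as Q grows; the remaining gap from $V_{OH}$ to $V_{DD}$ is what the external source supplies.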

In this work, we propose several pulsed-type resonant FFs with resonant clock trees for PSR operation. Several prior works have focused on low-power FF designs. In [23], an 18T FF was designed that achieves a 40% improvement in energy per cycle compared with the primary-secondary FF (PSFF). However, to mitigate the voltage degradation caused by the non-complementary topology, the authors used a poly-bias technique [22], which requires extra design effort. In [22], an 18T single-phase clocked FF was designed for low-power operation; it showed 68% lower power consumption at a 0.6 V supply but had functionality issues when the voltage was scaled, as reported in [26], [27]. For reliable and robust operation, we implement widely used traditional pulsed register-based and true single-phase clock (TSPC)-based FFs [28].

## III. PROPOSED CLOCK ARCHITECTURE

The proposed architecture comprises a pulse generator, clock drivers utilizing on-chip inductors, and pulsed registers, as shown in Fig. 2.



Fig. 2: The proposed wideband resonant clock tree architecture consists of a system clock source as the root, followed by a pulse generator, multiple PSR drivers with on-chip inductors, clock gaters, clock buffers, and finally, various sets of resonant pulsed FFs in the leaf nodes.

### A. Pulse Generator

The pulse generator, depicted in Fig. 2, takes its input from the clock source and generates a pulse with boosted amplitude. The series inductor $L_1$ and the matching capacitance $C_1$ generate a delay of $T_d = \pi \sqrt{L_1 C_1}$. The clock and the delayed signal are fed into an XNOR gate, which generates a pulse of width $T_d$ at both clock edges. A voltage doubler circuit then inverts this dual-edge-triggered pulse, producing a boosted signal $V_{SR}$. The voltage doubler uses the pulsed series resonance technique to generate the boosted signal. When $\overline{V_{SR}}$ is low, PMOS transistors $M_1$ and $M_3$ are "ON," and the inductor resonates with the load capacitance $C_2$ and the additional PSR capacitance. For large load capacitances, the required series inductance is quite small. The inductor in the voltage doubler can be adjusted according to the load of the pulse generator to produce the boosted signal $V_{SR}$. We use a dual-rail booster circuit to reduce the power consumed by the voltage doubler by decreasing the resistance of the pull-up network.
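The edge-to-pulse conversion described above can be sketched in discrete time. This is an idealized model under stated assumptions: zero-delay logic, an ideal delay of $T_d = \pi\sqrt{L_1 C_1}$, a 50% duty-cycle clock that is low before t = 0, 1 ps time steps, and hypothetical values L1 = 1 nH, C1 = 1 pF, and a 0.5 GHz CLK. It models only the XNOR output, not the voltage doubler, and it requires $T_d$ to be shorter than half the clock period.

```python
import math


def delay_td(l1: float, c1: float) -> float:
    """Delay of the series L1-C1 stage: T_d = pi * sqrt(L1 * C1), in seconds."""
    return math.pi * math.sqrt(l1 * c1)


def clk_level(t_ps: int, period_ps: int) -> int:
    """Ideal 50% duty-cycle clock, assumed low before t = 0 and rising at t = 0."""
    if t_ps < 0:
        return 0
    return 1 if (t_ps % period_ps) < period_ps // 2 else 0


def xnor_pulse_train(period_ps: int, td_ps: int, t_stop_ps: int) -> list[int]:
    """XNOR of CLK and CLK delayed by td_ps, sampled every 1 ps.

    The output drops to 0 for td_ps after every rising AND falling edge.
    """
    return [1 - (clk_level(t, period_ps) ^ clk_level(t - td_ps, period_ps))
            for t in range(t_stop_ps)]


def low_pulse_widths(samples: list[int]) -> list[int]:
    """Width (in samples) of each run of zeros in the XNOR output."""
    widths, run = [], 0
    for s in samples:
        if s == 0:
            run += 1
        elif run:
            widths.append(run)
            run = 0
    if run:
        widths.append(run)
    return widths


# Hypothetical values: L1 = 1 nH, C1 = 1 pF, CLK = 0.5 GHz (2000 ps period)
td_ps = round(delay_td(1e-9, 1e-12) * 1e12)                 # 99 ps
widths = low_pulse_widths(xnor_pulse_train(2000, td_ps, 10_000))
pulse_rate_hz = len(widths) / 10e-9                          # pulses per second over 10 ns
```

Over 10 ns this yields ten pulses of about 99 ps each, so a 0.5 GHz clock produces a 1 GHz pulse train because both edges fire. Inverting the active-low XNOR pulses, which the voltage doubler does in the actual circuit, gives positive pulses.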

### B. Pulsed Series Resonance Driver

The boosted signal $V_{SR}$ from the pulse generator stage is provided as the input to multiple PSR drivers, which generate a pulse signal $R_{CLK}$. The inductors in the PSR drivers resonate with the capacitance of the tree to generate a pulse signal that propagates through multiple levels of transmission-gate clock gaters and clock buffers. Because the input $V_{SR}$ is boosted, the PSR driver output has a rail-to-rail swing, which improves the robustness of the design. The PSR driver output $R_{CLK}$ is then inverted and supplied to the pulsed registers as the clock input signal Pclk.

### C. Proposed Resonant Pulsed Flip-Flops

To utilize the Pclk pulse generated by the PSR (Fig. 3(a)), we propose a 13T pulsed FF (13TPFF), shown in Fig. 3(b). The input data and its inverted copy drive transistors M2 and M3, respectively. The drains of M2 and M3 are connected to the storage cell, where the data is stored as a logic "1" or a logic "0". If a "1" is stored in the register, S = 1 and $S_B$ = 0; if a "0" is stored, the voltages are reversed.

Fig. 3: The proposed series resonant pulsed FFs use (a) a PSR to generate the pulse signal that drives the register stage; the three pulsed FFs are implemented with (b) a 13T register, (c) a pulsed register, and (d) a TSPC register.

When Pclk is "0," transistor M1 is off, the FF is in the hold (retain) state, and the values of S and $S_B$ are unaltered. When Data = 1 and Pclk = 1, transistors M2 and M1 turn on and connect node $S_B$ to ground; discharging this node to 0 makes Q = 1, writing a "1" into the register. When Data = 0 and Pclk = 1, transistors M3 and M1 turn on and write a "0" at node Q. The FF has an active-low asynchronous reset: when the Reset signal goes low, transistors M4 and M5 turn on and off, respectively, producing a logic "1" at node $S_B$ that writes a "0" to the output Q.
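The write, hold, and reset behavior described above can be summarized as a behavioral model. This is a logic-level abstraction, not a transistor-level simulation. The class name `PulsedFF13T` is hypothetical, and the assumption that reset takes priority over a concurrent Pclk pulse is ours; the text does not state it.

```python
class PulsedFF13T:
    """Behavioral model of the 13TPFF write/hold/reset behavior (not transistor-level)."""

    def __init__(self) -> None:
        # Storage-cell nodes: a stored "1" means S = 1 and S_B = 0.
        self.s, self.s_b = 0, 1

    @property
    def q(self) -> int:
        # Q = 1 when S_B has been discharged to 0, so Q follows S.
        return self.s

    def step(self, data: int, pclk: int, reset_n: int) -> int:
        if reset_n == 0:
            # Active-low asynchronous reset: S_B forced to 1, so Q = 0.
            # Assumption: reset takes priority over a concurrent Pclk pulse.
            self.s, self.s_b = 0, 1
        elif pclk == 1:
            # M1 conducts during the pulse.
            if data == 1:
                # M2 + M1 discharge S_B to ground, writing a "1".
                self.s, self.s_b = 1, 0
            else:
                # M3 + M1 write a "0" at Q.
                self.s, self.s_b = 0, 1
        # pclk == 0 with reset_n == 1: M1 off, hold state, S and S_B unchanged.
        return self.q
```

For example, a pulse with Data = 1 stores a "1", and the value then holds while Pclk = 0 regardless of Data. A later pulse with Data = 0 stores a "0", and pulling Reset low forces Q to 0 at any time.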

Along with the 13TPFF, we also design pulsed energy recovery FFs. The pulsed resonant FF (PRFF) is based on the traditional latch, and the resonant TSPCFF is based on the TSPC register [28]; unlike their conventional counterparts, both use pulsed series resonance, taking advantage of the input pulse signal $V_{SR}$ to recycle energy. The energy recovery FFs are positive edge-triggered with an asynchronous active-low Reset signal. Because they are built from conventional registers, resonant clock trees can be easily integrated into existing clock tree architectures.

### D. Skew Reduction Methodology

Skew is defined as the spatial variation in the arrival time of a clock transition at two different locations. Skew has several causes; one of them is unequal loading of the clock drivers [28].

TABLE I: The proposed 13TPFF exhibits a better setup time than the PSFF and a better hold time than the TSPCFF and PRFF, at the cost of higher dynamic power and area; however, it consumes less static power than the PSFF and enables power savings in the overall clock architecture.

| FF type | Normalized area | $t_{c-q}$ (ps) | $t_s$ (ps) | $t_h$ (ps) | Static power, D=0 (pW) | Static power, D=1 (pW) | Dynamic power, 1 GHz (µW) | 2 GHz (µW) | 3 GHz (µW) | 4 GHz (µW) | 5 GHz (µW) |
|---------|-----------------|----------------|------------|------------|------------------------|------------------------|---------------------------|------------|------------|------------|------------|
| PSFF    | 1               | 32.5           | 14         | 2          | 1550                   | 593                    | 8.3                       | 14.1       | 21         | 28         | 35.1       |
| PRFF    | 0.59            | 35.1           | -95        | 96         | 278                    | 272                    | 7.16                      | 13.8       | 20.4       | 27.1       | 33.8       |
| TSPCFF  | 0.84            | 41.9           | -92        | 93         | 283                    | 664                    | 12.3                      | 20.2       | 28         | 35.9       | 43.7       |
| 13TPFF  | 1.75            | 37.3           | -25        | 60         | 501                    | 538                    | 16.2                      | 31.1       | 46         | 61         | 76         |

In Fig. 5, the eight branches of the clock tree have eight different capacitances, $C_{SR1}$ through $C_{SR8}$, due to on-chip variation (OCV). This capacitance mismatch between branches results in different clock arrival times. Each branch of the clock tree represents a separate LC resonant tank. Using the resonant frequency $f_{RES}=1/(2\pi\sqrt{LC})$, we match the inductors $L_{SR1}$ to $L_{SR8}$ to the load capacitances $C_{SR1}$ to $C_{SR8}$, respectively, so that all branches resonate at the same frequency. This inductor matching yields equal-frequency signals in all clock branches, thus reducing the skew. The resonant frequency is independent of the input clock frequency and is therefore unaffected by wideband operation, primarily because the delay $T_d$ of the pulse generator circuit is independent of the clock pulse width: the circuit acts only on the clock edges. For all clock frequencies below the resonant frequency $f_{RES}$, the same inductor values can be used, resulting in reduced skew. Our results in Section IV-B2 support this claim.
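The inductor matching step can be illustrated by solving $f_{RES}=1/(2\pi\sqrt{LC})$ for L in each branch. In this sketch the eight branch capacitances and the 5 GHz target frequency are hypothetical values, not the capacitances of the implemented tree. The half-period $\pi\sqrt{LC}$ has the same form as the pulse-generator delay $T_d$ and is used here as a per-branch timing proxy.

```python
import math


def matched_inductance(c_branch: float, f_res: float) -> float:
    """Solve f_RES = 1/(2*pi*sqrt(L*C)) for L: L = 1 / ((2*pi*f_RES)^2 * C)."""
    return 1.0 / ((2.0 * math.pi * f_res) ** 2 * c_branch)


def resonant_frequency(l: float, c: float) -> float:
    """f_RES = 1 / (2 * pi * sqrt(L * C)), in Hz."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l * c))


# Hypothetical branch loads C_SR1..C_SR8 (farads), spread by OCV
c_sr = [1.00e-12, 1.04e-12, 0.97e-12, 1.10e-12, 0.93e-12, 1.02e-12, 1.07e-12, 0.95e-12]
F_TARGET = 5e9  # hypothetical common resonant frequency, Hz

# Matched: one inductor per branch, L_SRi chosen from C_SRi
l_sr = [matched_inductance(c, F_TARGET) for c in c_sr]
freqs_matched = [resonant_frequency(l, c) for l, c in zip(l_sr, c_sr)]
half_periods = [math.pi * math.sqrt(l * c) for l, c in zip(l_sr, c_sr)]

# Unmatched: a single inductor sized for C_SR1 reused in every branch
l_shared = matched_inductance(c_sr[0], F_TARGET)
freqs_unmatched = [resonant_frequency(l_shared, c) for c in c_sr]
```

With matching, every branch resonates at the target frequency and the half-periods coincide. With a shared inductor, the branch frequencies in this example spread by roughly 0.4 GHz. Heavier branches receive smaller inductors, consistent with the remark in Section III-A that large loads need small series inductance.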

## IV. RESULTS AND DISCUSSION

### A. Experimental Setup

The proposed resonant clock tree architecture, shown in Fig. 5, is implemented in a standard 14-nm FinFET technology. A conventional clock tree architecture is used as the reference model for comparison. Each tree has eight clock drivers, 4K clock gaters, 8K clock buffers, and 32K FFs. The conventional clock tree uses transmission-gate PSFFs, whereas the resonant clock tree uses pulsed FFs, as shown in Fig. 5(a) and Fig. 5(b), respectively. All FF layouts are compatible with a standard cell height of 24 horizontal M2 tracks. All simulations are performed for frequencies ranging from 1 GHz to 5 GHz.

TABLE II: Our proposed PRFF achieves the lowest power of all the FFs, consuming 43% less power than the conventional PSFF with an $11\times$ improvement in skew, while the TSPCFF and 13TPFF consume 26% and 20.2% less power than the PSFF, respectively.

| Flip-flop type | Skew (ps) | Total tree power, 1 GHz (mW) | 2 GHz (mW) | 3 GHz (mW) | 4 GHz (mW) | 5 GHz (mW) |
|----------------|-----------|------------------------------|------------|------------|------------|------------|
| PSFF           | 51.1      | 30.8                         | 60.6       | 89.2       | 116        | 138        |
| PRFF           | 4.61      | 17.4                         | 34.1       | 50.3       | 65.7       | 78.7       |
| TSPCFF         | 2.05      | 22.2                         | 43.6       | 64.6       | 84.4       | 102        |
| 13TPFF         | 3.92      | 23.8                         | 46.8       | 69.4       | 90.7       | 110        |

### B. Implementation Results

1) Power and Performance Comparison of Registers: To evaluate the performance and functionality of the proposed 13TPFF under process variation, we run a 5000-sample Monte Carlo simulation of the CLK-to-Q $(t_{c-q})$ delay, applying a $\pm 10\%$ variation to the length of all devices. The $t_{c-q}$ delay distributions of the PSFF, PRFF, TSPCFF, and 13TPFF are shown in Fig. 4(a), Fig. 4(b), Fig. 4(c), and Fig. 4(d), respectively. Among all the resonant FFs, the PRFF has the lowest mean $t_{c-q}$, 35 ps, with a standard deviation of 0.066 ps.

Fig. 4: Monte Carlo simulation results for various FFs, considering 5000 samples with $\pm 10\%$ length variation: (a) PSFF, (b) PRFF, (c) TSPCFF, and (d) 13TPFF.

Fig. 5: Clock tree architectures used for functional simulations: (a) conventional clock tree architecture with eight branches and different loads totaling 32K FFs; (b) resonant clock tree architecture replicating the same number of branches and loads as the conventional one.

Fig. 6: Simulation waveforms: a 0.5 GHz CLK input is used to generate a 1 GHz Pclk, which serves as the clock input of the pulsed FFs.

Fig. 7: 13TPFF layout implemented in 14-nm FinFET technology at the standard cell height; the 13TPFF improves the setup-time constraint compared with the PSFF.
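The structure of the 5000-sample Monte Carlo analysis in Section IV-B1 (sample a length variation, evaluate the delay, summarize by mean and standard deviation) can be sketched as follows. The actual analysis uses a circuit simulator with FinFET device models. Here `toy_tcq_ps` is a hypothetical first-order delay model with a made-up sensitivity of 0.5 and a nominal 35 ps. We also assume one length scale per sample applied to all devices, drawn uniformly within ±10%. The spread this produces (about 1 ps) is an artifact of the toy model and is not comparable to the reported 0.066 ps.

```python
import random
import statistics


def toy_tcq_ps(length_scale: float, t_nom_ps: float = 35.0, sensitivity: float = 0.5) -> float:
    """Hypothetical first-order model: t_cq = t_nom * (1 + k * (L/L_nom - 1))."""
    return t_nom_ps * (1.0 + sensitivity * (length_scale - 1.0))


def monte_carlo_tcq(n_samples: int = 5000, spread: float = 0.10, seed: int = 1) -> tuple[float, float]:
    """Draw length scales uniformly in [1 - spread, 1 + spread]; return (mean, std) of t_cq in ps."""
    rng = random.Random(seed)  # fixed seed for a reproducible run
    samples = [toy_tcq_ps(rng.uniform(1.0 - spread, 1.0 + spread)) for _ in range(n_samples)]
    return statistics.mean(samples), statistics.stdev(samples)


mean_ps, std_ps = monte_carlo_tcq()
```

Swapping `toy_tcq_ps` for a call into a real simulator, and the uniform draw for the foundry's variation model, would turn this skeleton into a usable flow.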

The normalized layout area, $t_{c-q}$, setup time $(t_s)$, hold time $(t_h)$, and power of the FFs are listed in Table I. Among all the competing FFs, the 13TPFF has the largest layout area, 9.62 µm², which is $1.75\times$ the area of the PSFF (5.151 µm²) and $2.9\times$ the area of the PRFF (3.091 µm²). The proposed 13TPFF has a $t_s$ of -25 ps and a $t_h$ of 60 ps, with a clock-to-Q delay of 37.3 ps. Empirically, pulsed register-based FFs exhibit a negative $t_s$, which greatly eases the resolution of setup-time violations. The PSFF has a $t_s$ of 14 ps, a $t_h$ of 2 ps, and a $t_{c-q}$ of 32.5 ps. The resonant PRFF has a better $t_s$ of -95 ps but a high $t_h$ of 96 ps, similar to the TSPCFF with a $t_s$ of -92 ps and a $t_h$ of 93 ps. However, the proposed 13TPFF consumes about 2× the dynamic power of the PSFF. Among all the competing FFs, the PRFF has the lowest static and dynamic power.

2) Clock Tree: The functional simulation of the resonant clock is shown in Fig. 6. We provide a 0.5 GHz clock as the input clock source shown in Fig. 2. The output of the pulse generator is a boosted signal $V_{SR}$ at 1 GHz, because a pulse is generated at both edges of the input clock. This $V_{SR}$ signal is then provided to a PSR driver whose output is $R_{CLK}$. The resulting clock signal $IN_1$, together with the Data signal, generates the output Q of the FF in Fig. 3(b). Table II compares the power and skew of the proposed clock architecture for frequencies ranging from 1 GHz to 5 GHz. At 1 GHz, the proposed architecture using 13TPFFs consumes 23.8 mW, compared with 30.8 mW for a conventional clock tree architecture with PSFFs. The skew of the conventional clock tree is 51.1 ps, whereas, as a result of inductor tuning, the skew of the proposed resonant clock architecture is 3.92 ps. The proposed architecture thus saves 22.7% power using 13TPFFs compared with the conventional tree. The resonant clock tree using TSPCFFs has a skew of 2.1 ps and saves 27.9% power, whereas the clock tree with PRFFs has a skew of 4.6 ps and saves 43% power compared with the conventional clock.
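The percentages quoted above follow directly from Table II. The short sketch below recomputes them so the arithmetic can be checked; the data are transcribed from Table II, and the helper names are our own.

```python
# Total clock-tree power (mW) and skew (ps), transcribed from Table II
TABLE_II = {
    "PSFF":   {"skew": 51.1, "power": [30.8, 60.6, 89.2, 116, 138]},
    "PRFF":   {"skew": 4.61, "power": [17.4, 34.1, 50.3, 65.7, 78.7]},
    "TSPCFF": {"skew": 2.05, "power": [22.2, 43.6, 64.6, 84.4, 102]},
    "13TPFF": {"skew": 3.92, "power": [23.8, 46.8, 69.4, 90.7, 110]},
}
FREQS_GHZ = [1, 2, 3, 4, 5]


def power_saving_pct(ff: str, f_ghz: int, baseline: str = "PSFF") -> float:
    """Percentage power reduction of `ff` relative to the baseline tree at f_ghz."""
    i = FREQS_GHZ.index(f_ghz)
    p_ref = TABLE_II[baseline]["power"][i]
    return 100.0 * (p_ref - TABLE_II[ff]["power"][i]) / p_ref


def skew_reduction_pct(ff: str, baseline: str = "PSFF") -> float:
    """Percentage skew reduction of `ff` relative to the baseline tree."""
    s_ref = TABLE_II[baseline]["skew"]
    return 100.0 * (s_ref - TABLE_II[ff]["skew"]) / s_ref


for ff in ("PRFF", "TSPCFF", "13TPFF"):
    print(ff, round(power_saving_pct(ff, 1), 1), round(skew_reduction_pct(ff), 1))
```

At 1 GHz this gives 43.5%, 27.9%, and 22.7% power savings for the PRFF, TSPCFF, and 13TPFF trees, with skew reductions of 91.0%, 96.0%, and 92.3%.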

## V. CONCLUSION

This paper proposed a resonant clock architecture that reduces clock skew and recycles the dissipated clock power. The proposed architecture with the 13TPFF saves 22.7% power with 92% lower clock skew than the conventional clock tree architecture. Furthermore, with the PRFF it saves 43% power with a 91% skew reduction compared with the conventional PSFF-based CMOS clock tree architecture in 14-nm FinFET technology.

## REFERENCES

- [1] T. Fischer, S. Arekapudi, E. Busta, C. Dietz, M. Golden, S. Hilker, A. Horiuchi, K. A. Hurd, D. Johnson, H. McIntyre, S. Naffziger, J. Vinh, J. White, and K. Wilcox, "Design solutions for the Bulldozer 32nm SOI 2-core processor module in an 8-core CPU," in *2011 IEEE International Solid-State Circuits Conference*, 2011, pp. 78–80.
- [2] A. Cunningham, "Don't buy a desktop PC with one of Intel's newest processors—here's why," https://www.nytimes.com/wirecutter/blog/desktopc-pc-intel-newest-processors/, April 30, 2021.
- [3] A. Kumar Mishra, D. Vaithiyanathan, and U. Chopra, "Design and analysis of ultra-low power 18T adaptive data track flip-flop for high-speed application," *International Journal of Circuit Theory and Applications*.
- [4] N. Kumar and D. P. Vidyarthi, "A novel energy-efficient scheduling model for multi-core systems," *Cluster Computing*, vol. 24, no. 2, pp. 643–666, 2021.
- [5] A. A. Khan, A. Ali, M. Zakarya, R. Khan, M. Khan, I. U. Rahman, and M. A. Abd Rahman, "A migration aware scheduling technique for realtime aperiodic tasks over multiprocessor systems," *IEEE Access*, vol. 7, pp. 27859–27873, 2019.
- [6] H. Jeong, T. W. Oh, S. C. Song, and S.-O. Jung, "Sense-amplifier-based flip-flop with transition completion detection for low-voltage operation," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 26, no. 4, pp. 609–620, 2018.
- [7] L. Touil, A. Hamdi, I. Gassoumi, and A. Mtibaa, "Design of low-power structural FIR filter using data-driven clock gating and multibit flip-flops," *Journal of Electrical and Computer Engineering*, vol. 2020, 2020.
- [8] P. CET, "Review of low power design techniques for flip-flops," *International Journal of Pure and Applied Mathematics*, vol. 120, no. 6, pp. 1729–1749, 2018.
- [9] L. Cherif, M. Chentouf, J. Benallal, M. Darmi, R. Elgouri, and N. Hmina, "Usage and impact of multi-bit flip-flops low power methodology on physical implementation," in *2018 4th International Conference on Optimization and Applications (ICOA)*, 2018, pp. 1–5.
- [10] J. M. Rabaey, *Low Power Design Essentials*, 2nd ed. Springer, Jan. 2009.
- [11] K. Nowka, G. Carpenter, E. MacDonald, H. Ngo, B. Brock, K. Ishii, T. Nguyen, and J. Burns, "A 32-bit PowerPC system-on-a-chip with support for dynamic voltage scaling and dynamic frequency scaling," *IEEE Journal of Solid-State Circuits*, vol. 37, no. 11, pp. 1441–1447, 2002.
- [12] V. Tirumalashetty and H. Mahmoodi, "Clock gating and negative edge triggering for energy recovery clock," in *2007 IEEE International Symposium on Circuits and Systems*, 2007, pp. 1141–1144.
- [13] F. u. Rahman and V. Sathe, "Quasi-resonant clocking: Continuous voltage-frequency scalable resonant clocking system for dynamic voltage-frequency scaling systems," *IEEE Journal of Solid-State Circuits*, vol. 53, no. 3, pp. 924–935, 2018.
- [14] I. Bezzam, C. Mathiazhagan, T. Raja, and S. Krishnan, "An energy-recovering reconfigurable series resonant clocking scheme for wide frequency operation," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 62, no. 7, pp. 1766–1775, 2015.
- [15] H. Fuketa, M. Nomura, M. Takamiya, and T. Sakurai, "Intermittent resonant clocking enabling power reduction at any clock frequency for near/sub-threshold logic circuits," *IEEE Journal of Solid-State Circuits*, vol. 49, no. 2, pp. 536–544, 2014.
- [16] R. Islam, "Low-power resonant clocking using soft error robust energy recovery flip-flops," *Journal of Electronic Testing*, vol. 34, pp. 471–485, 2018.
- [17] P.-Y. Lin, H. A. Fahmy, R. Islam, and M. R. Guthaus, "LC resonant clock resource minimization using compensation capacitance," in *IEEE International Symposium on Circuits and Systems (ISCAS)*, 2015, pp. 1406–1409.
- [18] R. Islam and M. R. Guthaus, "CMCS: Current-mode clock synthesis," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 25, no. 3, pp. 1054–1062, 2017.
- [19] M. Guthaus and R. Islam, "Current-mode clock distribution," January 2014.
- [20] R. Islam, H. Fahmy, P.-Y. Lin, and M. R. Guthaus, "Differential current-mode clock distribution," in *International Midwest Symposium on Circuits and Systems (MWSCAS)*, 2015, pp. 1–4.
- [21] I. Bezzam and N. Rawat, "Digital circuits for radically reduced power and improved timing performance on advanced semiconductor manufacturing processes," July 2021.
- [22] Y. Cai, A. Savanth, P. Prabhat, J. Myers, A. S. Weddell, and T. J. Kazmierski, "Ultra-low power 18-transistor fully static contention-free single-phase clocked flip-flop in 65-nm CMOS," *IEEE Journal of Solid-State Circuits*, vol. 54, no. 2, pp. 550–559, 2019.
- [23] F. Stas and D. Bol, "A 0.4-V 0.66-fJ/cycle retentive true-single-phase-clock 18T flip-flop in 28-nm fully-depleted SOI CMOS," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 65, no. 3, pp. 935–945, 2018.
- [24] M.-Y. Tsai, P.-Y. Kuo, J.-F. Lin, and M.-H. Sheu, "An ultra-low-power true single-phase clocking flip-flop with improved hold time variation using logic structure reduction scheme," in *2018 IEEE International Symposium on Circuits and Systems (ISCAS)*, 2018, pp. 1–4.
- [25] R. Islam, B. Saha, and I. Bezzam, "Resonant energy recycling SRAM architecture," *IEEE Transactions on Circuits and Systems II: Express Briefs*, vol. 68, no. 4, pp. 1383–1387, 2021.
- [26] G. Shin, E. Lee, J. Lee, Y. Lee, and Y. Lee, "A static contention-free differential flip-flop in 28nm for low-voltage, low-power applications," in *2020 IEEE Custom Integrated Circuits Conference (CICC)*, 2020, pp. 1–4.
- [27] H. You, J. Yuan, Z. Yu, and S. Qiao, "Low-power retentive true single-phase-clocked flip-flop with redundant-precharge-free operation," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 29, no. 5, pp. 1022–1032, 2021.
- [28] J. M. Rabaey, A. Chandrakasan, and B. Nikolic, *Digital Integrated Circuits: A Design Perspective*, 2nd ed. Prentice Hall, 2004.