Optimal utilization of adjustable delay clock buffers for timing correction in designs with multiple power modes
Introduction
The clock is one of the most important signals on a chip in a synchronous system, as all synchronous components on the chip, such as flip-flops (FFs), rely on it. A clock tree is a commonly used circuit structure that distributes the clock signal from the clock source to all the clock sinks (e.g., FFs and latches) where the clock signal is required. It is imperative that the maximum arrival time difference between the clock sinks, known as the global clock skew, be kept under a certain bound, typically within 10% of the clock period, as a large clock skew may cause timing violations in the circuit. (Where no confusion arises, the global clock skew is simply referred to as clock skew in this presentation.)
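As a minimal illustration of the definition above, the sketch below computes the global clock skew as the spread of the sink arrival times and compares it with a bound set to 10% of the clock period. The function names, sink names, and numbers are hypothetical and chosen only for illustration.

```python
def global_clock_skew(arrival_times):
    """Return the largest arrival-time difference between any two clock sinks."""
    values = list(arrival_times.values())
    return max(values) - min(values)


def meets_skew_bound(arrival_times, clock_period, fraction=0.10):
    """Check the skew against a bound given as a fraction of the clock period."""
    return global_clock_skew(arrival_times) <= fraction * clock_period


# Hypothetical arrival times in picoseconds for four flip-flops.
arrivals_ps = {"ff1": 410, "ff2": 455, "ff3": 430, "ff4": 520}

skew_ps = global_clock_skew(arrivals_ps)                # latest minus earliest arrival
ok = meets_skew_bound(arrivals_ps, clock_period=1000)   # bound is 10% of 1000 ps
```

With these numbers the skew exceeds the bound, so the tree would need correction before it could be considered timing-clean.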
Many research works on clock tree optimization, such as clock routing, clock buffer insertion/sizing, and wire sizing, have been performed to control or minimize the clock skew [1], [2], [3], [4], [5], [6], [7]. While these approaches were effective, advanced low-power design techniques have introduced new challenges to the clock skew control problem. Specifically, for multiple power mode designs, where the supply voltage to the circuit components varies dynamically depending on the mode, the clock arrival times also vary accordingly.
Even though the previous works can consider the clock skew constraint in every power mode, it is highly likely that the resulting clock tree uses a substantially long wirelength or that no clock tree exists that satisfies the clock skew constraint in every power mode. On the other hand, post-silicon tuning (e.g., [8], [9], [10], [11]), such as inserting adjustable delay buffers (ADBs), is a widely used method to deal with the timing problems caused by process and environment variations. Because the delay of an ADB can be controlled by its delay control inputs [12], the clock skew variation caused by process variation can be tuned after the manufacturing stage has been completed, provided that ADBs have been properly inserted. The idea of using ADBs in multiple power modes is to replace some of the normal clock buffers with ADBs so that the clock skew constraint in each power mode can be met; when the power mode changes during execution, e.g., from mode-1 to mode-2, the delays of the ADBs in the clock tree that were adjusted for mode-1 are readjusted to meet the clock skew constraint under mode-2. Since an ADB is much bigger than a normal buffer and requires control lines as well as switching logic, the related problems to be solved for ADB-based clock skew optimization in multiple power modes are: allocating a minimum number of ADBs, finding the normal buffers (or locations) in the clock tree that are to be replaced by ADBs, and determining the delay values to be assigned to the ADBs in each power mode. We collectively call these problems the ADB allocation problem.
Su et al. [13], [14] proposed a linear-time optimal algorithm for the delay assignment problem and exploited it to solve the remaining two subproblems of the ADB allocation problem heuristically in a greedy manner. Lin et al. [15] proposed an efficient two-stage algorithm that performs a top-down ADB allocation followed by a bottom-up ADB elimination. Even though this approach reduces the run time compared with [13], [14], it still does not guarantee the optimality of the ADB allocation. Lim and Kim [16] proposed a linear-time algorithm for the ADB allocation problem that solves the problem optimally for each power mode. However, merely collecting the optimal results for individual power modes does not yield a globally optimal result for all power modes. In this work, we revisit the ADB allocation problem and propose a set of solutions to overcome the limitations of the previous works. More precisely, we propose (1) a polynomial-time algorithm that optimally solves the problem of minimizing the number of ADBs allocated for all power modes with continuous ADB delays and (2) a modification of the algorithm that makes solving the ADB allocation problem with discrete ADB delays much simpler and more predictable. In addition, we propose an effective solution to an important extended problem: (3) the ADB allocation problem combined with buffer sizing. (A preliminary version of our work, which contains concise descriptions and no proofs, can be found in [17].)
It should be mentioned that the work in [16] differs fundamentally from our proposed optimal algorithm, as a simple example shows. Suppose that [16] optimally requires two ADBs, at clock nodes 1 and 2, for power mode 1, and optimally requires two ADBs, at nodes 3 and 4, for power mode 2. The combined ADB allocation then uses four ADBs, at nodes 1, 2, 3, and 4, to meet timing in all power modes. In contrast, our algorithm produces an optimal ADB allocation that considers all power modes together. The globally optimal allocation may use three ADBs rather than four, say, at nodes 1–3. This reasoning suggests that as the number of power modes increases, the gap (i.e., the difference in ADB count) between [16] and our algorithm will increase.
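The counting argument above can be reproduced with plain set arithmetic. The per-mode and joint allocations below are the hypothetical node sets from the example, not results computed by either algorithm.

```python
# Per-mode optimal allocations taken from the example (node indices are illustrative).
per_mode_allocation = {
    "mode1": {1, 2},
    "mode2": {3, 4},
}

# Combining per-mode results means taking the union of the allocated nodes.
combined = set().union(*per_mode_allocation.values())

# A jointly chosen allocation, as posited in the example, may reuse ADBs across modes.
joint = {1, 2, 3}

# Number of ADBs saved by considering all power modes together.
saved_adbs = len(combined) - len(joint)
```

Whenever the per-mode solutions share few nodes, the union grows with the number of power modes, which is the source of the gap described above.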
The rest of the paper is organized as follows. Section 2 illustrates the structure of an ADB implementation and shows an example of using ADBs for timing correction. Section 3 defines the ADB allocation problem and gives an example to motivate the work. Section 4 then proposes an optimal algorithm for ADB allocation with continuous delay values and a modification of the algorithm to support ADBs with discrete delay values. Section 5 proposes a solution to the extended problem of integrating buffer sizing into ADB allocation. Experimental results are provided in Section 6 to show the effectiveness of our proposed ADB allocation algorithms. Finally, a conclusion of the work is given in Section 7.
ADB structure and example of ADB utilization
Fig. 1 shows the structure of a well-known capacitor bank based implementation of an ADB [18]. It consists of two inverters at the input and output ports, with an array of capacitors with switch transistors attached in the middle. The switches are controlled by the capacitor bank controller, which sets the number of active capacitors according to the control bits. Activating more capacitors increases the total capacitance between the two
Problem definition and motivation
The problem of ADB allocation in a clock tree can be described as follows.
Problem 1 (ADB allocation problem). Given a synthesized clock tree, the arrival times of the clock sinks in each power mode m, and the clock skew bound κm in each power mode m, replace the least number of clock buffers with ADBs and assign delays to the ADBs so as to satisfy κm in every power mode m.
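To make the constraint in Problem 1 concrete, the following sketch checks whether a candidate ADB allocation and delay assignment satisfies the skew bound in every power mode. It is only a checker for a given candidate, not an allocation algorithm, and it assumes a simplified model in which an ADB adds a non-negative extra delay to every sink in its subtree. The tree, node names, arrival times, and bounds are all hypothetical.

```python
def adjusted_arrivals(parent, sink_arrival, adb_delay):
    """Add, for each sink, the delays of all ADBs on its path to the root."""
    result = {}
    for sink, base in sink_arrival.items():
        total, node = base, sink
        while node is not None:
            total += adb_delay.get(node, 0)
            node = parent.get(node)  # the root has no parent entry, so this ends the walk
        result[sink] = total
    return result


def satisfies_all_modes(parent, arrivals, adb_delays, kappa):
    """Return True if the skew bound kappa[m] holds in every power mode m."""
    for mode, sink_arrival in arrivals.items():
        adjusted = adjusted_arrivals(parent, sink_arrival, adb_delays.get(mode, {}))
        skew = max(adjusted.values()) - min(adjusted.values())
        if skew > kappa[mode]:
            return False
    return True


# Hypothetical clock tree: root r drives buffers b1 and b2, each driving two sinks.
parent = {"b1": "r", "b2": "r", "s1": "b1", "s2": "b1", "s3": "b2", "s4": "b2"}

# Hypothetical sink arrival times (ps) and skew bounds (ps) in two power modes.
arrivals = {
    "m1": {"s1": 100, "s2": 105, "s3": 130, "s4": 128},
    "m2": {"s1": 140, "s2": 150, "s3": 120, "s4": 123},
}
kappa = {"m1": 10, "m2": 10}

# Candidate solution: b1 and b2 are replaced by ADBs with mode-dependent delays.
adb_delays = {"m1": {"b1": 25}, "m2": {"b2": 25}}

without_adbs = satisfies_all_modes(parent, arrivals, {}, kappa)
with_adbs = satisfies_all_modes(parent, arrivals, adb_delays, kappa)
```

In this toy instance the original tree violates the bound in both modes, while the candidate with two ADBs meets it in both; the allocation problem asks for the smallest such set of replaced buffers.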
Two common features of the previous ADB allocation algorithms [13], [14], [16] are that they resolve the clock skew violation by synchronizing the earliest arrival
Optimal ADB allocation
This section describes our proposed ADB allocation algorithms for ADBs with continuously and discretely adjustable delay values. The notation commonly used in the presentation is summarized in Table 1.
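The difference between continuously and discretely adjustable ADBs can be illustrated with a small sketch. It assumes, purely for illustration, that a discrete ADB offers delays that are integer multiples of a fixed step, up to a given number of levels. The step size, the number of levels, and the function name are hypothetical and are not taken from the algorithms of this section.

```python
def nearest_discrete_delays(target_ps, step_ps, levels):
    """Return the achievable delays just below and just above a continuous target.

    Achievable delays are assumed to be 0, step_ps, 2*step_ps, ..., levels*step_ps.
    """
    max_ps = step_ps * levels
    clamped = min(max(target_ps, 0), max_ps)      # keep the target inside the range
    lower = (clamped // step_ps) * step_ps        # largest achievable value <= target
    upper = lower if lower == clamped else lower + step_ps
    return lower, upper


# A continuous target of 37 ps falls between two achievable settings of a 5 ps-step ADB.
bracket = nearest_discrete_delays(37, step_ps=5, levels=12)
```

A continuous solution can assign exactly the target value, whereas a discrete one must settle for one of the two bracketing values, which is why the discrete case needs separate treatment.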
Extension: integration of buffer sizing
We can regard buffer sizing as ADB allocation under the restriction that the α values in the power modes are pre-defined. For example, when a buffer bi in the input clock tree is replaced by a buffer bufj from the buffer library (rather than by an ADB), the delay in each power mode may increase or decrease, but the amount is fixed, i.e., it is not controllable, unlike that of an ADB. Let be the delay increase or delay decrease in power mode m caused by the replacement of
Experimental results
The proposed algorithms ADB-Pullup (continuous delay), ADB-Pullup-Q (discrete delay), and ADB-Pullup-BS (combined with buffer sizing) have been implemented in Python 3 on a Linux machine with an 8-core 3.50 GHz Intel i7 CPU and 16 GB of memory. The ISCAS'95 and ITC'99 benchmarks were synthesized with Synopsys IC Compiler using the 45 nm Nangate Open Cell Library. The ISPD'09 benchmarks were synthesized using the algorithm in [20]. Each benchmark was partitioned into 6–10 power domains which are able to
Conclusions
In this paper, we proposed a polynomial-time optimal algorithm for the problem of ADB allocation on clock trees with continuous ADB delays. Then, based on this algorithm, we proposed a much simpler and more predictable solution to the ADB allocation problem with discrete ADB delays. In addition, we proposed an effective solution to the combined problem of ADB allocation and buffer sizing. From the experimental results on benchmarks, it was shown that compared to the results by the best-known ADB
Acknowledgments
This research was supported by the ITRC program of ITTP by MSIP (IITP-2015-H8501-15-1005) in Korea, NRF grant funded by the MSIP (2015R1A2A2A01004178), Brain Korea 21 Plus Project in 2015, and the Global Ph.D. Fellowship Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (2013H1A2A1032650).
References (20)
- C.J. Alpert, A. Devgan, S.T. Quay, Buffer insertion with accurate gate and interconnect delay computation, in:...
- J. Cong, C. Koh, K. Leung, Simultaneous buffer and wire sizing for performance and power optimization, in: Proceedings...
- et al., An efficient and optimal algorithm for simultaneous buffer and wire sizing, IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. (1999)
- I.-M. Liu, T.-L. Chou, A. Aziz, M.D.F. Wong, Zero-skew clock tree construction by simultaneous routing, wire sizing and...
- T. Okamoto, J. Cong, Buffered Steiner tree construction with wire sizing for interconnect layout optimization, in:...
- et al., Zero skew clock-tree optimization with buffer insertion/sizing and wire sizing, IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. (2004)
- et al., General skew constrained clock network sizing based on sequential linear programming, IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. (2005)
- S. Hu, J. Hu, Unified adaptivity optimization of clock and logic signals, in: Proceedings of IEEE/ACM International...
- V. Khandelwal, A. Srivastava, Variability-driven formulation for simultaneous gate sizing and post-silicon tunability...
- J.-L. Tsai, L. Zhang, Statistical timing analysis driven post-silicon-tunable clock-tree synthesis, in: Proceedings of...