# A Scalable DCO Design for Portable ADPLL Designs

Chia-Tsun Wu, Wei Wang, I-Chyn Wey, and An-Yeu (Andy) Wu

Graduate Institute of Electronics Engineering, and Department of Electrical Engineering, National Taiwan University, Taipei 106, Taiwan, R.O.C.

Abstract: A novel Digitally Controlled Oscillator (DCO) design methodology is presented in this paper. The methodology comprises a scalable DCO architecture and an accompanying design flow. Because the design can be analyzed precisely at an early stage, the DCO design effort is reduced significantly. The proposed DCO architecture offers high resolution, a flexible operating range, and ease of design. It is suitable for high-performance clock generators in System-on-a-Chip (SoC) applications.

## I. INTRODUCTION

The Phase-Locked Loop (PLL) is widely used for clock generation, either as a system clock source or for Clock Data Recovery (CDR). An SoC integrates many digital building blocks or digital Intellectual Properties (IP) on one chip. Traditionally, a PLL is built as an analog or mixed-signal block [8]. However, integrating an analog PLL into a digitally noisy SoC environment is difficult. The All-Digital Phase-Locked Loop (ADPLL), which is designed with digital design techniques, is therefore well suited to integration into SoC chips.

A typical block diagram of an ADPLL is shown in Figure 1. An ADPLL consists of four major blocks: Phase Frequency Detector (PFD), Loop Filter (LF), DCO, and Frequency Divider (FD). The DCO is the most critical component in ADPLL design [1]. Oscillator building circuits for DPLLs, such as the voltage-controlled oscillator (VCO) and the current-controlled oscillator (CCO), have been widely studied. DCO architectures, however, mostly adopt ring-based inverter chains.

Figure 1. A typical ADPLL block diagram.

Traditional DCOs are designed with a full-custom approach [4][7].
To include a full-custom DCO in HDL simulation, the DCO has to be simulated as an analog component at SPICE level to obtain reliable estimates of its behavior [1][7]. In addition, a variable number of inverters may be used to implement a variable delay. In [2][3], this technique is used for coarse acquisition. Relying on inverter delay alone, however, gives coarse-acquisition steps on the order of a hundred picoseconds, which leads to an inaccurate and unstable phase lock in high-frequency applications. Structures that combine coarse acquisition with fine tuning have been developed to enhance resolution. Improved delay cells built from selected logic gates and shunted driving cells [1][2] raise the resolution. Embedded lookup tables specify the relation between loop period and control code, but this relation may change across process, voltage, and temperature (PVT) variations. Furthermore, lookup tables add hardware cost.

In this paper, a new DCO architecture and a matching design flow are presented to achieve scalability, high performance, and a wide operating range. The proposed design can be analyzed at an early stage and is easy to implement. The prototype chip is designed with a pure standard-cell library and implemented in UMC's 0.18 um 1P6M CMOS process. The chip operates in the range of 140 MHz to 1030 MHz. The measured P-P jitter and RMS jitter are 143 ps and 30 ps, respectively.

## II. METHODOLOGY OF STANDARD-CELL BASED DCO

The DCO is the most challenging block in ADPLL design, especially when only a standard-cell library is available. In this section, a novel DCO architecture is presented, as shown in Figure 2. Our goal is a DCO that can be built from various standard-cell libraries and implemented in a pure standard-cell design flow. The resulting DCO design methodology yields process-independent topologies.

Figure 2. Proposed DCO architecture

It is desirable to decompose the DCO control into coarse and fine inputs.
The oscillating period consists mainly of the delays of the Fine-Tune-Unit (FTU) and the Coarse-Tune-Unit (CTU). Once the first CTU stage is constructed, the following stages are duplicates of it. A buffer $BUF_{M+1}$ is added to balance the wire load of the last coarse-tune stage. The total loop delay $\tau_{DCO}$ is given by eq. (1), where $\tau_{FTU}$ is the total gate delay of the fine-tune stage, which is also the minimum DCO period obtained when all coarse-tune stages are switched off, and $\tau_{CTU}$ is the total gate delay of the CTU. When N coarse-tune stages are switched on, $\tau_{CTU}$ is given by eq. (2), where $\tau_C$ is the delay of one coarse-tune stage.

$$\tau_{DCO} = \tau_{FTU} + \tau_{CTU} \tag{1}$$

$$\tau_{CTU} = \tau_C \cdot N \tag{2}$$

If M coarse-tune stages are constructed for the CTU, the DCO operates at its minimum frequency when N equals M. Based on the proposed architecture, we also develop a new design flow for the DCO in this section.

### A. Coarse-Tune-Unit Architecture

As depicted in Figure 2, direct inspection of the fan-in and fan-out loading in the equivalent circuit of a coarse-tune stage gives eq. (3). The wire loads ($W_F$, $W_{C1}$, $W_{C2}$, ..., $W_{CM-1}$, $W_{CM}$) of all coarse-tune stages are identical, which assures equal coarse-acquisition steps. The input impedance loads of one buffer and one multiplexer are $Z_{BUFinput}$ and $Z_{MUXinput}$, respectively. This consistency makes the CTU architecture very suitable for a standard-cell based design flow.

$$W_F = W_{C1} = W_{C2} = \cdots = W_{CM-1} = W_{CM} = Z_{BUFinput} + Z_{MUXinput} \tag{3}$$

Compared to traditional designs [2][3], the main advantage of the coarse-tune architecture is that it preserves the highest output frequency. When the DCO oscillates at its highest frequency, the fan-out of the FTU is only the first coarse-tune stage, not all of the coarse-tune stages.
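As a rough illustration of eqs. (1) and (2), the following minimal Python sketch evaluates the loop period and the corresponding frequency for a given number N of enabled coarse-tune stages. The function names and the delay values (1000 ps for $\tau_{FTU}$, 400 ps for $\tau_C$) are assumptions chosen only for illustration and are not figures from the prototype; M = 16 mirrors the stage count reported in Section III.

```python
def dco_period_ps(n_on: int, m_total: int, tau_ftu_ps: float, tau_c_ps: float) -> float:
    """Loop period from eqs. (1) and (2) with n_on of m_total coarse stages enabled."""
    if not 0 <= n_on <= m_total:
        raise ValueError("n_on must lie between 0 and m_total")
    tau_ctu_ps = tau_c_ps * n_on       # eq. (2): N identical coarse steps
    return tau_ftu_ps + tau_ctu_ps     # eq. (1): FTU delay plus CTU delay


def period_ps_to_mhz(period_ps: float) -> float:
    """Convert a period in picoseconds to a frequency in MHz."""
    return 1e6 / period_ps


# Hypothetical delay values, used only to exercise the formulas.
M_TOTAL = 16
TAU_FTU_PS = 1000.0
TAU_C_PS = 400.0

if __name__ == "__main__":
    for n in (0, 1, M_TOTAL):
        period = dco_period_ps(n, M_TOTAL, TAU_FTU_PS, TAU_C_PS)
        print(f"N={n:2d}: period={period:7.1f} ps, f={period_ps_to_mhz(period):7.1f} MHz")
```

Under these assumed values, N = 0 gives the minimum period (the FTU delay alone) and N = M gives the maximum period, with every additional stage adding the same step $\tau_C$.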
When coarse-tune stages are added to the CTU to extend the DCO oscillating range, no wire becomes overloaded, so the maximum frequency is not limited. The robust and linear characteristic of the CTU architecture allows a simple formulation of the loop period. As discussed in [1], each CTU stage should be designed carefully to minimize rise and fall times, which assures step accuracy and reduces the propagation delay of the next stage.

### B. Fine-Tune-Unit Architecture

A new systematic method of improving resolution is proposed for the DCO fine-tune unit. The main idea is to use the delay difference between paths. For example, the capacitances and output strengths seen at the different input pins of a NAND gate in a standard-cell library are close to each other. The difference in delay from different input pins to the same output pin is therefore approximately the difference in intrinsic delay.

TABLE I. CELL LIBRARY DATA OF 2-INPUT NAND GATE (INTRINSIC DELAY, ps)

| Path | XL | X1 | X2 | X4 |
|------------------------------|-----|----|----|----|
| $A \rightarrow Y \uparrow$ | 54 | 46 | 41 | 41 |
| $A \rightarrow Y \downarrow$ | 31 | 27 | 24 | 24 |
| $A \rightarrow Y$ (rise + fall) | 85 | 73 | 65 | 65 |
| $B \rightarrow Y \uparrow$ | 71 | 62 | 58 | 58 |
| $B \rightarrow Y \downarrow$ | 36 | 32 | 30 | 29 |
| $B \rightarrow Y$ (rise + fall) | 107 | 94 | 88 | 87 |

The complete intrinsic-delay data of the 2-input NAND gate, taken from a 0.25 um standard-cell library datasheet, are listed in TABLE I. These timing differences serve as the basis of fine acquisition. By examining pairs of paths through the same or different NAND gates, we obtain the following differences from TABLE I: 1, 2, 3, 6, 7, 8, 9, 12, 13, 14, 15, 19, 20, 21, 22, 23, 29, 34, and 42 ps.

Figure 3. Proposed FTU architecture

$$\tau_{FTU} = \tau_{FC} + \sum_{i=1}^{K} F_{SELi} \cdot R \cdot 2^{i-1} \tag{4}$$

Figure 3 shows the proposed FTU architecture, which is based on this idea of path delay difference.
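The delay differences quoted from TABLE I can be reproduced with a short sketch. It assumes that the wider cell under each drive strength in TABLE I is the sum of the rising and falling intrinsic delays of a path (for example 54 + 31 = 85 ps for pin A of the XL cell) and that the differences are taken between these sums; this reading is an interpretation that happens to match the published list, not a procedure stated by the authors. The dictionary and function names are hypothetical.

```python
from itertools import combinations

# Rise + fall intrinsic delay (ps) per (input pin, drive strength), from TABLE I.
PATH_DELAY_PS = {
    ("A", "XL"): 54 + 31,
    ("A", "X1"): 46 + 27,
    ("A", "X2"): 41 + 24,
    ("A", "X4"): 41 + 24,
    ("B", "XL"): 71 + 36,
    ("B", "X1"): 62 + 32,
    ("B", "X2"): 58 + 30,
    ("B", "X4"): 58 + 29,
}


def delay_differences(path_delay_ps: dict) -> list:
    """Return the sorted, distinct, non-zero delay differences over all path pairs."""
    diffs = {abs(a - b) for a, b in combinations(path_delay_ps.values(), 2)}
    diffs.discard(0)  # identical paths give no usable tuning step
    return sorted(diffs)


DIFFS = delay_differences(PATH_DELAY_PS)

if __name__ == "__main__":
    print(DIFFS)
    # -> [1, 2, 3, 6, 7, 8, 9, 12, 13, 14, 15, 19, 20, 21, 22, 23, 29, 34, 42]
```

A designer could scan such a list for the pair of paths whose difference is closest to the target resolution R before committing to a gate-level FTU implementation.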
The total loop delay of the FTU is given by eq. (4), where K is the number of FTU stages, $F_{SELi}$ is the i-th control bit, $\tau_{FC}$ is the minimum FTU period, and R is the DCO resolution. A 2-to-1 multiplexer can be built from various combinations of logic gates, for example three NAND or NOR gates. We examine pairs of gate paths for possible timing differences. Arithmetically, the proposed approach can refine the resolution to 1 ps, the smallest difference obtained from TABLE I. With the design flow proposed in Section II-D, the FTU hardware implementation can be decided at an early stage for a given resolution or specification, without wasting trial iterations in backend simulation.

In the proposed DCO architecture of Figure 2, the FTU limits the highest oscillating frequency. The high-speed four-path FTU in Figure 4 reduces the minimum closed-loop delay. Compared to Figure 3, the modified architecture achieves a higher output frequency without a hardware cost penalty. An implementation of this modified architecture and its measurement results are presented in Section III.

Figure 4. A modified FTU to achieve higher frequency output

### C. Hardware Complexity of DCO Controller

The control complexity of the system influences the overall hardware cost. More control wires in the DCO may also raise the likelihood of glitches, because the control lines differ in latency. TABLE II shows the hardware overhead of the DCO controller. We define some parameters to clarify the tradeoff between resolution and control complexity. Let M be the total number of coarse-tune stages in the CTU, S the delay step of one coarse-tune stage, and R the DCO resolution. In the DCO architecture of [3], M is also the number of CTU control lines and S/R is the number of FTU control lines. S is typically more than one hundred picoseconds in a 0.18 um process and larger in older processes.

TABLE II.
CONTROLLER HARDWARE COMPLEXITY

| Design | JSSC99 [3] | ISCAS01 [1] | JSSC04 [7] | Ours |
|----------------|-----------|------------------------|---------------|-------------------|
| DCO wordlength | $M + S/R$ | $2M + (S/R/4) \cdot M$ | $M \cdot S/R$ | $M + \log_2(S/R)$ |
| Lookup tables | N | Y | N | N |

The comparison shows that the proposed work reduces the controller hardware cost substantially. Because the controller issues the FTU control code in binary order, monotonic increase and decrease of the period require significantly less hardware than a lookup table, which makes the design more efficient for future low-cost, high-speed applications.

### D. The Proposed Design Flow

As illustrated in Figure 5, the developed design flow enables the designer to construct hardware based on the proposed architecture. After the standard-cell data are parsed, the delay differences are pre-calculated by a Matlab program. Iterations start with the detailed design of the crucial FTU components, which determine the maximum frequency and the resolution. The flow then works up to the hardware decision for the CTU components, which extend the minimum frequency and the frequency range. Refinements that reduce jitter due to voltage variation are applied to both the FTU and the CTU components individually.

Figure 5. Proposed DCO design flow.

## III. IMPLEMENTATION AND CHIP MEASUREMENTS

Based on the proposed methodology, a prototype ADPLL is implemented in UMC 0.18 um 1P6M CMOS technology, as shown in Figure 6. The DCO resolution is about 22 ps. The implementation consists of 16 CTU stages and 4 FTU stages, giving $(1+16) \cdot 2^4 = 272$ acquisition steps; the oscillating frequency ranges from 140 MHz to 1030 MHz. The DCO core area is 345 um $\times$ 56 um.

Figure 6. Chip layout of the prototype ADPLL and DCO module

Post-layout simulations of period and bandwidth versus control code in three corner cases (SS, TT, FF) are depicted in Figure 7 and Figure 8, respectively.

Figure 7. CTU delay linearity

Figure 8.
DCO oscillating bandwidth versus control code

The jitter of the chip output is measured on an Agilent 86100B Infiniium DCA Wide-Bandwidth Oscilloscope, as shown in Figure 9. Because of pad limitations, the output clock is divided by 4; the measured oscillating frequency is about 960 MHz. The DCO operates both at high frequency and over a wide range. TABLE III compares its performance with recent works.

Figure 9. Measured waveform and jitter of the prototype ADPLL oscillating at 960 MHz

TABLE III. COMPARISONS OF STANDARD-CELL DESIGN

| Design | JSSC04 [7] | ISCAS02 [2] | ISCAS01 [1] | Ours |
|----------------------------|------|------|------|------|
| Process (um) | 0.35 | 0.35 | 0.25 | 0.18 |
| F<sub>min</sub> (MHz) | 152 | 40 | 170 | 140 |
| F<sub>max</sub> (MHz) | 366 | 545 | 650 | 1030 |
| Jitter<sub>P-P</sub> (ps) | 1200 | 340 | 324 | 143 |
| Jitter<sub>RMS</sub> (ps) | | 39 | 25 | 30 |

## IV. CONCLUSION

We propose a frequency-scalable DCO design methodology for standard-cell library based design flows. After a database of gate delays is established, path delay differences serve as the basis of the FTU acquisition step. The proposed CTU preserves both a linear acquisition step and a high oscillating frequency even when the operating frequency range is extended. With an arithmetic and systematic method for constructing the CTU and FTU hardware at an early stage, the designer avoids wasted trial iterations in the backend flow. The optimization of frequency distortion caused by voltage variation is also addressed. The measured P-P jitter and RMS jitter are 143 ps and 30 ps, respectively. Because it is fully digital, the ADPLL design can be easily incorporated into a modern SoC design flow.

## ACKNOWLEDGMENT

The authors thank the Chip Implementation Center (CIC) for post-layout simulation and IC fabrication.

## REFERENCES

- [1] J. Jong and C. Lee, "A novel structure for portable digitally controlled oscillator," IEEE International Symposium on Circuits and Systems, vol. 1, pp. 272-275, May 2001.
- [2] C. Chung and C. Lee, "An all-digital phase-locked loop for high-speed clock generation," IEEE International Symposium on Circuits and Systems, vol. 3, pp. 26-29, May 2002.
- [3] T. Hsu, B. Shieh, and C. Lee, "An all-digital phase-locked loop (ADPLL)-based clock recovery circuit," IEEE Journal of Solid-State Circuits, vol. 34, pp. 1063-1073, Aug 1999.
- [4] J. Chiang and K. Chen, "A 3.3 V all digital phase-locked loop with small DCO hardware and fast phase lock," IEEE International Symposium on Circuits and Systems, vol. 3, pp. 554-557, May 1998.
- [5] Pialis and K. Phang, "Analysis of timing jitter in ring oscillators due to power supply noise," IEEE International Symposium on Circuits and Systems, vol. 1, pp. I-685–I-688, May 2003.
- [6] A. Abidi and R. G. Meyer, "Noise in relaxation oscillators," IEEE Journal of Solid-State Circuits, vol. 18, pp. 794-802, December 1983.
- [7] T. Olsson and P. Nilsson, "A digitally controlled PLL for SoC applications," IEEE Journal of Solid-State Circuits, vol. 39, pp. 751-760, May 2004.
- [8] Z. Shu, K. L. Lee, and B. H. Leung, "2.4-GHz ring-oscillator-based CMOS frequency synthesizer with a fractional divider dual-PLL architecture," IEEE Journal of Solid-State Circuits, vol. 39, pp. 452-462, March 2004.