

# Blind-oversampling adaptive oversample-level DFE receiver for unsynchronized global on-chip serial links

Won-Hwa Shin<sup>1, 2</sup>, Young-Hyun Jun<sup>2</sup>, and Bai-Sun Kong<sup>1a)</sup>

<sup>1</sup> College of Information & Communication Engineering, Sungkyunkwan University, Suwon, 440–746, Korea

<sup>2</sup> Memory Division, Samsung Electronics, Hwaseong, 445–701, Korea

a) bskong@skku.edu

**Abstract:** A blind-oversampling adaptive oversample-level decision-feedback equalized (DFE) receiver is presented for use in global on-chip serial links. Blind oversampling is adopted to avoid receiver synchronization for reliable channel data reception, and the adaptive oversample-level DFE is used to reduce data-dependent jitter and ease oversampled data recovery regardless of PVT variations. Test results in a  $0.13 \,\mu\text{m}$  CMOS process indicate that the proposed approach achieves up to 37.5% improvement in data rate compared to conventional approaches, and verify operation at a 2.80-Gb/s data rate with 2.31-pJ/bit energy over a 10-mm-long lossy global on-chip interconnect.

**Keywords:** adaptive DFE, blind-oversampling, on-chip serial link

**Classification:** Integrated circuits

#### References

- [1] R. Ho, K. W. Mai, and M. A. Horowitz, "The future of wires," *Proc. IEEE*, vol. 89, no. 4, pp. 490–504, April 2001.
- [2] S. Kimura, T. Hayakawa, T. Horiyama, M. Nakanishi, and K. Watanabe, "An On-Chip High Speed Serial Communication Method Based on Independent Ring Oscillators," *ISSCC Tech. Dig.*, pp. 390–391, 2003.
- [3] L. Zhang, J. M. Wilson, R. Bashirullah, L. Luo, J. Xu, and P. D. Franzon, "A 32-Gb/s On-Chip Bus with Driver Pre-Emphasis Signaling," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 17, no. 9, pp. 1267–1274, Sept. 2009.
- [4] A. P. Jose, G. Patounakis, and K. L. Shepard, "Pulsed Current-Mode Signaling for Nearly Speed-of-Light Intrachip Communication," *IEEE J. Solid-State Circuits*, vol. 41, no. 4, pp. 772–780, April 2006.
- [5] D. Schinkel, E. Mensink, E. Klumperink, E. van Tuijl, and B. Nauta, "Low-Power, High-Speed Transceivers for Network-on-Chip Communication," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 17, no. 1, pp. 12–21, Jan. 2009.
- [6] S. Hoppner, D. Walter, H. Eisenreich, and R. Schuffny, "Efficient Compensation of Delay Variation in High-Speed Network-on-Chip Data Links," *Proc. ISSoC*, pp. 55–58, Sept. 2010.

## **1** Introduction

In a synchronous system-on-a-chip (SoC) with large-scale integration, high-speed data transmission through global on-chip buses is hindered by skew and crosstalk between bus lines [1]. These problems may limit the data rate of on-chip buses and increase chip-level design cost through greater wiring complexity and area overhead. To overcome them, a high-speed on-chip serial link can be used [2]. However, high-speed data transmission through a serial link can suffer from severe intersymbol interference (ISI) due to the limited bandwidth of long interconnects [3]. Moreover, receiver synchronization has recently become a very important issue [4, 5]. Since there is no timing correlation between long on-chip interconnects and the global clock, the traditional approach of sampling the channel data with an unaligned global clock may lead to an increased bit error rate (BER) due to unavoidable timing deviation of the sampling point.

For the synchronization of on-chip serial links, several approaches have been proposed [2, 4, 5]. The approach in [2] requires a local oscillator in each sender and receiver, and synchronizes the link using these oscillators. However, if the delay of the local strobe from the sender is not exactly matched to that of the transmitted data, a synchronization failure can occur because the clock generated from the receiver oscillator cannot be perfectly aligned with the data to be received. Moreover, the frequency offset between the ring oscillators in the sender and receiver limits the number of data bits that can be transferred. The approach in [4] uses a calibration sequence to adjust the global clock so that the sampling edge falls at the center of the incoming data. However, the simple training bit streams used in this approach cannot represent the variety of data-dependent jitter patterns generated by complex real data sequences. Moreover, the sampling clock phase selected by the calibration during start-up may not be applicable when the eye opening changes due to voltage and temperature variations. In the source-synchronous approach in [5], the delay skew between the clock line and the data lines, which is caused by crosstalk and mismatch between lines, is recognized as a critical issue, and a huge design effort is required to reduce the skew.

To avoid the receiver synchronization issue described above and to boost the data rate of global on-chip serial links, a blind-oversampling adaptive oversample-level decision-feedback equalized (DFE) receiver is proposed. The remainder of this paper is organized as follows. The architecture of the proposed receiver is described in Section 2. Section 3 presents the measurement results, and conclusions are given in Section 4.

## **2** Architecture

Fig. 1 shows the overall architecture of the proposed receiver. It is composed of a DFE oversampler, a data recovery block, and a DFE adaptation block. The DFE oversampler performs 3x blind oversampling of the data received from the channel. It also performs an oversample-level DFE, where the equalization is done separately for each of the oversamples. The data recovery block collects transition information from the data sampled by the DFE oversampler, and picks a single sample as the recovered data. The DFE adaptation block determines the tap weight values of the oversample-level DFE for adaptive equalization.

Fig. 1. Proposed blind-oversampling adaptive oversample-level decision feedback-equalized (DFE) receiver

#### 2.1 DFE oversampler

Fig. 2 shows the structure of each block in the proposed receiver. As seen in Fig. 2(a), the DFE oversampler is composed of three samplers, SA1~SA3.  $CK1 \sim CK3$  are the sampling clocks, equally spaced in phase, that define the sampling points of the samplers. The oversampled outputs,  $OUT1 \sim OUT3$ , are sent to the data recovery block for selection of the recovered data, and are also fed back into the samplers themselves for the oversample-level DFE operation. Each sampler, depicted at the upper right corner of Fig. 2(a), is composed of a front amplifier, a DFE summer, and a flip-flop. After being amplified by the front amplifier, the signal is summed by the DFE summer with a weighted version of the previous oversampled output, whose weight is controlled by the tail current  $I_{WEIGHT}$ . The flip-flop captures the resulting signal as a new output.

Fig. 2(a) also shows the timing diagram for the operation of the DFE oversampler.  $OUT1_i$ ,  $OUT2_i$ , and  $OUT3_i$  are the outputs of samplers SA1, SA2, and SA3, sampled at the *i*-th falling edges of CK1, CK2, and CK3, respectively. These outputs are multiplied by the DFE tap weight  $\alpha$  and summed with the next data samples,  $S1_{i+1}$ ,  $S2_{i+1}$ , and  $S3_{i+1}$ , to obtain  $OUT1_{i+1}$ ,  $OUT2_{i+1}$ , and  $OUT3_{i+1}$ , respectively. For instance, sample  $S1_{i+1}$  and the previous output  $OUT1_i$  multiplied by tap weight  $\alpha$  are summed in SA1 during the (i+1)-th high level of CK1.  $OUT1_{i+1}$  is then captured by the flip-flop in SA1 at the (i+1)-th falling edge of CK1. Similarly, sample  $S2_{i+1}$   $(S3_{i+1})$  and the previous output  $OUT2_i$   $(OUT3_i)$  multiplied by  $\alpha$  are summed by the DFE summer in SA2 (SA3) during the (i+1)-th high level of CK2 (CK3), and the equalized output  $OUT2_{i+1}$   $(OUT3_{i+1})$  is determined at the falling edge of CK2 (CK3).
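The per-phase feedback described above can be sketched behaviorally in Python. This is an illustration only, not the circuit: the function name `dfe_oversample`, the ±1 decision encoding, the initial decisions, and the sign convention for `alpha` (a negative value subtracts the residue of the previous decision) are assumptions that the text does not specify.

```python
def dfe_oversample(samples, alpha, init=(1, 1, 1)):
    """Behavioral sketch of a one-tap oversample-level DFE.

    samples: list of (S1, S2, S3) analog sample triples, one triple per clock cycle.
    alpha:   tap weight applied to each sampler's own previous decision (assumed sign
             convention: negative alpha cancels positive post-cursor ISI).
    init:    assumed initial decisions for SA1..SA3.
    Returns a list of (OUT1, OUT2, OUT3) decisions, each +1 or -1.
    """
    prev = list(init)
    outputs = []
    for triple in samples:
        cur = []
        for k, s in enumerate(triple):
            # DFE summer: current sample plus weighted previous output of the same sampler
            summed = s + alpha * prev[k]
            # Flip-flop: hard decision on the equalized value
            cur.append(1 if summed >= 0 else -1)
        prev = cur
        outputs.append(tuple(cur))
    return outputs


# Hypothetical channel: main cursor 0.5, post-cursor 0.6 (ISI larger than the signal)
bits = [1, -1, 1, 1, -1]
prev_bits = [1] + bits[:-1]
rx = [(0.5 * b + 0.6 * p,) * 3 for b, p in zip(bits, prev_bits)]
equalized = [o[0] for o in dfe_oversample(rx, alpha=-0.6)]
unequalized = [o[0] for o in dfe_oversample(rx, alpha=0.0)]
```

In this toy channel the unequalized decisions differ from the transmitted bits, while the equalized decisions match them, because each sampler removes the contribution of its own previous decision before deciding.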

#### 2.2 Data recovery and DFE adaptation

Fig. 2(b) shows the structure of the data recovery block. The data transition detector detects the transitions of the oversampled data  $(OUT1 \sim OUT3)$  by XORing each pair of adjacent samples. The data transition information collected in each sampling phase is accumulated in a 4-bit shift register. The comparator then compares the transition counts and finds the oversampling time interval in which the largest number of transitions has occurred. The data picker then picks the sample obtained by the clock phase farthest from the interval having the largest transition count.







Fig. 2. Structures of (a) DFE oversampler and its timing diagram, (b) data recovery block, and (c) DFE adaptation block

For instance, if the oversampling time interval having the largest number of transitions is between CK1 and CK2, the recovered output is selected to be the data sampled by CK3.
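A minimal Python sketch of this selection logic follows. Several details are assumptions rather than facts from the paper: the 4-bit shift register is modeled as a sliding window of the last four transition flags per interval, the third interval is taken between OUT3 and the next cycle's OUT1, the mappings other than "CK1-CK2 interval selects CK3" are inferred by symmetry, and the names `recover` and `FARTHEST` are hypothetical.

```python
from collections import deque

# Interval index -> index of the sampler farthest from it.
# 0: CK1-CK2 -> CK3 (stated in the text); 1: CK2-CK3 -> CK1 and
# 2: CK3-next CK1 -> CK2 (assumed by symmetry).
FARTHEST = {0: 2, 1: 0, 2: 1}


def recover(outputs, window=4, start_pick=1):
    """outputs: list of (OUT1, OUT2, OUT3) bits (0 or 1), one triple per cycle.

    Returns one recovered bit per cycle (the last cycle is skipped because the
    third interval needs the following OUT1).
    """
    history = [deque(maxlen=window) for _ in range(3)]  # modeled shift registers
    pick = start_pick
    recovered = []
    for i in range(len(outputs) - 1):
        o1, o2, o3 = outputs[i]
        next_o1 = outputs[i + 1][0]
        # Transition detector: XOR of adjacent oversamples
        flags = (o1 ^ o2, o2 ^ o3, o3 ^ next_o1)
        for h, f in zip(history, flags):
            h.append(f)
        counts = [sum(h) for h in history]
        # Comparator and data picker: keep the current phase until a transition is seen
        if max(counts) > 0:
            pick = FARTHEST[counts.index(max(counts))]
        recovered.append(outputs[i][pick])
    return recovered


# Example: data edges fall between CK1 and CK2, so OUT1 still sees the previous bit
bits = [0, 1, 0, 0, 1, 1, 0]
prev_bits = [0] + bits[:-1]
triples = [(p, b, b) for p, b in zip(prev_bits, bits)]
result = recover(triples)
```

Here all transitions land in the CK1-CK2 interval, so the sketch settles on the CK3 sample, which matches the example given in the text.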

The tap weight adaptation for the proposed receiver is carried out by the DFE adaptation block, which consists of an adaptation controller and a tap weight controller, as seen in Fig. 2(c). The DFE adaptation block uses the transition count information received from the data recovery block to find a suitable tap weight for the DFE oversampler. Due to ISI-induced edge distortion, transitions of the oversampled data tend to spread out over the three oversampling time intervals defined by the three sampling clock phases. If ISI is removed by proper equalization, data transitions can converge into a single time interval. However, when there is severe ISI on the channel, it is very difficult for all transitions to be collected in a single interval. Furthermore, trying to gather all data transitions into a single interval may worsen the BER due to overcompensation for edge distortion. To alleviate the problem, the proposed DFE adaptation scheme adjusts the tap weight to repel transitions from a specific interval, so as to find an oversampling time interval having zero transitions (OTIZT). After tap weight initialization at the beginning of the adaptation, the tap weight controller increments the tap weight value by up-counting the 4-bit counter. As the tap weight value increases, data transitions that have been spread out among the three time intervals tend to move toward a single interval, leaving one oversampling time interval with no transitions. The tap weight is fixed as soon as an OTIZT is found; thereafter, the adaptation controller increments or decrements the tap weight whenever any data transition appears in the OTIZT. The tap weight current  $I_{WEIGHT}$  thus obtained is sent to the samplers in the DFE oversampler for DFE adaptation.
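The initial search for an OTIZT can be sketched as a simple up-counting loop. This is a hedged illustration: the function name `find_otizt`, the callback `count_transitions`, and the assumption that the counter is initialized to zero are all hypothetical. The later tracking step is omitted because the text does not state the rule that decides between incrementing and decrementing.

```python
def find_otizt(count_transitions, max_code=15):
    """Up-count a 4-bit tap-weight code until one interval shows zero transitions.

    count_transitions: hypothetical callback; given a tap code, it returns the
                       transition counts (c12, c23, c31) observed for the three
                       oversampling time intervals with that tap weight applied.
    Returns (tap_code, otizt_index), or None if no code in 0..max_code works.
    """
    for code in range(max_code + 1):          # assumed initial value: 0
        counts = tuple(count_transitions(code))
        if min(counts) == 0:
            # Freeze the tap weight as soon as a zero-transition interval appears
            return code, counts.index(0)
    return None


# Toy model (not measured data): transitions in interval 1 shrink as the tap grows
def toy_counts(code):
    return (10, max(0, 3 - code), 5)


found = find_otizt(toy_counts)
never = find_otizt(lambda code: (1, 1, 1))
```

The loop stops at the smallest tap code that empties one interval, which reflects the stated intent of avoiding overcompensation instead of forcing all transitions into a single interval.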

## **3** Measurement results

To assess the performance, the proposed receiver was designed in a  $0.13 \,\mu\text{m}$ CMOS process. Low-swing differential current-mode signaling with bridge resistor termination [3] was used for data transmission through 10-mm-long on-chip interconnects with a width of 0.8  $\mu$m and a spacing of 1.2  $\mu$m in the metal-4 layer. Wires in the metal-3 and metal-5 layers are routed perpendicularly as other interconnects with the same pitch. Fig. 3(a) compares the simulated BER of conventional synchronization approaches, namely link calibration [4] and source-synchronous clocking [5], with that of the proposed approach. To evaluate BER performance, a MATLAB behavioral model of each transceiver, including the on-chip channel, was used. As a practical simulation condition for source-synchronous synchronization, it is assumed that the clock delay is mismatched to the data delay by 0.1 UI at 3.0 Gb/s [6]. The data rate at which the proposed receiver provides a BER below 1E-5 is as high as 3.30 Gb/s, whereas those of the conventional link calibration and source-synchronous approaches are 2.58 Gb/s and 2.40 Gb/s, indicating improvements of 27.9% and 37.5%, respectively.
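As a quick check of the figures quoted above, the relative improvements and the assumed clock-to-data mismatch can be worked out directly. The conversion of 0.1 UI into picoseconds is derived here and is not a number reported in the paper:

$$
\frac{3.30}{2.58} - 1 \approx 0.279, \qquad \frac{3.30}{2.40} - 1 = 0.375, \qquad 0.1\,\text{UI} = \frac{0.1}{3.0\,\text{Gb/s}} \approx 33.3\,\text{ps}
$$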

To verify practical applicability, a global on-chip transceiver with the proposed receiver was fabricated in a 0.13  $\mu$m CMOS process. The photomicrograph of the test chip is shown in Fig. 3(b); the chip includes a transmitter, a 10-mm-long on-chip interconnect, the proposed receiver, a PRBS generator, and a BER test unit. The layout area of the proposed receiver is 59.6 × 68.4  $\mu$m<sup>2</sup>. Fig. 3(c) shows the measured BER of the proposed receiver. With only blind oversampling adopted to avoid receiver synchronization, the data rate providing a BER below 1E-12 is 2.13 Gb/s. With the oversample-level DFE added, the data rate increases to 2.80 Gb/s, a 31.5% improvement. Fig. 3(d) shows the shmoo plot, in which the proposed receiver with the oversample-level DFE works down to a 1.06 V supply voltage at a 2.1-Gb/s data rate. Without the DFE, it requires a supply of at least 1.17 V. The increased supply margin not only guarantees reliable data transmission regardless of supply noise but also leads to lower power consumption.
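The measured figures can be cross-checked in the same way. The power value below is derived from the energy per bit quoted in the abstract and is not a reported measurement; which blocks that energy figure covers is not stated, so it should be read as a rough estimate:

$$
\frac{2.80}{2.13} - 1 \approx 0.315, \qquad 2.31\,\text{pJ/bit} \times 2.80\,\text{Gb/s} \approx 6.47\,\text{mW}, \qquad 1.17\,\text{V} - 1.06\,\text{V} = 0.11\,\text{V}
$$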







Fig. 3. Experimental results: (a) simulated BER of synchronization methods, (b) photomicrograph of test chip, (c) measured BER versus data rate, (d) shmoo plot

## **4** Conclusion

A blind-oversampling receiver with adaptive oversample-level DFE is proposed for use in high-speed on-chip serial links. It avoids receiver synchronization through blind oversampling, and boosts the data rate through the adaptive oversample-level DFE. The proposed receiver was verified using a test chip in a  $0.13 \,\mu\text{m}$  CMOS process; the experimental results indicated up to 31.5% improvement in data rate at a BER below 1E-12 and an improved supply voltage margin.

## Acknowledgements

This work was supported by Samsung Electronics. Design tools and IC fabrication were supported by IDEC at KAIST.

