

# A charge recycling stacked I/O in standard CMOS technology for wide TSV data bus

Takefumi Yoshikawa<sup>1, a)</sup>, Tatsuya Iwata<sup>1</sup>, Junji Shibazaki<sup>1</sup>, Sho Muroga<sup>2</sup>, and Hiroaki Ikeda<sup>3</sup>

**Abstract** This paper describes a theoretical approach and a proposed scheme for a wide data bus architecture that uses charge recycling and stacked I/O for signal transmission via TSVs (Through Silicon Vias). The data bus is intended for vertically stacked chips in 3D integration. The theoretical approach is based on a probability calculation for a purely random data stream. The calculation clarifies the power reduction ratio relative to a normal (non-charge-recycling) data bus under given conditions, for power estimation at an early design stage. The proposed scheme for data and clock transmission adopts a Local Voltage Stabilizer (LVS) and a compact level shifter to reduce capacitor area and clock power. Simulation results show that the proposed 2-story data bus architectures of 64 bits ( $32\times2$ ) and 128 bits ( $64\times2$ ) achieve competitive power efficiency (0.160 pJ/bit) with a smaller size (44% of the prior work) at a normal operating voltage (1 V). These results are obtained with dense TSVs ( $40\,\mu$m pitch), a standard 65 nm process technology and a PRBS9 data stream.

**Keywords:** signal transmission, charge recycling, stacked I/O

**Classification:** Electron devices, circuits and modules (silicon, compound semiconductor, organic and novel materials)

#### 1. Introduction

Because of the slowdown of technology scaling [1, 2, 3, 4] and several limitations on interconnect performance improvement [5, 6, 7, 8, 9, 10], three-dimensional (3D) integration technology has been considered as a solution to boost system performance through tight connection between stacked chips, e.g. a processor and memories [11, 12, 13, 14, 15]. The key to obtaining higher system performance with TSVs is a large number of parallel interconnections between the stacked chips [16, 17, 18, 19, 20]. As shown in Fig. 1, this parallelism can realize considerably higher data bandwidth, beyond 1 Tbit/sec.

On the other hand, high-bandwidth data transmission through a wide data bus causes a power crisis in dense packaging of multiple chips, so power reduction in data transmission must be carefully considered in such 3D packaging.

- <sup>1</sup> Dept. of Electrical and Computer Engineering, Toyama Prefectural University, 5180 Kurokawa, Imizu, Toyama 939-0398, Japan
- <sup>2</sup> Graduate School of Engineering Science, Akita University, 1–1 Tegatagakuen-machi, Akita, Akita 010–8502, Japan
- <sup>3</sup> Graduate School of Science, Technology and Innovation, Kobe University, 1–1, Rokkodai-cho, Nada-ku, Kobe, Hyogo 657– 8501, Japan
- <sup>a)</sup> tyoshikawa@pu-toyama.ac.jp

DOI: 10.1587/elex.17.20200112 Received March 24, 2020 Accepted April 3, 2020 Publicized April 15, 2020 Copyedited May 25, 2020

To reduce the power dissipation of data transmission on a wide data bus, several approaches have been disclosed, e.g. lowering the I/O voltage [20], and charge recycling on differential I/O [21, 22, 23] or parallel I/O [24]. However, [20] needs an excessively low voltage (0.14 V) and a sophisticated regulator. [21, 22, 23] need an equalizing period to recycle charge to the complementary or adjacent transmission line, so high bandwidth of Gbit/sec class cannot be achieved. For higher bandwidth with charge recycling, stacked I/O architectures have been disclosed [25, 26]. As shown in Fig. 2, the stacked I/O [25, 26] has a 2-story configuration (4  $\times$  2) and realizes Gbit/sec-class data transmission by fast charge transfer from the top to the bottom I/O via an intermediate voltage node (V<sub>REG</sub>).

As for [26], it is difficult to apply to 3D integration because level conversion is necessary in the middle of the TSV. Charge recycling on the simple stacked I/O [25] seems to be the most suitable scheme to reduce the power dissipation of data transmission on a wide TSV data bus in 3D chip packaging.

To apply the stacked I/O scheme [25] to a wide TSV data bus, there are some difficulties, as listed below.

- The stacked I/O cell [25] is too large for wide data bus due to capacitance allocation for intermediate voltage.
- Clock line is out of the I/O stacking and needs full swing, therefore clock transmission consumes much more power compared to the data transmission.
- Trench capacitors are used for larger capacitance, and it restricts process variation and requires additional cost.

Fig. 1 3D chip packaging using through silicon vias

Fig. 2 Stacked I/O for charge recycling [25]

This paper proposes a data transmission scheme using a compact stacked I/O for a wide data bus. It covers low-power clock transmission with a unique level shifter and a calculation formula for the charge recycling probability for a given data bus width.

#### 2. Charge recycling for wide data bus

#### 2.1 Applicability to wide data bus

The concept of charge sharing between the regulator node capacitance ( $C_{REG}$ ) and the TSV load capacitance ( $C_L$ ) is depicted in Fig. 3. When  $C_{REG}$  is much larger than  $C_L$ , i.e. M (the ratio of  $C_{REG}$  to  $C_L$ ) is much greater than 1, the charge from the top driver can be completely reused by the bottom driver, and power dissipation can be halved compared to a normal data bus with 0.5VDD swing. However, M is actually not so large in a wide data bus. Assuming a wide data bus of more than 256 bits, a realistic M is about 2.5, considering area impact and a normal (non-trench-capacitor) process on the semiconductor chip. This means that the regulator voltage ( $V_{REG}$ ) has a distortion of ±0.14VDD (= 140 mV at 1 V VDD) in each data cycle. This is not acceptable because it degrades driver performance. For this reason, [26] adopts trench capacitors at additional cost.
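The ±0.14VDD figure is consistent with simple charge conservation. The sketch below is a minimal illustration, assuming an ideal charge share between one load capacitor $C_L$ (at VDD) and a $C_{REG} = M \cdot C_L$ node resting at 0.5VDD, with no LVS and no leakage. The function name `vreg_distortion` and the idealized model are assumptions made for illustration, not the simulation setup of this work.

```python
def vreg_distortion(m: float, vdd: float = 1.0) -> float:
    """Magnitude of the V_REG shift after one unrecycled charge-sharing event.

    Assumption: ideal charge conservation between C_L (initially at VDD)
    and C_REG = m * C_L (initially at 0.5 * VDD); no LVS, no leakage.
    Capacitances are normalized so that C_L = 1.
    """
    v_before = 0.5 * vdd
    # Top driver makes a 1 -> 0 transition: C_L dumps its charge into V_REG.
    v_after = (vdd * 1.0 + v_before * m) / (1.0 + m)
    return v_after - v_before  # algebraically 0.5 * vdd / (1 + m)


if __name__ == "__main__":
    for m in (2.5, 10.0, 100.0):
        shift_mv = vreg_distortion(m) * 1e3
        print(f"M = {m:6.1f}: V_REG shift = +/-{shift_mv:.1f} mV at VDD = 1 V")
```

By symmetry, a 0 to 1 transition on the bottom driver pulls $V_{REG}$ down by the same amount, which is why the distortion is quoted as a ± value. With M = 2.5 the model gives roughly 0.14VDD, and the shift only becomes small when M is much larger than 1.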

#### 2.2 Local voltage stabilizer

To compensate for the small capacitance on V<sub>REG</sub>, we allocate a Local Voltage Stabilizer (LVS) to each pair of Top and Bottom Drivers. The LVS has common-gate amplifiers (Mpt, Mnt, Mpd, Mnd) and pull-up/down transistors (Mp, Mn), as shown in Fig. 4. The common-gate amplifiers are biased by the Common Bias Generator so that the gate voltages of the pull-up/down transistors (Mp, Mn) are 0.75VDD and 0.25VDD, respectively. In the initial state, Mp and Mn are basically off because  $V_{GS} < V_{TH}$, and only a very small DC current (< 5 µA) flows through Mp and Mn. When V<sub>REG</sub> rises through charge injection from the Top Driver, Mpd turns on and drives Mn for fast pull-down of the rising V<sub>REG</sub>. Meanwhile, when V<sub>REG</sub> falls by discharge to the Bottom Driver, Mnt turns on and enables Mp to suppress the V<sub>REG</sub> decrease.

Fig. 5 shows the actual layout of the driver unit, which includes the pre-driver, driver and LVS for both floors (Top and Bottom). The size of the driver unit is  $40 \,\mu\text{m} \times 12.56 \,\mu\text{m}$  in a 65 nm SOTB process.  $C_L$  is assumed to be around 0.8 pF. Each of the PMOS and NMOS capacitors has approximately 1 pF, which is about 1.25 times  $C_L$.

Fig. 3 Charge recycling model and  $V_{REG}$  distortion

Fig. 6 shows simulation waveforms for a  $1 \times 2$  bit data bus configured with a single driver unit. The supply voltage is 1 V and the data rate is 1 Gbit/sec, considering a system without clock recovery.

As shown in this figure,  $V_{REG}$  distortion is suppressed within  $\pm 50 \text{ mV}$  thanks to the LVS. The same behavior appears in wider buses, e.g.  $4 \times 2$  bits,  $8 \times 2$  bits, etc., because each driver unit has its own LVS and the suppression of the  $V_{REG}$  distortion is basically completed within each driver unit.

Because the LVS reduces  $V_{REG}$  distortion as stated above,  $V_{REG}$  returns to its initial value (0.5 $V_{DD}$ ) within almost a single data cycle. However, the LVS needs additional power consumption because it promptly feeds charge to the  $V_{REG}$  node when charge recycling is not executed in a data cycle. The additional power consumption of the LVS is a severe disadvantage compared with the large  $V_{REG}$  capacitance scheme [25], but it can be compensated by parallel charge sharing on a wide data bus.

Fig. 4 Local voltage stabilizer and common bias generator

**Fig. 6** Simulation waveforms of  $1 \times 2$  bit data bus

#### 3. Efficiency calculation for wide data bus

#### 3.1 Precondition

As stated above, the  $V_{REG}$  behavior completes within a single cycle, with or without charge recycling. Therefore, charge recycling is possible only when the following two events occur simultaneously:

- 1 to 0 data state transition at Top I/O
- 0 to 1 data state transition at Bottom I/O

Given this, the power reduction ratio can be calculated from the charge recycling probability. In this calculation, a purely random pattern is assumed, so all data polarities appear with equal probability.

#### **3.2** Calculation example at $2 \times 2$ bits stacked data bus

First, we calculate the appearance probability for a 2-bit data bus (#1 and #2) on each of the Top and Bottom I/Os, i.e. a  $2 \times 2$  bits stacked data bus, assuming a purely random data pattern. In Table I, all cases containing a '0 to 1' transition are listed for a 2-bit data bus. As listed in Cases 1 to 6 of the table, the number of combinations with a '0 to 1' transition at 1 bit on the Bottom I/O can be calculated as

$${}_{2}C_{1} \times 3^{(2-1)}. \tag{1}$$

Here the factor 3 counts the three remaining transitions ('0 to 0', '1 to 0' and '1 to 1') available to the other bit.

When both bits (#1 and #2) make a '0 to 1' transition simultaneously (Case 7 in Table I), the number of combinations with '0 to 1' transitions at 2 bits on the Bottom I/O can be calculated as

$${}_{2}C_{2} \times 3^{(2-2)}. \tag{2}$$

By applying the same reasoning to the Top I/O, the number of combinations with a '1 to 0' transition at 1 bit on the Top I/O can be calculated as

$${}_{2}C_{1} \times 3^{(2-1)}. \tag{3}$$

In a similar manner, the number of combinations with '1 to 0' transitions at 2 bits on the Top I/O can be calculated as

$${}_{2}C_{2} \times 3^{(2-2)}. \tag{4}$$

(3) Combination of events for charge recycling

Fig. 7 shows the possible cases of charge recycling on the  $2 \times 2$  bits stacked data bus.

Referring to Eqs. (1) to (4), the number of combinations can be calculated for each event in Fig. 7. The summary is given in Table II.

Table I Combination of 0 to 1 transition in 2 bits data bus

| Case | #1     | #2     |
|------|--------|--------|
| 1    | 0 -> 1 | 0 -> 0 |
| 2    | 0 -> 1 | 1 -> 0 |
| 3    | 0 -> 1 | 1 -> 1 |
| 4    | 0 -> 0 | 0 -> 1 |
| 5    | 1 -> 0 | 0 -> 1 |
| 6    | 1 -> 1 | 0 -> 1 |
| 7    | 0 -> 1 | 0 -> 1 |
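The counts in Table I and Eqs. (1) and (2) can be cross-checked by brute-force enumeration. The following is a minimal sketch, assuming each bit independently takes one of the four transitions (0 to 0, 0 to 1, 1 to 0, 1 to 1); the helper name `count_by_rising_bits` is hypothetical and introduced only for this illustration.

```python
from itertools import product
from math import comb

# The four possible per-bit transitions as (previous value, next value).
TRANSITIONS = [(0, 0), (0, 1), (1, 0), (1, 1)]


def count_by_rising_bits(n_bits: int) -> dict:
    """Count bus transitions, grouped by how many bits make a 0 -> 1 transition."""
    counts = {k: 0 for k in range(n_bits + 1)}
    for bus in product(TRANSITIONS, repeat=n_bits):
        rising = sum(1 for t in bus if t == (0, 1))
        counts[rising] += 1
    return counts


if __name__ == "__main__":
    counts = count_by_rising_bits(2)
    for k in (1, 2):
        # Closed form: Eq. (1) for k = 1, Eq. (2) for k = 2.
        closed_form = comb(2, k) * 3 ** (2 - k)
        print(f"{k} rising bit(s): enumerated = {counts[k]}, closed form = {closed_form}")
```

The enumeration yields six cases with exactly one rising bit (Cases 1 to 6) and one case with two rising bits (Case 7). By symmetry the same counts hold for '1 to 0' transitions on the Top I/O, as in Eqs. (3) and (4).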

Therefore, a number of combination for charge recycling bits ( $N_{CR\_2\times 2}$ ) can be calculated as follows:

$$\begin{aligned}
N_{CR\_2\times2} &= {}_{2}C_{1} \times 3^{(2-1)} \times {}_{2}C_{1} \times 3^{(2-1)} \times 1 \\
&\quad + {}_{2}C_{1} \times 3^{(2-1)} \times {}_{2}C_{2} \times 3^{(2-2)} \times 1 \\
&\quad + {}_{2}C_{2} \times 3^{(2-2)} \times {}_{2}C_{1} \times 3^{(2-1)} \times 1 \\
&\quad + {}_{2}C_{2} \times 3^{(2-2)} \times {}_{2}C_{2} \times 3^{(2-2)} \times 2 \\
&= \sum_{p=1}^{2} \sum_{q=1}^{2} {}_{2}C_{p} \times 3^{2-p} \times {}_{2}C_{q} \times 3^{2-q} \times \min(p,q)
\end{aligned} \tag{5}$$

Here, p and q are the numbers of bits making a '1 to 0' transition on the Top Driver and a '0 to 1' transition on the Bottom Driver, respectively. This formula is the sum, over the events in Table II, of the product of the number of recycled bits and the number of combinations.
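As a sanity check, Eq. (5) can be evaluated both from its closed form and by enumerating every Top/Bottom transition pair. This is a minimal sketch under the same pure-random assumption; the function names `n_cr_closed_form` and `n_cr_brute_force` are hypothetical, and the parameter `n` (bits per floor) is added only so the check can be repeated for other small widths.

```python
from itertools import product
from math import comb

# The four possible per-bit transitions as (previous value, next value).
TRANSITIONS = [(0, 0), (0, 1), (1, 0), (1, 1)]


def n_cr_closed_form(n: int = 2) -> int:
    """Double sum of Eq. (5), written for n bits per floor (n = 2 in the text)."""
    return sum(
        comb(n, p) * 3 ** (n - p) * comb(n, q) * 3 ** (n - q) * min(p, q)
        for p in range(1, n + 1)
        for q in range(1, n + 1)
    )


def n_cr_brute_force(n: int = 2) -> int:
    """Enumerate all Top/Bottom transition pairs and sum the recyclable bits."""
    total = 0
    for top in product(TRANSITIONS, repeat=n):
        falling = sum(1 for t in top if t == (1, 0))  # p: '1 to 0' on Top
        for bottom in product(TRANSITIONS, repeat=n):
            rising = sum(1 for t in bottom if t == (0, 1))  # q: '0 to 1' on Bottom
            total += min(falling, rising)
    return total


if __name__ == "__main__":
    print(n_cr_closed_form(2), n_cr_brute_force(2))  # both give 50
```

For the $2 \times 2$ bus the four terms of Eq. (5) are 36, 6, 6 and 2, which sum to 50 recycled bits over all 256 possible transition pairs.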

(4) Calculation of power reduction ratio

To calculate the power reduction ratio achieved by charge recycling, the power consumption of the non-charge-recycling case must be defined. It can be calculated from the appearance probability of '0 to 1' transitions on 4 bits. The number of combinations for '0 to 1' transition bits in a 4-bit data bus  $(N_{01\_4})$  can be calculated as:

$$N_{01\_4} = \sum_{k=1}^{4} {}_{4}C_k \times 3^{4-k} \times k \tag{6}$$

The power reduction ratio of the  $2 \times 2$  bits stacked data bus (RR<sub>2×2</sub>) is the ratio of N<sub>CR\_2×2</sub> to N<sub>01\_4</sub> from Eqs. (5) and (6).



Fig. 7 Possible bits for charge recycling on  $2 \times 2$  stacked bus

**Table II** Charge recycling in  $2 \times 2$  bits data bus

| Event (Fig. 7) | Eq. (Bottom I/O) | Eq. (Top I/O) | Recycled bits | Number of combinations, referring to Section 3.2 (1)(2)              |
|----------------|------------------|---------------|---------------|----------------------------------------------------------------------|
| (a)            | 1                | 3             | 1 bit         | ${}_{2}C_{1} \times 3^{(2-1)} \times {}_{2}C_{1} \times 3^{(2-1)}$ |
| (b)            | 1                | 4             | 1 bit         | ${}_{2}C_{1} \times 3^{(2-1)} \times {}_{2}C_{2} \times 3^{(2-2)}$ |
| (c)            | 2                | 3             | 1 bit         | ${}_{2}C_{2} \times 3^{(2-2)} \times {}_{2}C_{1} \times 3^{(2-1)}$ |
| (d)            | 2                | 4             | 2 bits        | ${}_{2}C_{2} \times 3^{(2-2)} \times {}_{2}C_{2} \times 3^{(2-2)}$ |

$$RR_{2\times 2} = \frac{N_{CR\_2\times 2}}{N_{01\_4}} = \frac{\sum_{p=1}^{2} \sum_{q=1}^{2} {}_{2}C_{p} \times 3^{2-p} \times {}_{2}C_{q} \times 3^{2-q} \times \min(p,q)}{\sum_{k=1}^{4} {}_{4}C_{k} \times 3^{4-k} \times k} \tag{7}$$
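For reference, Eq. (7) can be evaluated numerically. The sketch below is a minimal illustration using exact rational arithmetic; the function names `n_cr_2x2`, `n_01_4` and `rr_2x2` are hypothetical, and the result is the ideal ratio under the pure-random assumption, before any LVS power is taken into account.

```python
from fractions import Fraction
from math import comb


def n_cr_2x2() -> int:
    """Numerator of Eq. (7): recycled bits summed over all transition pairs, Eq. (5)."""
    return sum(
        comb(2, p) * 3 ** (2 - p) * comb(2, q) * 3 ** (2 - q) * min(p, q)
        for p in range(1, 3)
        for q in range(1, 3)
    )


def n_01_4() -> int:
    """Denominator of Eq. (7): '0 to 1' bits summed over all 4-bit transitions, Eq. (6)."""
    return sum(comb(4, k) * 3 ** (4 - k) * k for k in range(1, 5))


def rr_2x2() -> Fraction:
    """Power reduction ratio of the 2 x 2 bits stacked bus, Eq. (7)."""
    return Fraction(n_cr_2x2(), n_01_4())


if __name__ == "__main__":
    ratio = rr_2x2()
    print(f"N_CR = {n_cr_2x2()}, N_01 = {n_01_4()}, RR = {ratio} = {float(ratio):.4f}")
```

The numerator is 50 and the denominator is 256, so the ideal ratio for the $2 \times 2$ bus is 25/128, or about 0.195.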

#### **3.3** Calculation formula for  $N \times 2$  bits stacked data bus

As stated above, charge recycling is possible between TSVs on the Top and Bottom floors, and the number of combinations for recycling bits in an  $N \times 2$  stacked bus can be obtained by extending Eqs. (5) and (6) to N bits.

$$N_{CR\_N \times 2} = \sum_{p=1}^{N} \sum_{q=1}^{N} {}_{N}C_p \times 3^{N-p} \times {}_{N}C_q \times 3^{N-q} \times \min(p,q) \tag{8}$$

$$N_{01\_2N} = \sum_{k=1}^{2N} {}_{2N}C_k \times 3^{2N-k} \times k \tag{9}$$

(1) Positive and negative factor consideration

As described above, capacitance is a positive factor that increases the number of recycled bits. In contrast, a wider bit width (N) needs longer wiring of the  $V_{REG}$  node, which causes a timing offset between charge injection and extraction on the  $V_{REG}$  node. The timing offset is a negative factor that equivalently decreases the number of recycled bits. When the positive and negative factors on an N × 2 bits bus are denoted  $a_N$  and  $b_N$, respectively, Eq. (8) can be expressed as follows.

If  $p \geq q$, then

$$N_{CR\_N \times 2} = \sum_{p=1}^{N} \sum_{q=1}^{N} {}_{N}C_{p} \times 3^{N-p} \times {}_{N}C_{q} \times 3^{N-q} \times (\min(p,q) - b_{N})$$

If  $p < q$, then

$$N_{CR\_N \times 2} = \sum_{p=1}^{N} \sum_{q=1}^{N} {}_{N}C_p \times 3^{N-p} \times {}_{N}C_q \times 3^{N-q} \times (\min(p,q) + a_N - b_N) \tag{10}$$

If p is greater than q, the positive factor does not increase N<sub>CR</sub> because the theoretical maximum number of recycled bits is q.

(2) Positive and negative factor extension

Positive and negative factors  $(a_N, b_N)$  become more significant as the bus width (N) increases. When the multipliers applied to  $a_N$  and  $b_N$  for each doubling of the data bus width are set to  $m_p$  and  $m_n$, respectively, Eq. (10) can be expressed as shown in Fig. 8.

Fig. 8 Number of combination for recycling bit with doubling bus width

These parameters are derived from the simulation results in Section 5. The power reduction ratio of N  $\times$  2 bits (RR<sub>N $\times$ 2</sub>) can be obtained from Eq. (9) and Eq. (10).

$$RR_{N\times2} = \frac{N_{CR\_N\times2}}{N_{01\_2N}} \tag{11}$$
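Eqs. (8) to (11) are straightforward to evaluate numerically for early-stage estimation. The following is a minimal sketch: `n_cr`, `n_01` and `reduction_ratio` are hypothetical names, and `a` and `b` stand for $a_N$ and $b_N$ as plain arguments. How these factors scale with $m_p$ and $m_n$ (Fig. 8) is not modeled here, so the caller is assumed to supply the values for the chosen N. With `a = b = 0` the code reduces to the ideal case of Eq. (8).

```python
from math import comb


def n_cr(n: int, a: float = 0.0, b: float = 0.0) -> float:
    """Eq. (10): recycled-bit count for an N x 2 bus with correction factors.

    a, b: positive and negative factors (a_N, b_N); zero gives Eq. (8).
    """
    total = 0.0
    for p in range(1, n + 1):          # p: '1 to 0' bits on Top
        for q in range(1, n + 1):      # q: '0 to 1' bits on Bottom
            weight = comb(n, p) * 3 ** (n - p) * comb(n, q) * 3 ** (n - q)
            if p >= q:
                recycled = min(p, q) - b
            else:
                recycled = min(p, q) + a - b
            total += weight * recycled
    return total


def n_01(bits: int) -> int:
    """Eq. (9) with bits = 2N: '0 to 1' bit count without charge recycling."""
    return sum(comb(bits, k) * 3 ** (bits - k) * k for k in range(1, bits + 1))


def reduction_ratio(n: int, a: float = 0.0, b: float = 0.0) -> float:
    """Eq. (11): power reduction ratio RR for an N x 2 bits stacked bus."""
    return n_cr(n, a, b) / n_01(2 * n)


if __name__ == "__main__":
    for n in (2, 8, 32, 64):
        print(f"N = {n:2d}: ideal RR (a = b = 0) = {reduction_ratio(n):.3f}")
```

In the ideal case the ratio grows with N and stays below 0.5, in line with the earlier statement that charge recycling can at most halve the power of a 0.5VDD-swing bus.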

#### 4. Clock management and data reception

#### 4.1 Clock transmission

The pair of clocks is fed to the receiver side using the stacked I/O configuration so that charge recycling occurs frequently. Fig. 9 depicts the clock waveforms of the Top and Bottom drivers. As shown in this figure, the clocks have a  $0.5V_{DD}$  voltage swing, like the data transmission. The clocks are complementary and are equalized on the  $V_{REG}$  node  $(0.5V_{DD})$  in every clock period.

#### 4.2 Receiver

Fig. 10 illustrates the block diagram of the Receiver for a  $1 \times 2$  bit bus. The Receiver has a level shifter and two comparators.

The transmitted clocks (ClkU, ClkD) are buffered by the level shifter in the Receiver, which generates local clocks (Clk, ClkB) with full voltage swing (V<sub>DD</sub>).

Fig. 11 depicts the circuit schematic of the level shifter and its incoming clock signals.

As shown in the figure, the level shifter can be configured with a small number of transistors and operates digitally (ON and OFF).

#### 5. Simulation results

Each TSV is modeled by referencing [27, 28, 29, 30]. In this assumption, TSV length, TSV diameter and oxide thickness are  $100 \,\mu\text{m}$ ,  $50 \,\mu\text{m}$  and  $0.5 \,\mu\text{m}$  respectively. An effective capacitance of the TSV is about 0.84 pF, which would be equivalent to 5 mm metal wire in [26].

Fig. 9 Clock signal transmission

**Fig. 10** Block diagram of receiver  $(1 \times 2 \text{ bit})$

Fig. 11 Level shifter and local clock generation

#### 5.1 Power reduction ratio in data transmission

Fig. 12 shows a breakdown of power consumption for N  $\times$  2 bit data bus with and without charge recycling.

To calculate the power reduction ratio corresponding to Eq. (11), the power consumptions of the Data Driver and LVS are taken into account in Fig. 12. Fig. 13 shows the power reduction ratio calculated by Eq. (11) together with the simulation results.

In the calculation, the parameters  $a_N$ ,  $b_N$ ,  $m_p$  and  $m_n$  are set as in Table III for the best fit to the simulation results. As shown in Fig. 13 and Table III, the calculation and simulation results match well using reasonable values of  $a_N$ ,  $b_N$ ,  $m_p$  and  $m_n$ . Hence, Eq. (11) is useful for estimating the power reduction ratio of charge recycling for a given N in this process technology.

#### 5.2 Power efficiency

The power efficiency of a parallel data bus is defined as the energy consumed (joules) per transmitted bit, i.e. the power consumption divided by the channel bandwidth (bits per second). Fig. 14 shows the power efficiencies of the proposed scheme (This Work).
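The definition above amounts to a simple unit conversion between bus power and energy per bit. The sketch below is a minimal illustration; the function names are hypothetical, and the example figures (a 128-bit bus at 1 Gbit/sec per line and 0.160 pJ/bit) combine the data rate and efficiency quoted elsewhere in this paper purely as an assumed operating point.

```python
def energy_per_bit_pj(power_w: float, bus_bits: int, rate_bps: float) -> float:
    """Power efficiency in pJ/bit: total power divided by aggregate bandwidth."""
    return power_w / (bus_bits * rate_bps) * 1e12


def bus_power_mw(efficiency_pj_per_bit: float, bus_bits: int, rate_bps: float) -> float:
    """Total bus power in mW implied by a given pJ/bit figure."""
    return efficiency_pj_per_bit * 1e-12 * bus_bits * rate_bps * 1e3


if __name__ == "__main__":
    # Assumed operating point: 128 lines, 1 Gbit/sec per line, 0.160 pJ/bit.
    power = bus_power_mw(0.160, 128, 1e9)
    print(f"Implied bus power: {power:.2f} mW")
    print(f"Round trip: {energy_per_bit_pj(power * 1e-3, 128, 1e9):.3f} pJ/bit")
```

Under these assumptions the aggregate bandwidth is 128 Gbit/sec and the implied bus power is about 20.5 mW.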

In this power efficiency calculation, the power dissipation of all blocks is considered. As shown in the figure, the power efficiency improves as the bus width becomes wider, but the degree of improvement decreases as the bus width increases; N should therefore be set to 32 or 64. The figure depicts power efficiencies with the clock (solid line) and without the clock (dotted line).

Fig. 13 Driver power comparison

Table III Calculation parameters

| Parameter      | Value |
|----------------|-------|
| a              | 1.1   |
| b<sub>8</sub>  | 0.25  |
| $m_p$          | 2.3   |
| $m_n$          | 2.45  |

In this simulation, one clock line is allocated to each 8-bit data bus for data acquisition. The performances of Refs. [20, 25] and this work are summarized in Table IV.

In [25], the clock is assumed to have full  $V_{DD}$  swing. Compared with [25], the power efficiency is 15% worse in data transmission. This comes from the power dissipation of the LVS, which is the price of the capacitance reduction. However, when clock transmission is included, the power efficiency becomes comparable to [25] thanks to charge recycling and the local level shifter on the clock line. Ref. [20] is much smaller than the others, which comes from i) a simple CMOS driver (no charge recycling), ii) low TSV capacitance (200 fF) and iii) the exclusion of the voltage regulator for its excessively low  $V_{DD}$  (0.14 V). Therefore, its size cannot simply be compared to the others.

#### 6. Conclusion

The formula to calculate the power reduction ratio is derived from the transition probability for a purely random data stream. The reduction ratio is that of the proposed charge recycling scheme relative to a normal (non-charge-recycling) scheme. The formula correlates well with the simulation results for the driver's power dissipation, and it provides an easy way to estimate the power reduction effect of charge recycling at an early stage of system design in 3D chip integration.

The proposed data and clock transmission scheme has the potential to achieve a power efficiency (0.16 pJ/bit) comparable to that of the prior work (0.152 pJ/bit), which uses a more advanced and expensive technology [25]. This power efficiency is achieved by a wide bus (>  $32 \times 2$  bits) and charge recycling for both data and clock transmission. The proposed Tx and Rx macro is 56% smaller than that of the prior work thanks to the Local Voltage Stabilizer (LVS) and the simple comparator with the compact level shifter. The proposed scheme is a promising candidate for the data transmission methodology in 3D chip integration through TSVs.

Fig. 14 Power efficiency comparison

Table IV Performance summary

|                                                | Ref. [20]          | Ref. [25]              | This Work      |
|------------------------------------------------|--------------------|------------------------|----------------|
| Driver V<sub>DD</sub> (V)                      | 0.14               | 0.9                    | 1.0            |
| Process Technology                             | Standard 45 nm     | 40 nm with Trench Cap. | Standard 65 nm |
| Charge Recycling                               | No                 | Yes                    | Yes            |
| Size of Tx & Rx for 2bit bus (µm<sup>2</sup>)  | 336                | 1608.5                 | 703.46         |
| Power Efficiency w/o Clk Trans. (pJ/bit)       | 0.13               | 0.114                  | 0.135          |
| Power Efficiency with Clk Trans. (pJ/bit)      | 0.173 (Estimation) | 0.152 (Estimation)     | 0.160          |

#### Acknowledgments

This work was supported by JSPS KAKENHI Grant Number JP17K00090. This work is supported by VLSI Design and Education Center (VDEC), the University of Tokyo in collaboration with Cadence Design Systems, Inc., Synopsys, Inc., Mentor Graphics, Inc. and Renesas Electronics Corporation.

#### References

- [1] The International Technology Roadmap for Semiconductors (ITRS) (2015) https://www.semiconductors.org/
- [2] M.T. Bohr and I.A. Young: "CMOS scaling trends and beyond," IEEE Micro 37 (2017) 20 (DOI: 10.1109/MM.2017.4241347).
- [3] J. Meindl, et al.: "Limits on silicon nanoelectronics for terascale integration," Science 293 (2001) 2044 (DOI: 10.1126/science.293.5537.2044).
- [4] S. Borkar: "Design challenges of technology scaling," IEEE Micro 19 (1999) 23 (DOI: 10.1109/40.782564).
- [5] E.F. Rent: "Microminiature packaging logic block to pin ratio," Memoranda 28 (1960).
- [6] K. Hardee, et al.: "A 1.43GHz per data I/O 16Mb DDR low-power embedded DRAM macro for a 3D graphics engine," ISSCC Dig. Tech. Papers (2001) 386 (DOI: 10.1109/ISSCC.2001.912685).
- [7] H. Pilo, et al.: "A 5.6ns random Cycle 144Mb DRAM with 1.4Gb/s/pin and DDR3-SRAM interface," ISSCC Dig. Tech. Papers (2003) 308 (DOI: 10.1109/isscc.2003.1234311).
- [8] E. Mensink, *et al.*: "Power efficient gigabit communication over capacitively driven RC-limited on-chip interconnects," IEEE J. Solid-State Circuits 45 (2010) 447 (DOI: 10.1109/jssc.2009.2036761).
- [9] B. Kim, et al.: "An energy-efficient equalized transceiver for RC-dominant channels," IEEE J. Solid-State Circuits 45 (2010) 1186 (DOI: 10.1109/jssc.2010.2047458).
- [10] J. Seo, et al.: "High-bandwidth and low-energy on-chip signaling with adaptive pre-emphasis in 90nm CMOS," ISSCC Dig. Tech. Papers (2010) 182 (DOI: 10.1109/isscc.2010.5433993).
- [11] E.J. Marinissen, et al.: "A structured and scalable test access architecture for TSV-based 3D stacked IC," Proc. 2010 28th IEEE VLSI Test Symposium (2010) 269 (DOI: 10.1109/VTS.2010.5469556).
- [12] S. Takaya, *et al.*: "Diagnosis of signaling and power noise using in-place waveform capturing for 3D chip stacking," IEICE Trans. Electron. E97-C (2014) 557 (DOI: 10.1587/transele.E97.C.557).
- [13] M.G. Farooq, et al.: "3D copper TSV integration, testing and reliability," IEEE IEDM (2011) 143 (DOI: 10.1109/IEDM.2011.6131504).
- [14] T. Dickson, *et al.*: "An 8× 10-Gb/s source-synchronous I/O system based on high-density silicon carrier interconnects," IEEE Symp. VLSI Circuits (2011) 80.
- [15] W.R. Davis, *et al.*: "Demystifying 3D ICs: the pros and cons of going vertical," IEEE Des. Test Comput. **22** (2005) 498 (DOI: 10.1109/mdt.2005.136).
- [16] J.-S. Kim, et al.: "A 1.2V 12.8GB/s 2Gb mobile wide-I/O DRAM with 4 × 128 I/Os using TSV-based stacking," 2011 IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers (2011) 496 (DOI: 10.1109/isscc.2011.5746413).
- [17] J. Jeddeloh and B. Keeth: "Hybrid memory cube new DRAM architecture increases density and performance," IEEE Symp. VLSI Circuits (2012) 87 (DOI: 10.1109/VLSIT.2012.6242474).

- [18] J. Roullard, *et al.*: "Evaluation of 3D interconnect routing and stacking strategy to optimize high speed signal transmission for memory on logic," IEEE Electronic Components and Technology Conf. (2012) 8 (DOI: 10.1109/ECTC.2012.6248798).
- [19] F. O'Mahony, *et al.*: "A 47× 10 Gb/s 1.4 mW/Gb/s parallel interface in 45nm CMOS," IEEE J. Solid-State Circuits **45** (2010) 2828 (DOI: 10.1109/jssc.2010.2076214).
- [20] Y. Liu, et al.: "A compact low-power 3D I/O in 45nm CMOS," ISSCC Dig. Tech. Papers (2012) 142 (DOI: 10.1109/isscc.2012.6176900).
- [21] H. Yamauchi, et al.: "An asymptotically zero power charge-recycling bus architecture for battery-operated ultrahigh data rate ULSI's," IEEE J. Solid-State Circuits 30 (1995) 423 (DOI: 10.1109/4.375962).
- [22] K. Hardee, et al.: "A 170GB/s 16Mb embedded DRAM with data-bus charge-recycling," ISSCC Dig. Tech. Papers (2008) 272 (DOI: 10.1109/isscc.2008.4523162).
- [23] H. Yamauchi, *et al.*: "A low power bus architecture with local and global charge recycling bus techniques for battery operated ultra-high data rate ULSI's," IEICE Trans. Electron. E78-C (1995) 394.
- [24] P. Sotiriadis, *et al.*: "Analysis and implementation of charge recycling for deep sub-micron buses," Proc. 2001 International Symposium on Low Power Electronics and Design (2001) 364 (DOI: 10.1145/383082.383184).
- [25] Y. Liu, et al.: "A 0.1pJ/b 5-to-10Gb/s charge-recycling stacked low-power I/O for on-chip signaling in 45nm CMOS SOI," ISSCC Dig. Tech. Papers (2013) 400 (DOI: 10.1109/ISSCC.2013.6487787).
- [26] J.M. Wilson, et al.: "A 6.5-to-23.3fJ/b/mm balanced charge-recycling bus in 16nm FinFET CMOS at 1.7-to-2.6Gb/s/wire with clock forwarding and low-crosstalk contraflow wiring," ISSCC Dig. Tech. Papers (2016) 156 (DOI: 10.1109/ISSCC.2016.7417954).
- [27] J. Kim, et al.: "High-frequency scalable electrical model and analysis of a through silicon via (TSV)," IEEE Tran. Compon. Packag. Manuf. Technol. 1 (2011) 181 (DOI: 10.1109/tcpmt.2010.2101890).
- [28] C. Ryu, *et al.*: "High frequency electrical model of through wafer via for 3-D stacked chip packaging," Electron. Syst. Integration Technol. Conf. 1 (2006) 215 (DOI: 10.1109/ESTC.2006.280001).
- [29] C. Bermond, et al.: "High frequency characterization and modeling of high density TSV in 3D integrated circuits," Proc. 13th IEEE Workshop Signal Propagation Interconnects (2009) 1 (DOI: 10.1109/spi.2009.5089840).
- [30] I. Savidis, *et al.*: "Electrical modeling and characterization of through-silicon vias (TSVs) for 3-D integrated circuits," Microelectron. J. **41** (2010) 9 (DOI: 10.1016/j.mejo.2009.10.006).