LETTER

# An efficient radix-4 Quasi-cyclic shift network for QC-LDPC decoders

Sabooh Ajaz and Hanho Lee$^{\text{a)}}$<br>Dept. of Information and Communication Engr., Inha University, Incheon, 402-751, Korea<br>a) hhlee@inha.ac.kr


#### Abstract

A Radix-4 Quasi-cyclic shift network (QSN) for reconfigurable QC-LDPC decoders is presented in this paper. A Radix-4 logarithmic barrel shifter naturally needs fewer stages than a Radix-2 one; in addition, a complexity reduction technique is described that reduces the total gate count at each stage. The proposed Radix-4 QSN architecture supports various code rates and all sizes of sub-matrices. Moreover, a novel Radix-4 signal generator is proposed, which is an essential element for reconfigurable LDPC decoders. The synthesis, placement and routing ( $\mathrm{P} \& \mathrm{R}$ ) of the proposed network are performed using TSMC 90-nm standard cell CMOS technology. The implementation results show that the proposed network outperforms its predecessors by about $11 \%$ and $38 \%$ in terms of area and clock frequency, respectively.


Keywords: shift network, LDPC, multi-size barrel shifter, Radix-4
Classification: Electron devices, circuits, and systems

## References

[1] S. Hwang and H. Lee: IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 21 (2013) 1337.
[2] S. Kim, G. E. Sobelman and H. Lee: IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 19 (2011) 1099.
[3] D. Oh and K. Parhi: IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 18 (2010) 85.
[4] X. Peng, X. Zhao, Z. Chen, F. Maehara and S. Goto: IEICE Trans. Fundamentals E93-A (2010) 2551.
[5] X. Chen, S. Lin and V. Akella: IEEE Trans. Circuits Syst. II 57 (2010) 782.
[6] J. R. Hauser: in Handbook of Semiconductor Manufacturing Technology, ed. R. Doering and Y. Nishi (CRC Press, Bosa Roca, 2007) 2nd ed. 1-21.

## 1 Introduction

Low-density parity-check (LDPC) codes are widely used in modern communication systems. The hardware implementation of an LDPC decoder is generally very complex. The switch network is one of the sources of complexity and critical path within LDPC decoders [1, 2]. QC-LDPC codes simplify the switch network: they eliminate random shifting (or permutation), so only cyclic shifting is required [1]. Conventional barrel shifter architectures are not sufficient for modern reconfigurable LDPC decoders, because they do not support cyclic shifting when the number of inputs is less than the network size, and therefore do not support LDPC decoders with multi-size sub-matrices $[3,4,5]$.

Recently, many switch network designs have been presented for reconfigurable QC-LDPC decoders. Efficient designs based on the Benes network, the Banyan network and barrel shifters are given in [3], [4] and [5], respectively. To the best of our knowledge, the Radix-2 Quasi-cyclic shift network (QSN) design [5] has performed better than all other designs. The QSN architecture utilizes two conventional logarithmic barrel shifters and a merge network to perform the required cyclic shifts for an arbitrary number of inputs. This work presents a novel idea of designing a QSN based on a high-radix number system.

## 2 Radix-4 QSN architecture

Radix-2 logarithmic barrel shifters are generally built from two-to-one multiplexers as the base unit; the Radix-4 approach correspondingly uses four-to-one multiplexers. A Radix-2 network offers at most two shift amounts at each stage (here, $s=$ stage), i.e. $0 \times 2^{s}$ and $1 \times 2^{s}$. The Radix-4 network therefore offers at most four shift amounts at each stage: $0 \times 4^{s}$, $1 \times 4^{s}$, $2 \times 4^{s}$ and $3 \times 4^{s}$. Hence it offers shift amounts of $0 / 1 / 2 / 3$ and $0 / 16 / 32 / 48$ for the first and third stages, respectively. The total number of stages required for a shift value of ' $N$ ' is $\left\lceil\log _{4} N\right\rceil$. The total number of stages, which is the core constituent of complexity and critical path, is clearly reduced compared to Radix-2 [5]. A $16 \times 16$ Radix-4 logarithmic barrel shifter using four-to-one multiplexers is shown in Fig. 1 (a).
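The stage count and the per-stage shift amounts can be checked with a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, and it assumes that each stage's select value is one base-4 digit of the shift amount, with stage 0 taking the least significant digit.

```python
def num_stages(n: int) -> int:
    """Smallest s with 4**s >= n, i.e. ceil(log4(n)), using integer arithmetic."""
    s = 0
    while 4 ** s < n:
        s += 1
    return s


def stage_shift_options(stage: int) -> list[int]:
    """The four shift amounts one Radix-4 stage can apply: k * 4**stage, k = 0..3."""
    return [k * 4 ** stage for k in range(4)]


def base4_digits(shift: int, stages: int) -> list[int]:
    """Per-stage select values (assumed: stage 0 = least significant base-4 digit)."""
    return [(shift >> (2 * s)) & 0b11 for s in range(stages)]


print(num_stages(16))          # 2
print(stage_shift_options(0))  # [0, 1, 2, 3]
print(stage_shift_options(2))  # [0, 16, 32, 48]
print(base4_digits(7, 2))      # [3, 1]  since 7 = 3*1 + 1*4
```

A $96 \times 96$ network would need `num_stages(96) == 4` Radix-4 stages under this definition, against seven stages for Radix-2.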

Radix-4 QSN requires a base-4 representation of the shift amount, so each stage requires a 2-bit control value. The most complex part of the barrel shifter architecture shown in Fig. 1 (a) is the fixed, hard-wired interconnection network between the multiplexer stages. Each multiplexer needs four wired inputs, so 16 multiplexers require a total of $64(=16 \times 4)$ interconnecting wires, and a total of 128 interconnecting wires is required for the two-stage $16 \times 16$ barrel shifter shown in Fig. 1 (a). All the interconnections are implemented using Eq. (1).

$$
\begin{equation*}
\mathit{iNum}=\left(N-\left(\mathit{miNum} \times 4^{\text{stage}}\right)+\mathit{mNum}\right) \bmod N, \tag{1}
\end{equation*}
$$

where ' $N$ ' is the network size, ' $\mathit{iNum}$ ' is a stage input, ' $\mathit{miNum}$ ' is an input number of a $4 \times 1$ multiplexer and ' $\mathit{mNum}$ ' is a multiplexer number. Inputs $I[0]$ to $I[15]$ shown in Fig. 1 (a) are the $\mathit{iNum}$'s, or stage inputs, for the first stage; for all other stages, the outputs of the multiplexers from stage ' $s-1$ ' are considered the stage inputs. ' $\mathit{miNum}$ ' takes the values (0, 1, 2 and 3) for Radix-4, as all multiplexers require a total of 4 inputs. ' $\mathit{mNum}$ ', the multiplexer number, is shown inside each multiplexer in Fig. 1 (a). Fig. 1 (b) shows how multiplexer \#5 is connected in stage \#1 (the $2^{\text{nd}}$ stage) using Eq. (1).

Fig. 1. (a) Radix-4 barrel shifter. (b) Interconnections for multiplexer \#5 in the $2^{\text{nd}}$ stage. (c) Proposed Radix-4 QSN architecture.
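Eq. (1) can be exercised with a small behavioral model. The sketch below is not the authors' Verilog; the function names are hypothetical, and it assumes that the 2-bit select value of a stage equals the multiplexer input number $\mathit{miNum}$ it picks. Under that assumption, chaining the stages yields a cyclic shift of the whole input vector.

```python
def stage_input(n: int, stage: int, mux_num: int, mux_input: int) -> int:
    """Eq. (1): index of the stage input wired to input `mux_input` of mux `mux_num`."""
    return (n - mux_input * 4 ** stage + mux_num) % n


def radix4_barrel_shift(data: list, shift: int) -> list:
    """Behavioral model of the full (non-reduced) Radix-4 barrel shifter."""
    n = len(data)
    stage = 0
    while 4 ** stage < n:
        # Assumption: the stage select value is one base-4 digit of the shift.
        digit = (shift >> (2 * stage)) & 0b11
        data = [data[stage_input(n, stage, m, digit)] for m in range(n)]
        stage += 1
    return data


# Wiring of multiplexer #5 in stage #1 of the 16 x 16 network, inputs 0..3.
print([stage_input(16, 1, 5, k) for k in range(4)])  # [5, 1, 13, 9]

# A shift of 7 moves input 0 to output 7; inputs 9..15 wrap around to the top.
print(radix4_barrel_shift(list(range(16)), 7))
# [9, 10, 11, 12, 13, 14, 15, 0, 1, 2, 3, 4, 5, 6, 7, 8]
```

The wrap-around entries at the top of the output are the 'upward' signals that the next part of this section removes from the direct network.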

Fig. 1 (a) shows two types of interconnecting wires: those with an arrowhead and those without. All wires with an upward-pointing arrowhead shift signals 'up' from their original input positions. For instance, the first stage shifts the input signals by a maximum of three, so the last three signals are shifted upward, while the remaining input signals are shifted downward with respect to their original input positions. However, when the number of inputs ($I[0]$ to $I[n-1]$) to the network is less than the network size $NS$ $(n<NS)$, these upward signals become insignificant because the last ' $NS-n$ ' inputs are no longer present. The authors of QSN [5] solved this problem by using two barrel shifters instead of one, plus a merge stage that combines the outputs of the two networks.

Table I. Total multiplexers required for Radix-4 QSN

| Stage | Mux | $N>3\left(4^{s-1}\right)$ | $3\left(4^{s-1}\right) \geq N>2\left(4^{s-1}\right)$ | $2\left(4^{s-1}\right) \geq N>0$ |
| :---: | :---: | :---: | :---: | :---: |
| $s-1$ | 4-to-1 | $N-3 \times 4^{s-1}$ | - | - |
|  | 3-to-1 | $4^{s-1}$ | $N-2 \times 4^{s-1}$ | - |
|  | 2-to-1 | $4^{s-1}$ | $4^{s-1}$ | $N-4^{s-1}$ |
|  | 1-to-1 | $4^{s-1}$ | $4^{s-1}$ | $4^{s-1}$ |
| $0$ to $s-2$ | 4-to-1 | $\sum_{i=0}^{s-2}\left(N-3 \times 4^{i}\right)$ | $\sum_{i=0}^{s-2}\left(N-3 \times 4^{i}\right)$ | $\sum_{i=0}^{s-2}\left(N-3 \times 4^{i}\right)$ |
|  | 3-to-1 | $\sum_{i=0}^{s-2}\left(4^{i}\right)$ | $\sum_{i=0}^{s-2}\left(4^{i}\right)$ | $\sum_{i=0}^{s-2}\left(4^{i}\right)$ |
|  | 2-to-1 | $\sum_{i=0}^{s-2}\left(4^{i}\right)$ | $\sum_{i=0}^{s-2}\left(4^{i}\right)$ | $\sum_{i=0}^{s-2}\left(4^{i}\right)$ |
|  | 1-to-1 | $\sum_{i=0}^{s-2}\left(4^{i}\right)$ | $\sum_{i=0}^{s-2}\left(4^{i}\right)$ | $\sum_{i=0}^{s-2}\left(4^{i}\right)$ |

1-to-1 Mux = eliminated multiplexer, $N=$ network size and $s=$ stage.

The proposed architecture for Radix-4 QSN is shown in Fig. 1 (c). The direct network takes inputs from $I[0]$ to $I[15]$, while the reverse network takes inputs in reverse order, i.e. from $I[15]$ to $I[0]$. Merge control signals are shown as $\mathrm{m}[\mathrm{i}]$. All the upward signals are eliminated in Fig. 1 (c) because it is now the responsibility of the second barrel shifter (the reverse network) to provide all the upward-shifted signals to the merge network. The control value of the direct network is ' $c$ ' (the cyclic shift), while the control value of the reverse network is ' $r$ ' (the difference between the number of inputs and the cyclic shift amount). A merge network merges the signals from both networks, as shown in Fig. 1 (c). After the elimination of the upward-directed signals, the first $4^{s}$ multiplexers in each stage are completely eliminated. Furthermore, the second $4^{s}$ multiplexers become two-to-one multiplexers and the next $4^{s}$ multiplexers become three-to-one multiplexers; the rest remain four-to-one multiplexers. This yields a significant area reduction, especially when the number of stages is large. The total number of multiplexers required for Radix-4 QSN is calculated using Table I. The proposed complexity reduction method reduces not only the number of multiplexers but also the interconnecting wires required between stages. The interconnecting wires between stages are calculated using Eq. (2).

$$
\begin{equation*}
\text { Total Wires }=(4 \times N)-\left(1 \times 4^{\text {stage }}+2 \times 4^{\text {stage }}+3 \times 4^{\text {stage }}\right) \tag{2}
\end{equation*}
$$

For the $16 \times 16$ Radix-4 QSN network shown in Fig. 1 (c), the first stage requires 58 $(=64-6)$ wires, while the second stage requires $40(=64-24)$ wires. Hence, a total of $196(=98 \times 2)$ interconnecting wires is required for the 'direct' and 'reverse' networks together, whereas a $16 \times 16$ Radix-2 QSN requires $226(=113 \times 2)$ wires. Thus, the $16 \times 16$ Radix-4 QSN network shows a $13 \%$ saving in interconnection complexity. This improvement grows as the network size increases.
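The multiplexer counts of Table I and the wire counts of Eq. (2) can be cross-checked numerically. The following sketch uses hypothetical function names and makes two assumptions: an eliminated (1-to-1) multiplexer is counted as a single pass-through wire, and a partially filled last stage is handled by clamping each group of $4^{s}$ multiplexers to the multiplexers that remain, which is how the three columns of Table I are read here. Eq. (2) itself is only applied to stages with $N \geq 3 \times 4^{\text{stage}}$.

```python
def mux_counts(n: int, stage: int) -> dict[int, int]:
    """Map multiplexer fan-in (1 = eliminated, plain wire) to its count in one stage."""
    block = 4 ** stage
    remaining = n
    counts = {}
    for fan_in in (1, 2, 3):
        # First, second and third groups of 4**stage multiplexers (clamped to n).
        counts[fan_in] = min(block, remaining)
        remaining -= counts[fan_in]
    counts[4] = remaining  # everything else stays a full 4-to-1 multiplexer
    return counts


def stage_wires(n: int, stage: int) -> int:
    """Eq. (2): interconnecting wires feeding one stage of one sub-network."""
    block = 4 ** stage
    return 4 * n - (1 * block + 2 * block + 3 * block)


def wires_from_muxes(n: int, stage: int) -> int:
    """Same quantity, obtained by summing the fan-in of every multiplexer."""
    return sum(fan_in * count for fan_in, count in mux_counts(n, stage).items())


print(mux_counts(16, 0))  # {1: 1, 2: 1, 3: 1, 4: 13}
print(mux_counts(16, 1))  # {1: 4, 2: 4, 3: 4, 4: 4}
print(stage_wires(16, 0), stage_wires(16, 1))  # 58 40

total = 2 * sum(stage_wires(16, s) for s in range(2))  # direct + reverse
print(total)                             # 196
print(round(100 * (226 - total) / 226))  # 13 (percent saving vs. Radix-2 QSN)
```

Summing the fan-ins reproduces Eq. (2) because the three reduced groups drop three, two and one wires per multiplexer, respectively, from the four wires of a full multiplexer.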

## 3 Radix-4 signal generator

The signal generator consists of two parts: a merge stage signal generator and a reverse cyclic shift generator. The reverse cyclic shift is the difference between the network size ' $N$ ' and the cyclic shift value ' $c$ ', i.e. $(N-c)$, and is generated by a subtractor. The merge stage selects a signal from the direct or the reverse network and routes the proper signal to the output, as shown in Fig. 1 (c). The merge network selects the output of the reverse network for all upward-directed signals and of the direct network for all downward-directed signals. Hence, for a cyclic shift of seven on a $16 \times 16$ Radix-4 QSN network, the merge network selects the first seven signals from the reverse network ( $\mathrm{m}[0]$ to $\mathrm{m}[6]$ ), while the rest of the signals ( $\mathrm{m}[7]$ to $\mathrm{m}[15]$ ) are selected from the direct network. In general, it selects the top 'c' (cyclic shift value) signals from the reverse network and the rest from the direct network. Even if the number of inputs is less than the total network size, it still selects the top 'c' signals from the reverse network. This simplifies the merge signal generator design compared to [5], because the merge signal generator is independent of the sub-network size. Thus, the merge control signals are a direct function of the cyclic shift value.

Fig. 2. (a) Merge control signals for $16 \times 16$ Radix-4 QSN. (b) Radix-4 barrel shifter setup for signal generator. (c) Proposed control signal generation algorithm for Radix-4 QSN.
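The interplay of direct network, reverse network and merge stage can be illustrated with a behavioral sketch. It is one consistent reading of the description, not the authors' implementation: the function names are hypothetical, the reverse network is modeled only by the index mapping it must realize (not as a barrel shifter on reversed inputs), its control value is taken as $r = n - c$ with $n$ the number of active inputs as defined in Section 2, and a merge control value of 1 is assumed to select the reverse network.

```python
def merge_controls(ns: int, c: int) -> list[int]:
    """Merge selects: 1 (assumed: reverse network) for the top c outputs, else 0."""
    return [1 if j < c else 0 for j in range(ns)]


def qsn_cyclic_shift(data: list, n: int, c: int) -> list:
    """Cyclically shift the first n of len(data) inputs by c, with 0 <= c < n."""
    ns = len(data)
    r = n - c  # reverse-network control value
    # Direct network: downward-only shift by c, no wrap-around wires.
    direct = [data[j - c] if j >= c else None for j in range(ns)]
    # Reverse network: supplies the values that would have wrapped around.
    reverse = [data[j + r] if j + r < ns else None for j in range(ns)]
    m = merge_controls(ns, c)
    return [reverse[j] if m[j] else direct[j] for j in range(ns)]


print(merge_controls(16, 7))
# [1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

# Only 12 of the 16 inputs are active; outputs 0..11 hold the shifted sub-block.
out = qsn_cyclic_shift(list(range(16)), n=12, c=5)
print(out[:12])  # [7, 8, 9, 10, 11, 0, 1, 2, 3, 4, 5, 6]
```

Note that `merge_controls` depends only on `c`, not on `n`, which mirrors the statement that the merge signal generator is independent of the sub-network size.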

The merge stage control signals (m[0] to m[15]) of a $16 \times 16$ Radix-4 QSN for cyclic shift values from zero to 15 are shown in Fig. 2 (a). The zeros clearly shift in a cyclic manner, and each shifted zero is replaced by a one. So, the controller for the merge stage can be implemented as a Radix-4 barrel shifter with all inputs equal to zero. Furthermore, all the downward-directed interconnecting wires are fed with the value of one, as shown in Fig. 2 (b). The control signal generation algorithm for Radix-4 QSN is shown in Fig. 2 (c), where 'four_to_one_Mux' is a 4-to-1 multiplexer with control value ' $c$ '. The constant ' 1 ' input inside each 4-to-1 multiplexer depicts the downward-directed signals connected to 'logic 1'; the same signals are labeled 'connected to logic ONE' in Fig. 2 (b). Moreover, Eq. (1) is used to make the connections between the multiplexer stages.
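A staged model of such a signal generator might look as follows. This is a sketch under stated assumptions rather than the algorithm of Fig. 2 (c): the function name is hypothetical, each stage is assumed to use one base-4 digit of ' $c$ ' as its select value, and the wires tied to logic one are modeled as those whose source index would fall before input 0. The output polarity (ones at the top ' $c$ ' positions) is likewise an assumption.

```python
def merge_signal_generator(ns: int, c: int) -> list[int]:
    """All-zero inputs pass through Radix-4 shift stages; vacated slots read constant 1."""
    signals = [0] * ns
    stage = 0
    while 4 ** stage < ns:
        # Shift applied by this stage: (base-4 digit of c) * 4**stage.
        step = ((c >> (2 * stage)) & 0b11) * 4 ** stage
        # Positions whose source would lie before input 0 take the constant 1.
        signals = [signals[m - step] if m >= step else 1 for m in range(ns)]
        stage += 1
    return signals


for c in (0, 1, 7, 15):
    print(c, merge_signal_generator(16, c))
# 0 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
# 1 [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
# 7 [1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
# 15 [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0]
```

With this construction the number of ones always equals ' $c$ ', so no comparator or decoder is needed beyond the shifter structure itself.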

## 4 Implementation and comparison results

The proposed Radix-4 QSN design (with 8-bit word length) was modeled in Verilog HDL and synthesized with TSMC 90-nm CMOS technology, with all inputs and outputs loaded with buffers. The layout was carried out using 9-layer metal technology.

Table II shows the implementation and comparison results for the proposed Radix-4 QSN architecture. The Radix-4 QSN clearly outperforms the designs in [3, 5]. A $96 \times 96$ network is the key requirement for IEEE 802.11n and IEEE 802.16e standard LDPC decoders. Generally, an area scaling factor of $\left(1.414^{2}\right)^{2} \approx 4$ and a frequency scaling factor of $1.414^{2} \approx 2$ are used to convert a $90-\mathrm{nm}$ result to a $180-\mathrm{nm}$ result [6]. Thus, the scaled area for the $96 \times 96$ network equals $0.1317 \times 4=0.527 \mathrm{~mm}^{2}$, which translates to a saving of about $11 \%$ in area compared to [5] and $27 \%$ compared to [3]. The scaled frequency for the $96 \times 96$ network equals $650 \div 2=325 \mathrm{MHz}$, which is about $38 \%$ and $70 \%$ higher than [5] and [3], respectively.
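The technology scaling step is simple arithmetic and can be reproduced as below. The helper name is hypothetical, and the sketch assumes the first-order rule quoted above: area grows with the square of the feature-size ratio and clock frequency falls linearly with it.

```python
def scale_to_node(area_mm2: float, freq_mhz: float,
                  from_nm: float = 90.0, to_nm: float = 180.0) -> tuple[float, float]:
    """First-order scaling of area and clock frequency between technology nodes."""
    ratio = to_nm / from_nm          # 2 for 90 nm -> 180 nm (about 1.414**2)
    return area_mm2 * ratio ** 2, freq_mhz / ratio


area, freq = scale_to_node(0.1317, 650.0)
print(round(area, 3), freq)  # 0.527 325.0
```

The percentage comparisons against [3] and [5] additionally require the reference values from Table II, which are not reproduced in this sketch.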

## 5 Conclusion

This work describes an efficient Radix-4 QSN architecture for reconfigurable QC-LDPC decoders and paves the way for the development and implementation of high-radix QSN networks. A novel complexity reduction technique is described that reduces the gate count at each stage. Furthermore, a novel signal generator suitable for Radix-4 QSN is proposed. The proposed design shows a definite performance improvement over its predecessors.

## Acknowledgments

This research was supported by the Basic Science Research Program through the NRF funded by the Ministry of Science, ICT and Future Planning (2013R1A2A2A01068628).

