# Conditional termination check min-sum algorithm for efficient LDPC decoders

## Keol Cho and Ki-Seok Chung<sup>a)</sup>

LETTER

Department of Electronics and Computer Engineering, Hanyang University, 222, Wangsimni-ro, Seongdong-gu, Seoul, 133–791, Rep. of Korea

a) kchung@hanyang.ac.kr

**Abstract:** A conditional termination check min-sum algorithm (MSA) using the difference of the first two minima is proposed for faster decoding and lower power consumption in low-density parity-check (LDPC) code decoders. Based on the size of this difference, the proposed method dynamically decides whether the termination-check steps of an iteration are skipped. Simulation results show that the decoding speed is improved by up to 7% and the power consumption is reduced by up to 16.43% without any loss of error-correcting performance. In addition, the extra hardware cost of the proposed method is negligible compared with conventional LDPC decoders.

**Keywords:** low-power LDPC decoder, min-sum algorithm

**Classification:** Integrated circuits

#### References

- [1] Y. M. Jung, Y. H. Jung, S. Lee and J. Kim: IEEE Trans. Consum. Electron. 59 (2013) 467. DOI:10.1109/TCE.2013.6626226
- [2] J. Kim and W. Sung: IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 22 (2014) 1004. DOI:10.1109/TVLSI.2013.2265314
- [3] T. Tokutomi, S. Tanakamaru and K. Takeuchi: IEICE Technical Report 113 [419] 23.
- [4] J. Zhang, Y. Wang, M. Fossorier and J. S. Yedidia: IEEE Int. Symp. on Information Theory (2005) 454. DOI:10.1109/ISIT.2005.1523375
- [5] J. H. Kim, M. Y. Nam and H. Y. Song: Electron. Lett. 45 (2009) 117. DOI:10.1049/el:20092505
- [6] J. Chen and M. Fossorier: IEEE Trans. Commun. 50 (2002) 406. DOI:10.1109/26.990903
- [7] A. A. Emran, M. Elsabrouty, O. Muta and H. Furukawa: IEICE Technical Report 115 [37] (2015) 7.
- [8] X. Wu, Y. Song, M. Jiang and C. Zhao: IEEE Commun. Lett. 14 (2010) 667. DOI:10.1109/LCOMM.2010.07.100508
- [9] B. Xiang, R. Shen, A. Pan, D. Bao and X. Zeng: IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 18 (2010) 1447. DOI:10.1109/TVLSI.2009.2025169
- [10] J. Y. Park and K. S. Chung: EURASIP J. Wirel. Commun. Netw. 2011 (2011) 48. DOI:10.1186/1687-1499-2011-48

© IEICE 2015 DOI: 10.1587/elex.12.20150738 Received September 2, 2015 Accepted September 15, 2015 Publicized November 6, 2015 Copyedited December 25, 2015



## 1 Introduction

Low-density parity-check (LDPC) codes have been widely adopted in various applications from communication systems to storage systems [1, 2, 3]. These days, applications commonly require higher throughput with lower power consumption. Thus, implementing efficient LDPC decoders has been actively studied in various ways such as smart scheduling schemes [4, 5] and new decoding algorithms [6, 7].

The most popular LDPC decoding algorithm is the min-sum algorithm (MSA) because of its low computational complexity at the cost of only a slight loss of coding gain. Furthermore, even a simple modification of MSA can reduce the loss of coding gain, so modified versions [6, 8] of MSA have been widely adopted in LDPC decoders. Moreover, dynamic approaches to MSA have been discussed in several studies for better performance and throughput [8, 10]. In [8], the scaling and offset factors were adjusted adaptively, and [10] utilized a look-up table of the minimum iteration number for adaptive scheduling. However, these previous works were based on estimation of the signal-to-noise ratio (SNR), which requires either complex computation or an external estimator. In this Letter, a conditional termination check scheduling approach for MSA without SNR estimation is proposed. The proposed scheme is based on the difference of the first two minima of the variable-to-check (V2C) node messages, which was devised in [2, 9] to compress check-to-variable (C2V) node messages and thereby reduce the interconnection complexity and memory usage of LDPC decoders.

## 2 C2V message compaction

Let $L_{i \to j}^{(l)}$ and $L_{j \to i}^{(l)}$ denote the log-likelihood ratio (LLR) information from variable node (VN) *i* to check node (CN) *j* and that from CN *j* to VN *i*, respectively, at the *l*-th iteration. The set of VNs neighboring CN *j* is denoted as $V_j$. In a modified MSA, the LLR values of C2V messages are defined as:

$$L_{j \to i}^{(l)} = k \times \left( \prod_{i' \in V_j \setminus i} \operatorname{sign}(L_{i' \to j}^{(l)}) \right) \times \min_{i' \in V_j \setminus i} |L_{i' \to j}^{(l)}|$$
(1)

where *k* is a scaling factor, and $V_j \setminus i$ represents the set $V_j$ excluding the *i*-th VN. In a hardware implementation, the sign in Eq. (1) is simply calculated with exclusive-OR gates, and the minimum is usually found according to Eq. (2):

$$\min_{i' \in V_j \setminus i} |L_{i' \to j}^{(l)}| = \begin{cases} \min 2, & \text{if index of } \min 1 = i \\ \min 1, & \text{otherwise} \end{cases}$$
(2)

where min 1 and min 2 are the first and the second minimum, respectively. Therefore, C2V messages are formatted with four components: {signs, index of min 1, min 1, min 2}. To reduce the memory usage of an LDPC decoder, [9] reduced the size of a C2V message by sending the difference between min 1 and min 2, $\Delta$min, instead of min 2, which reduced memory usage by 5.64% with negligible performance loss.
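As an illustration of the message format described above, the following Python sketch compresses the V2C LLRs entering one check node into {signs, index of min 1, min 1, $\Delta$min} and then rebuilds every outgoing C2V value of Eq. (1) from that compact form. The function names `compress_check_node` and `expand_check_node` are hypothetical, and floating-point values stand in for the quantized LLRs a hardware decoder would use.

```python
from functools import reduce
from operator import xor


def compress_check_node(v2c):
    """Summarise the V2C LLRs of one check node as a compact C2V message."""
    mags = [abs(x) for x in v2c]
    idx1 = min(range(len(mags)), key=mags.__getitem__)      # index of min 1
    min1 = mags[idx1]
    min2 = min(m for i, m in enumerate(mags) if i != idx1)  # second minimum
    signs = [x < 0 for x in v2c]                            # True = negative LLR
    return signs, idx1, min1, min2 - min1                   # last item is delta-min


def expand_check_node(signs, idx1, min1, delta_min, k=0.75):
    """Rebuild all outgoing C2V LLRs of Eq. (1) from the compact message."""
    parity = reduce(xor, signs)          # XOR of every incoming sign bit
    out = []
    for i, s in enumerate(signs):
        # Eq. (2): the edge that supplied min 1 receives min 2 = min 1 + delta-min.
        mag = min1 + delta_min if i == idx1 else min1
        negative = parity ^ s            # remove edge i's own sign from the product
        out.append(k * (-mag if negative else mag))
    return out
```

For example, `compress_check_node([2.0, -0.5, 1.5, -3.0])` yields min 1 at index 1 with $\Delta$min equal to 1.0, and `expand_check_node` recovers the four scaled outgoing messages without min 2 ever being stored.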





## 3 Conditional termination check min-sum

### 3.1 Analysis of Δmin

By varying the SNR, the distribution of $\Delta$min values in the C2V messages in each iteration was explored with the (9216, 4608) LDPC code for China Multimedia Mobile Broadcasting (CMMB). The maximum iteration count was set to 30, the scaling factor *k* was set to 0.75, and an AWGN channel with various SNRs was used. Fig. 1 shows the experimental results for the $\Delta$min values obtained from 10,000 frames. In each iteration, the average of all $\Delta$min values from the 4608 CNs was calculated. As Fig. 1 clearly shows, the $\Delta$min values remain bounded when the decoding is unsuccessful, while they increase as the iterative decoding approaches successful completion. Moreover, as shown in Fig. 2, the $\Delta$min values are closely related to the iteration count of the LDPC decoder: decoding in a low SNR region, which can be recognized by low $\Delta$min values, leads to a high iteration number, while decoding under a good channel condition, which can be identified by high $\Delta$min values, terminates successfully with a low iteration count.



Fig. 1. Average of 4608  $\Delta$ min values in every iteration with various SNRs



Fig. 2. Overall average of ∆min values and average of iteration numbers with various SNRs





### 3.2 Conditional termination check MSA

Based on the observations regarding the $\Delta \min$ values discussed in the previous section, *delta-minima*, which is a refined value of the $\Delta \min$ values, is used for dynamic scheduling of MSA. The refinement of $\Delta \min$ can be implemented in various ways, such as straightforward summing of the $\Delta \min$ values, saturation checking of the summation, or averaging of the $\Delta \min$ values. In this Letter, as discussed above, the average of the $\Delta \min$ values was chosen as *delta-minima*. The overall proposed decoding process is described in Algorithm 1.

Algorithm 1 Conditional termination check MSA based on delta-minima

- 1: Initialize VNs, $L_{i \to j}$, with initial LLRs, $F_i$, derived from received vector $y_i$: $L_{i \to j} = F_i = -2y_i/\delta^2$
- 2: **for** $l$ from 1 to $max\_iteration$ **do**
  - 3: {Check node update and construct C2V message} $L_{j \to i}^{(l)} = k \times \left( \prod_{i' \in V_j \setminus i} \operatorname{sign}(L_{i' \to j}^{(l)}) \right) \times \min_{i' \in V_j \setminus i} |L_{i' \to j}^{(l)}|$, $C2V_{msg} = \{signs, \min 1_{index}, \min 1, \Delta \min\}$
  - 4: {Variable node update with *delta-minima* computation}
  - 5: $L_{i \to j} = F_i + \sum_{j' \in C(i) \setminus j} L_{j' \to i}$
  - 6: $z_i = F_i + \sum_{j \in C(i)} L_{j \to i}$
  - 7: compute *delta-minima* using $\Delta$mins of $C2V_{msg}$ (averaging)
  - 8: **if** (*delta-minima* < $\Delta \min_{bound}$ and $l < max\_iteration$)
    - 9: $l = l + 1$
    - 10: Go to line 3;
  - 11: **else**
    - 12: $\hat{\boldsymbol{c}} = \{\widehat{c_1}, \widehat{c_2}, \dots, \widehat{c_N}\}, \ \widehat{c_i} = \begin{cases} 0, & z_i \ge 0\\ 1, & z_i < 0 \end{cases}$
    - 13: **if** ($\boldsymbol{H} \cdot \hat{\boldsymbol{c}}^T = 0$ or $l = max\_iteration$)
      - 14: Output $\hat{\boldsymbol{c}}$ as decoded bits
    - 15: **else**
      - 16: $l = l + 1$
      - 17: Go to line 3;

In our algorithm, *delta-minima* is calculated from the C2V messages while the VN update operation is being processed. When *delta-minima* is lower than $\Delta \min_{bound}$ (line 8 in Algorithm 1), a threshold determined by extensive simulations, the decoder skips the termination check (lines 12 and 13 in Algorithm 1) because the LDPC decoding is very unlikely to be successful at the end of that iteration. The proposed approach not only reduces the power consumption but also enhances the decoding speed by skipping unnecessary termination check operations.
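The control flow of Algorithm 1 can be sketched in software as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses flooding scheduling, floating-point LLRs, a dense list-of-rows parity-check matrix, and *delta-minima* averaged over all CNs. The function name `decode`, its parameters, and the returned `checks_run` counter are hypothetical.

```python
def decode(H, llr, delta_min_bound, max_iteration=30, k=0.75):
    """Scaled min-sum decoding with a conditional termination check.

    H is a list of 0/1 rows, llr holds the initial LLRs F_i.
    Returns (decoded_bits, iterations_used, termination_checks_run).
    """
    rows = [[i for i, h in enumerate(r) if h] for r in H]    # V_j for each CN
    v2c = [{i: llr[i] for i in vs} for vs in rows]           # line 1
    checks_run = 0
    for l in range(1, max_iteration + 1):                    # line 2
        c2v, deltas = [], []
        for msgs in v2c:                                     # line 3: CN update
            mags = sorted(abs(x) for x in msgs.values())
            min1, min2 = mags[0], mags[1]
            deltas.append(min2 - min1)                       # delta-min of this CN
            i1 = min(msgs, key=lambda i: abs(msgs[i]))       # index of min 1
            neg = sum(x < 0 for x in msgs.values()) % 2      # XOR of all signs
            out = {}
            for i, x in msgs.items():
                mag = min2 if i == i1 else min1              # Eq. (2)
                sign = -1.0 if neg ^ (x < 0) else 1.0
                out[i] = k * sign * mag
            c2v.append(out)
        z = list(llr)                                        # line 6
        for out in c2v:
            for i, m in out.items():
                z[i] += m
        for j, out in enumerate(c2v):                        # line 5: VN update
            for i, m in out.items():
                v2c[j][i] = z[i] - m
        delta_minima = sum(deltas) / len(deltas)             # line 7 (averaging)
        if delta_minima < delta_min_bound and l < max_iteration:
            continue                                         # lines 8-10: skip check
        bits = [0 if zi >= 0 else 1 for zi in z]             # line 12
        checks_run += 1
        parity_ok = all(sum(bits[i] for i in vs) % 2 == 0 for vs in rows)
        if parity_ok or l == max_iteration:                  # line 13
            return bits, l, checks_run                       # line 14
```

Setting `delta_min_bound` to 0 disables skipping and gives the ordinary scaled MSA with a termination check after every iteration, which is a convenient baseline. A bound that is too high also skips checks that would have succeeded and so costs extra iterations, which is why the threshold has to be chosen by simulation.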

## 4 Experimental results

### 4.1 Skipped termination check analysis

Performance results of the proposed algorithm, together with the number of skipped termination checks, are presented in this section. The proposed algorithm was applied to decode the CMMB code described in Section 3.1. For the simulation, we used the average of the $\Delta$min values as *delta-minima*, and $\Delta$min<sub>bound</sub> was chosen based on the *delta-minima* at an SNR of 1.4 dB.





Table I. The number of skipped termination checks at various SNRs

| SNR (dB)                          | 0.9 | 1.2 | 1.5  | 1.8  | 2.1 | 2.4 | 2.7 | 3   |
|-----------------------------------|-----|-----|------|------|-----|-----|-----|-----|
| Average iteration count           | 30  | 30  | 29.6 | 15.3 | 8.8 | 5   | 4.2 | 3.8 |
| Average number of skipped checks  | 29  | 29  | 28.1 | 12.2 | 5.3 | 3.2 | 2   | 1.3 |

Table I summarizes the simulation results. When the SNR was under 1.5 dB, every termination check except the one in the last iteration (line 13 of Algorithm 1) was skipped, and almost half of the termination checks were skipped when the SNR was over 1.8 dB. In [10], the authors reported that the termination check accounts for 17% of the power consumption of the total LDPC decoder in each iteration, and our proposed method reduced the power consumption of the LDPC decoder by up to 16.43% under bad channel conditions. The decoding time was also reduced by 7% at low SNRs.
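A back-of-the-envelope check relates these figures. Assuming that the saving scales with the fraction of skipped checks (an interpretation offered here, not a derivation stated in the Letter), skipping 29 of the 30 termination checks at low SNR (Table I), with the check taking 17% of the per-iteration power [10], gives

$$0.17 \times \frac{29}{30} \approx 0.1643$$

which agrees with the reported maximum reduction of 16.43%.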



Fig. 3. BER performance comparison (AWGN, max iteration: 30, 1,000,000 frames)

Fig. 3 compares the BER performance of the proposed algorithm and the normalized MSA when each is employed in the CMMB LDPC decoder. We note that the proposed algorithm does not degrade the error-correction capability.

### 4.2 Hardware cost analysis

In this section, the hardware cost for implementing the proposed algorithm is addressed. Calculating *delta-minima* by averaging all 4608 $\Delta$min values from all CNs requires excessive hardware cost and time. To reduce the hardware cost of the *delta-minima* computation, we take $\Delta$min values from 4, 8, 16, 32, and 64 randomly chosen CNs and calculate the *delta-minima* from them. In our exploration, the average of more than 8 sampled $\Delta$min values provides a reasonable *delta-minima* and $\Delta$min<sub>bound</sub> for the proposed decoding scheme. Due to space limitations, only the *delta-minima* values based on 32 samples are depicted in Fig. 4.
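The sampling idea can be sketched as below. The sketch assumes integer (quantized) $\Delta$min values and a power-of-two sample count, so that the average reduces to a sum followed by a right shift; that shift-based division is an assumption made for illustration and is not described in the Letter. The names `pick_sample` and `sampled_delta_minima` and the fixed seed are likewise hypothetical.

```python
import random


def pick_sample(num_cns, num_samples, seed=0):
    """Choose a fixed random subset of check-node indices once, at design time."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(num_cns), num_samples))


def sampled_delta_minima(delta_mins, sample_idx):
    """Average the integer delta-min values of the sampled check nodes."""
    n = len(sample_idx)
    assert n > 0 and n & (n - 1) == 0, "sample count must be a power of two"
    total = sum(delta_mins[j] for j in sample_idx)   # adder tree in hardware
    return total >> (n.bit_length() - 1)             # divide by n with a shift
```

With 4608 CNs and 32 samples, `sampled_delta_minima(delta_mins, pick_sample(4608, 32))` touches only 32 of the stored $\Delta$min values per iteration instead of all 4608.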

Implementing the computation module of the sampled *delta-minima* is straightforward when an LDPC decoder is designed with a partly parallel architecture. The number of the CN and VN units in such a decoder ranges from 8 to 128 or more [2, 9]. The $\Delta$min values of multiple CN units are added to compute the *delta-minima*, and its hardware cost is negligible considering that a partly parallel decoder requires multiple CN and VN units. To verify the hardware cost of the proposed algorithm, CN unit, VN unit, and *delta-minima* calculation module for the CMMB decoder were synthesized using a 0.18-µm CMOS cell library.

Fig. 4. Delta-minima based on 32 samples of $\Delta$min values

Table II. Area comparison

|                            | CN unit | VN unit | D_min16 | D_min32 |
|----------------------------|---------|---------|---------|---------|
| Area (in NAND2 gate count) | 2623.48 | 221.62  | 94.23   | 200.57  |

In Table II, D\_min16 and D\_min32 are the 16-sample and 32-sample *delta-minima* modules for 16-level and 32-level parallelized LDPC decoders, respectively. The area of D\_min32 is similar to that of a single VN unit. However, D\_min32 occupies a small area considering that 32 VN units are required for a 32-parallel LDPC decoder. Moreover, since the SNR estimator in a conventional LDPC decoder [10] takes up 12.1% of the total decoder area and the proposed algorithm does not need such an estimator, the hardware cost of the *delta-minima* module is relatively low.
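For scale, the areas in Table II can be compared directly. Counting only the VN units of a 32-parallel decoder (including the CN units would make the ratio smaller still), and assuming the 16-parallel case likewise uses 16 VN units:

$$\frac{A_{\text{D\_min32}}}{32 \times A_{\text{VN}}} = \frac{200.57}{32 \times 221.62} \approx 2.8\%, \qquad \frac{A_{\text{D\_min16}}}{16 \times A_{\text{VN}}} = \frac{94.23}{16 \times 221.62} \approx 2.7\%$$

Here $A$ denotes the NAND2-equivalent gate count, a symbol introduced only for this illustration.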

## 5 Conclusion

A novel min-sum based LDPC decoding algorithm based on *delta-minima* was proposed. Using *delta-minima*, the LDPC decoder dynamically decides whether the termination check will be skipped or not. Simulation and synthesis results show that the proposed method improves the speed and reduces the power consumption without any performance loss, and its hardware cost is negligible compared to the LDPC decoders utilizing SNR estimators.

## Acknowledgments

This research was supported by the MSIP (Ministry of Science, ICT & Future Planning), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2015-H8501-15-1005) supervised by the IITP (Institute for Information & communications Technology Promotion).
