

# A novel bit flipping decoder for systematic LDPC codes

Jinsoo Lim and Dong-Joon Shin<sup>a)</sup>

Department of Electronics and Computer Engineering, Hanyang University, 222, Wangsimni-ro, Seongdong-Gu, Seoul, 04763, Republic of Korea

a) djshin@hanyang.ac.kr

**Abstract:** In this letter, a novel bit flipping decoding scheme for systematic LDPC codes is proposed. An unsuccessfully decoded codeword is efficiently re-decoded by the candidate information bit flipping (CIBF) decoder using cyclic redundancy check (CRC) information at the end of each iteration. Adding the CIBF decoder to the LDPC decoder reduces the power consumption by up to 12.7% through a lower average number of iterations and also improves the frame error rate (FER) performance. Based on a hardware cost analysis using a CMOS cell library, the additional hardware cost of the CIBF decoder is negligible compared with that of the conventional LDPC decoder.

**Keywords:** bit flipping, CRC, low-power LDPC decoder, normalized min-sum algorithm

**Classification:** Integrated circuits

#### References

- [1] D. J. MacKay: "Good error correction codes based on very sparse matrices," IEEE Trans. Inf. Theory 45 (1999) 399 (DOI: 10.1109/18.748992).
- [2] K. Zhao, *et al.*: "Overclocking NAND flash memory I/O link in LDPC-based SSDs," IEEE Trans. Circuits Syst. II, Exp. Briefs 61 (2014) 885 (DOI: 10.1109/TCSII.2014.2350377).
- [3] Y. Zhang, *et al.*: "A high-throughput multi-rate LDPC decoder for error correction of solid-state drives," SiPS (2015) 1 (DOI: 10.1109/SiPS.2015.7345006).
- [4] J. Chen and M. P. C. Fossorier: "Near optimum universal belief propagation based decoding of low-density parity check codes," IEEE Trans. Commun. 50 (2002) 406 (DOI: 10.1109/26.990903).
- [5] J. H. Kim, *et al.*: "Variable-to-check residual belief propagation for LDPC codes," IET Electron. Lett. 45 (2009) 117 (DOI: 10.1049/el:20092505).
- [6] K. Cho and K.-S. Chung: "Conditional termination check min-sum algorithm for efficient LDPC decoders," IEICE Electron. Express 12 (2015) 20150738 (DOI: 10.1587/elex.12.20150738).
- [7] M. Baldi, *et al.*: "On the error detection capability of combined LDPC and CRC codes for space telecommand transmissions," IEEE ISCC (2016) 1058 (DOI: 10.1109/ISCC.2016.7543876).
- [8] Y.-H. Kwon, *et al.*: "A new LDPC decoding algorithm aided by segmented cyclic redundancy checks for magnetic recording channels," IEEE Trans. Magn. 41 (2005) 2318 (DOI: 10.1109/TMAG.2005.851861).
- [9] ETSI: "DVB; Second generation framing structure, channel coding and modulation systems for broadcasting, interactive services, news gathering and other broadband satellite applications," ETSI EN 302 307 (V1.3.1), 03/2013.
- [10] S. Gounai and T. Ohtsuki: "Lowering error floor of irregular LDPC codes by CRC and OSD algorithm," IEICE Trans. Commun. E89-B (2006) 1 (DOI: 10.1093/ietcom/e89-b.1.1).
- [11] S. Shu and D. Qu: "Improved perturbation method for decoding of LDPC aided by CRC," Sci. J. Inf. Eng. 4 (2014) 91.
- [12] IEEE Std. 802.16: "IEEE standard for local and metropolitan area network part 16: Air interface for fixed and mobile broadband access systems," 2004.
- [13] M. Jiang, *et al.*: "Adaptive offset min-sum algorithm for low-density parity check codes," IEEE Commun. Lett. 10 (2006) 483 (DOI: 10.1109/LCOMM.2006.1638623).
- [14] C. L. Wey, *et al.*: "Algorithms of finding the first two minimum values and their hardware implementation," IEEE Trans. Circuits Syst. I, Reg. Papers 55 (2008) 3430 (DOI: 10.1109/TCSI.2008.924892).

#### 1 Introduction

Low-density parity-check (LDPC) codes [1] have been adopted in many communication standards such as DVB-S2, WLAN (802.11n), and WiMAX (802.16e), and improving the performance or convergence rate of the LDPC decoder with minimal additional complexity is of prime interest. Owing to their capacity-approaching performance, LDPC codes using sum-product (SP) decoding have also been adopted in data-storage systems [2, 3].

The most popular LDPC decoding algorithm is the min-sum algorithm (MSA) because of its low computational complexity at the cost of a slight loss in coding gain. To reduce the performance loss of the MSA, modified versions have been studied based on normalized min-sum (NMS) decoding [4], which simply multiplies the check-to-variable node messages by a scaling factor $\alpha$ to compensate for messages that are overestimated in comparison with the SP algorithm. Unlike the conventional SP decoder, the MS decoder does not require an SNR estimator, which reduces the decoder implementation size and the power consumption. Thus, the implementation of efficient LDPC decoders has been actively studied in various ways, such as new decoding algorithms [4], smart scheduling schemes [5], and early termination schemes [6].
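The scaling step of NMS decoding can be illustrated with a minimal sketch of a single check-node update. The function name `nms_check_node_update` and the example message values are hypothetical; the two-minimum search follows the usual min-sum formulation and is not taken from any particular hardware design. The default $\alpha = 0.75$ is the scaling factor used in the simulations of this letter.

```python
def nms_check_node_update(v2c, alpha=0.75):
    """Normalized min-sum update for one check node (illustrative sketch).

    v2c   : incoming variable-to-check LLRs, one per edge of this check node
    alpha : scaling factor applied to every outgoing message
    Returns the outgoing check-to-variable LLRs, one per edge.
    """
    mags = [abs(m) for m in v2c]
    # Locate the two smallest magnitudes; the edge holding the smallest
    # one must receive the second smallest (extrinsic principle).
    order = sorted(range(len(v2c)), key=lambda j: mags[j])
    min1_idx = order[0]
    min1, min2 = mags[order[0]], mags[order[1]]

    # Product of the signs of all incoming messages (zero counts as positive).
    total_sign = 1
    for m in v2c:
        if m < 0:
            total_sign = -total_sign

    out = []
    for j, m in enumerate(v2c):
        mag = min2 if j == min1_idx else min1
        own_sign = -1 if m < 0 else 1
        # Multiplying by own_sign removes this edge's sign from the product.
        out.append(alpha * total_sign * own_sign * mag)
    return out


# Hypothetical messages on a degree-4 check node.
msgs = nms_check_node_update([2.0, -0.5, 1.5, -3.0])
```

Each output magnitude is the minimum over the other edges, scaled by $\alpha$, which is the compensation for overestimation mentioned above.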

In general, the belief propagation (BP)-based LDPC decoder stops when the syndrome check is passed or the maximum number of iterations is reached. However, even when the parity-check equations (syndrome check) of the LDPC code are satisfied, an undetected error occurs if the decoder converges to a codeword other than the transmitted one, which frequently happens in most finite-length LDPC codes. Thus, a cyclic redundancy check (CRC) is included in the frame structure of recent deep space communication systems, data-storage systems, and digital communication standards [7, 8, 9].

In [10], an LDPC code concatenated with a CRC was decoded using the ordered statistics decoding (OSD) algorithm, and in [11] a perturbation method for CRC-aided decoding of LDPC codes was proposed. However, the approach in [10] has high decoding complexity because the OSD algorithm requires Gaussian elimination of the parity-check matrix at every BP decoding, and the perturbation method in [11] requires specific parameters for each LDPC code.

In this letter, we propose a simple decoding algorithm that reduces the average number of iterations (ANIs) of LDPC codes based on the characteristics of erroneous information bits. In the candidate information bit flipping (CIBF) decoder, candidate information bits (CIBs) are selected by identifying the least reliable information bits, and they are used for CIBF decoding at the end of each NMS decoding iteration. The proposed decoding not only reduces the power consumption by increasing the convergence speed of NMS decoding but also improves the FER performance. Simulation results show that the power consumption of the proposed decoder is reduced by up to 12.7%, although the overall hardware cost is slightly increased. Moreover, it does not require any modification of the coding scheme.

#### 2 Characteristics of erroneous information bits

In Fig. 1, the histograms of the LLR values of the erroneous information bits at iterations 4 and 6 are shown for NMS decoding at SNR = 3 dB, where $\hat{\mathbf{k}}_{out}^{(i)}$ and $\theta_{th}$ denote the output information LLR vector at the *i*-th iteration of the NMS decoder and the threshold value for CIB selection, respectively.

In general, the LLR magnitude of the erroneous bits tends to decrease as the number of iterations increases. As shown in Fig. 1, the information bits whose LLR magnitudes fall within the range of $\theta_{th}$ are the most unreliable, and if the decoder declares a failure, those bits remain in error. Moreover, the number of such bits is very small. We therefore focus on these characteristics of erroneous information bits. For the simulation, the (2304, 1152) IEEE 802.16e LDPC code is used, and BPSK modulation and an additive white Gaussian noise (AWGN) channel are assumed. The scaling factor of the NMS decoder is set to 0.75. Note that the all-zero codeword is assumed to be transmitted for Fig. 1.
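As a rough illustration of how the data behind such a histogram could be gathered, the sketch below collects the LLR magnitudes of wrongly decided bits under the all-zero codeword assumption. It assumes the sign convention that a positive LLR favours bit 0, which the letter does not state explicitly; the function names, the LLR values, and the use of 1.1 as the threshold (the value reported for $\theta_{th}$ in Section 3.2) are for illustration only.

```python
def erroneous_bit_magnitudes(llrs):
    """Return (index, |LLR|) for bits decided as 1 when all-zero was sent.

    Assumes a positive LLR favours bit 0, so a negative LLR is an error.
    """
    return [(j, abs(v)) for j, v in enumerate(llrs) if v < 0]


def fraction_below_threshold(errors, theta_th=1.1):
    """Fraction of erroneous bits whose LLR magnitude is below theta_th."""
    if not errors:
        return 0.0
    return sum(1 for _, mag in errors if mag < theta_th) / len(errors)


# Made-up information LLRs after some NMS iteration.
llrs = [4.2, -0.3, 7.9, -0.9, 5.5, -2.4]
errs = erroneous_bit_magnitudes(llrs)
frac = fraction_below_threshold(errs)
```

In this toy input two of the three erroneous bits lie below the threshold, which is the kind of concentration near zero that the section describes.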



**Fig. 1.** Histograms of the LLR values of the erroneous information bits at iterations 4 and 6 for NMS decoding at 3 dB with the IEEE 802.16e (2304, 1152) LDPC code.





#### 3 Candidate information bit flipping decoding

#### 3.1 System description



**Fig. 2.** Diagram of the decoder structure: (a) conventional decoding, (b) proposed decoding.

In this subsection, the structure of the proposed decoder is described. Fig. 2 presents the proposed decoder structure, where $\mathbf{y}_{in}$ is the received LLR vector from the channel, $\hat{\mathbf{k}}_{out}^{(i)}$ is the output information LLR vector at the *i*-th iteration of the NMS decoder, $\hat{\mathbf{x}}_{out}^{(i)}$ denotes the decoded information bits at the *i*-th iteration of the NMS decoder, $\hat{\mathbf{x}}_{out\_CIBF}^{(w)}$ denotes the decoded information bits at the *w*-th iteration of the CIBF decoder, and $\hat{\mathbf{u}}_{out}$ denotes the final decoded information bits.

In general, the conventional LDPC decoder is composed of an NMS decoder and a CRC calculator. After the NMS decoder terminates based on the syndrome check, the CRC calculator is run to detect any undetected error in the decoded frame. The proposed decoding uses the CRC more efficiently: the CRC calculation is performed at the end of each NMS decoding iteration to check the correctness of the frame (instead of the syndrome check). If the CRC fails, the CIBF decoder is initiated. The output information LLR vector $\hat{\mathbf{k}}_{out}^{(i)}$ is loaded into the CIBF decoder from the NMS decoder. The CIBF decoder iterates until the CRC succeeds or the given maximum number of CIBF iterations is reached.

#### 3.2 Candidate information bit flipping decoding

In this subsection, the CIBF decoding is described. If the NMS decoder fails, a few erroneous information bits with small LLR magnitudes remain. Among the information bits within the range of $\theta_{th}$, some are selected as the CIBs and efficiently re-decoded using the CIBF decoder.

The CIBF decoder consists of the sorting and the flipping algorithms. The sorting complexity grows with the number of information bits. Thus, to reduce the complexity of the sorting, most of the reliable information bits are excluded by the threshold $\theta_{th}$.

Let $\hat{\mathbf{k}}_{th}^{(i)}$ denote the unreliable information bits selected within the range of $\theta_{th}$. We sort $\hat{\mathbf{k}}_{th}^{(i)}$ in ascending order of LLR magnitude, and the first $N_{CIB}$ information bits are selected as the CIBs. Extensive simulation shows that $\theta_{th} = 1.1$ is a proper value for IEEE 802.16e LDPC codes. Note that $N_{CIB}$ is determined by considering the trade-off between the decoding performance and the decoding complexity.
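The thresholding and sorting steps just described can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `select_cibs`, the tie-breaking by index, and the example LLR values are assumptions, while $\theta_{th} = 1.1$ and $N_{CIB} = 4$ are the values used in this letter.

```python
def select_cibs(k_out, theta_th=1.1, n_cib=4):
    """Return the indices of up to n_cib least reliable information bits.

    k_out    : information-bit LLRs after one NMS iteration
    theta_th : magnitude threshold that excludes reliable bits before sorting
    n_cib    : number of candidate information bits (CIBs) to keep
    """
    # Thresholding: only bits with |LLR| < theta_th enter the sort.
    candidates = [(abs(v), j) for j, v in enumerate(k_out) if abs(v) < theta_th]
    # Ascending LLR magnitude; equal magnitudes are ordered by index.
    candidates.sort()
    return [j for _, j in candidates[:n_cib]]


# Hypothetical LLRs for nine information bits.
k_out = [5.0, -0.2, 0.9, -3.1, 0.05, 1.0, -0.6, 8.4, 0.4]
cibs = select_cibs(k_out)
```

Only six of the nine bits pass the threshold here, so the sort works on six values instead of nine, which is the complexity saving the threshold is meant to provide.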





| **Algorithm 1** Proposed Decoding Algorithm |
|---|
| **Input:** $\mathbf{y}_{in}$: received LLR vector from the channel |
| **Output:** $\hat{\mathbf{k}}_{out}^{(i)}$: output information LLR vector at the *i*-th iteration of NMS decoder |
| &emsp;&emsp;$\hat{\mathbf{x}}_{out}^{(i)}$: decoded information bits at the *i*-th iteration of NMS decoder |
| &emsp;&emsp;$\hat{\mathbf{x}}_{k_{out}}^{(i)}$: binary decision information bits of $\hat{\mathbf{k}}_{out}^{(i)}$ at the CIBF decoder |
| &emsp;&emsp;$\hat{\mathbf{x}}_{out\_CIBF}^{(w)}$: decoded information bits at the *w*-th iteration of CIBF decoder |
| &emsp;&emsp;$\hat{\mathbf{u}}_{out}$: final decoded information bits |
| 1: **for** $i = 1$ to $I_{max}^{MS}$ **do** |
| 2: &emsp;Perform NMS decoding for $\mathbf{y}_{in}$ |
| 3: &emsp;**if** CRC succeeds **then** |
| 4: &emsp;&emsp;get $\hat{\mathbf{u}}_{out} = \hat{\mathbf{x}}_{out}^{(i)}$ and go to the end |
| 5: &emsp;**else** |
| 6: &emsp;&emsp;Sort $\hat{\mathbf{k}}_{th}^{(i)}$ based on magnitude, where $\hat{\mathbf{k}}_{th}^{(i)} = \lvert \hat{\mathbf{k}}_{out}^{(i)} \rvert < \theta_{th}$ |
| 7: &emsp;&emsp;Select CIBs from $\hat{\mathbf{k}}_{th}^{(i)}$ and make flipping vectors $\mathbf{f}^{(w)}$ of size $n_k$ using CIBs |
| 8: &emsp;&emsp;**for** $w = 1$ to $I_{max}^{CIBF} (= 2^{N_{CIB}} - 1)$ **do** |
| 9: &emsp;&emsp;&emsp;CRC calculation of $\hat{\mathbf{x}}_{out\_CIBF}^{(w)} = \hat{\mathbf{x}}_{k_{out}}^{(i)} + \mathbf{f}^{(w)}$ |
| 10: &emsp;&emsp;&emsp;**if** CRC succeeds **then** |
| 11: &emsp;&emsp;&emsp;&emsp;get $\hat{\mathbf{u}}_{out} = \hat{\mathbf{x}}_{out\_CIBF}^{(w)}$ and go to the end |
| 12: &emsp;&emsp;&emsp;**end if** |
| 13: &emsp;&emsp;**end for** |
| 14: &emsp;**end if** |
| 15: **end for** |
| 16: Declare decoding failure |
| 17: The end |

Whenever the CRC fails at an NMS decoding iteration, $2^{N_{CIB}} - 1$ flipping vectors $\mathbf{f}^{(w)}$, $1 \le w \le 2^{N_{CIB}} - 1$, are generated to flip the CIBs. In detail, the CIBF decoder adds each flipping vector $\mathbf{f}^{(w)}$ to the decoded information vector $\hat{\mathbf{x}}_{k_{out}}^{(i)}$ (modulo 2) and checks the CRC, repeating until the CRC succeeds or all flipping vectors have been tried, as described in Algorithm 1.
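The inner loop of Algorithm 1 (steps 8 to 13) can be sketched as below. Several details are assumptions made for illustration: the order in which the flipping vectors are tried is not specified in the letter, the hard decision maps a negative LLR to bit 1, and the CRC check is passed in as a callable `crc_ok` that is replaced here by a toy comparison against a known target frame.

```python
from itertools import product


def hard_decision(llrs):
    """Binary decision per LLR (assumed convention: negative LLR -> bit 1)."""
    return [1 if v < 0 else 0 for v in llrs]


def cibf_decode(k_out, cib_indices, crc_ok):
    """Try every non-zero flip pattern on the CIBs until the check passes.

    Returns (bits, w) on success, where w is the number of patterns tried,
    or (None, w) if none of the 2^N_CIB - 1 patterns passes the check.
    """
    x = hard_decision(k_out)
    w = 0
    for pattern in product((0, 1), repeat=len(cib_indices)):
        if not any(pattern):
            continue  # all-zero pattern equals the NMS output, already rejected
        w += 1
        trial = list(x)
        for idx, flip in zip(cib_indices, pattern):
            trial[idx] ^= flip  # modulo-2 addition of the flipping vector
        if crc_ok(trial):
            return trial, w
    return None, w


# Toy example: bits 1 and 2 are wrong and both are among the candidates.
target = [0, 0, 1, 1]
k_out = [3.0, -0.2, 0.4, -5.0]
decoded, tries = cibf_decode(k_out, [1, 2], lambda bits: bits == target)
failed, tries_failed = cibf_decode(k_out, [1], lambda bits: bits == target)
```

With $N_{CIB}$ candidates at most $2^{N_{CIB}} - 1$ checks are made, which matches $I_{max}^{CIBF}$ in Algorithm 1.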

#### 4 Experimental results

For simulation, the (2304, 1152) quasi-cyclic (QC) LDPC code of the IEEE 802.16e standard [12] and the CRC-16 code with the generator polynomial $X^{16} + X^{15} + X^2 + 1$ are used. An AWGN channel and BPSK modulation are assumed. The NMS decoder with 6-bit quantization and a scaling factor of 0.75 is used. The CIBF scheme with $N_{CIB} = 4$ is used. The maximum numbers of iterations for the NMS decoding, $I_{max}^{MS}$, and the CIBF decoding, $I_{max}^{CIBF}$, are set to 30 and 15 (= $2^4 - 1$), respectively. For comparison, two other decoding algorithms are applied, namely, the OSD algorithm [10] (the maximum number of iterations is 20 for the decoding and 10 for the OSD decoding, and the number of target bits *L* is 4) and the perturbation method [11] (the maximum number of target bits *L* is 4 and the perturbation noise variance is 0.3).
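For readers who want to reproduce the CRC check, a bit-level sketch for the generator polynomial $X^{16} + X^{15} + X^2 + 1$ is given below. The letter does not state the register initialisation, bit ordering, or final XOR, so this sketch assumes a zero initial register, MSB-first processing, and no final XOR; the function names and the 8-bit example payload are hypothetical.

```python
CRC16_POLY = 0x8005  # X^16 + X^15 + X^2 + 1, with the X^16 term implicit


def crc16_remainder(bits):
    """Remainder of bits(X) * X^16 divided by the generator polynomial.

    Assumes MSB-first bit order, zero initial register, no final XOR.
    """
    reg = 0
    for b in bits:
        msb = (reg >> 15) & 1
        reg = (reg << 1) & 0xFFFF
        if msb ^ b:
            reg ^= CRC16_POLY
    return reg


def append_crc(data_bits):
    """Return the data bits followed by their 16 CRC bits (MSB first)."""
    r = crc16_remainder(data_bits)
    return list(data_bits) + [(r >> (15 - i)) & 1 for i in range(16)]


def crc_ok(info_bits):
    """A frame that carries its own CRC leaves a zero remainder."""
    return crc16_remainder(info_bits) == 0


# Hypothetical 8-bit payload protected by CRC-16.
frame = append_crc([1, 0, 1, 1, 0, 0, 1, 0])
corrupted = list(frame)
corrupted[3] ^= 1  # single bit error
```

In the setting of this letter the same check would run over the $n_k = 1152$ information bits, of which 16 are CRC bits.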





#### 4.1 Decoding performance and power consumption analysis



**Fig. 3.** Comparison of FER for IEEE 802.16e (2304, 1152) LDPC code with CRC (code rate = 0.493).

Fig. 3 shows the FER performance of the proposed decoding and the conventional decoding. The proposed decoding with $N_{CIB} = 4$ achieves a gain of approximately 0.13 dB at FER = $2 \times 10^{-7}$ compared with the conventional decoding. Although the proposed decoding shows performance degradation in the low-SNR region due to the rate loss, it outperforms the conventional decoding (w/ CRC or w/o CRC) by lowering the error floor in the high-SNR region. The OSD algorithm [10] and the perturbation method [11] show the same FER performance, but the ANIs of the proposed decoding are the lowest, as shown in Table II.

|                     | Conv. dec.            | [10]                                       | [11]                                       | Prop. dec.                                                           |
|---------------------|-----------------------|--------------------------------------------|--------------------------------------------|----------------------------------------------------------------------|
| Comparisons         | $(2d_c - 3) \cdot n_m$ | $(2d_c - 3) \cdot n_m + L \cdot (n_k - L)$ | $(2d_c - 3) \cdot n_m + L \cdot (n_k - L)$ | $(2d_c - 3) \cdot n_m + N_{CIB} \cdot (n_k^{\theta_{th}} - N_{CIB})$ |
| Additions           | $(d_v + 1) \cdot n_k$ | $(d_v + 1) \cdot n_k$                      | $(d_v + 1) \cdot n_k + L$                  | $(d_v + 1) \cdot n_k$                                                |
| Multiplications     | $n_m$                 | $n_m$                                      | $n_m$                                      | $n_m$                                                                |
| Subtractions        | -                     | -                                          | -                                          | $n_k$                                                                |
| Modulo2 additions   | -                     | $n_c \times n_k^2$                         | -                                          | $I_{avg}^{CIBF}$                                                     |
| CRC calculations    | 1                     | 1                                          | 1                                          | $1 + I_{avg}^{CIBF}$                                                 |

**Table I.** Computational complexity comparison per decoding iteration

Table I compares the computational complexity per iteration. Since the CIBF decoding is performed at the end of each NMS decoding iteration, the computational complexity of the proposed decoding is the sum of those of the NMS decoding and the CIBF decoding. Here, $d_v$ and $d_c$ are the average degrees of the variable nodes (VNs) and check nodes (CNs), respectively. Suppose that the $n_c$ codeword bits comprise $n_k$ information bits and $n_m = n_c - n_k$ check bits, where the information bits consist of $n_d$ data bits and $n_r = n_k - n_d$ CRC bits. Note that the CRC is applied only to the data bits.
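With these symbols, the code rate quoted in the caption of Fig. 3 can be checked as a short worked equation, taking $n_c = 2304$, $n_k = 1152$, and $n_r = 16$ for the CRC-16 code (the symbol $R$ for the effective rate is introduced here only for illustration):

$$R = \frac{n_d}{n_c} = \frac{n_k - n_r}{n_c} = \frac{1152 - 16}{2304} = \frac{1136}{2304} \approx 0.493.$$

The difference from the nominal rate of 0.5 is the rate loss caused by the CRC bits.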

For conventional decoding, each iteration requires $(2d_c - 3)$ comparisons at each CN of degree $d_c$ to select the two minimum values and $(d_v + 1)$ additions at each VN of degree $d_v$ for the summation [13]. In addition, $n_m$ multiplications for normalizing the check-to-variable LLR values and one CRC calculation are needed. Both the OSD algorithm [10], which needs $n_c \times n_k^2$ modulo2 additions for the Gaussian elimination, and the perturbation method [11], which needs *L* additions for the perturbation noise, require $L \cdot (n_k - L)$ comparisons per iteration for sorting the unreliable information bits.

For CIBF decoding at the end of each NMS decoding iteration, $n_k$ subtractions for thresholding with $\theta_{th}$ and $N_{CIB} \cdot (n_k^{\theta_{th}} - N_{CIB})$ comparisons for sorting the unreliable information bits are performed, where $n_k^{\theta_{th}}$ is the number of information bits selected by $\theta_{th}$. Per iteration, the CIBF decoding requires $n_k$ modulo2 additions for the bit flipping and $I_{avg}^{CIBF}$ CRC calculations, where $I_{avg}^{CIBF}$ denotes the average number of CIBF decoding iterations. Although the computational complexity of the proposed decoding is slightly higher than that of conventional decoding because of the CIBF decoding, the ANIs of the proposed decoding are substantially reduced, as shown in Table II.
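The effect of thresholding on the sorting cost can be made concrete with the comparison counts from Table I. In the sketch below, $n_k = 1152$ and $L = N_{CIB} = 4$ come from the simulation setup, whereas the value 20 for $n_k^{\theta_{th}}$ is purely hypothetical, since the letter does not report how many bits pass the threshold.

```python
def sort_comparisons(num_candidates, num_selected):
    """Comparisons to pick num_selected smallest values, per Table I."""
    return num_selected * (num_candidates - num_selected)


# Sorting over all information bits, as in [10] and [11]: L * (n_k - L).
full = sort_comparisons(1152, 4)

# Sorting only the bits below the threshold: N_CIB * (n_k^theta - N_CIB).
# The count of 20 thresholded bits is a hypothetical illustration.
thresholded = sort_comparisons(20, 4)
```

Under this assumed count the sorting stage needs 64 comparisons instead of 4592, at the price of the $n_k$ subtractions used for thresholding.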

| $E_b/N_0$ (dB) | Conv. dec. | [10]  | [11]  | Prop. dec. |
|----------------|------------|-------|-------|------------|
| 1.0            | 27.46      | 29.27 | 28.15 | 27.03      |
| 1.5            | 15.98      | 16.14 | 16.02 | 14.81      |
| 2.0            | 9.41       | 9.42  | 9.41  | 8.50       |
| 2.5            | 6.71       | 6.75  | 6.72  | 5.89       |
| 3.0            | 5.11       | 5.13  | 5.12  | 4.46       |

**Table II.** ANIs reduction obtained by the proposed algorithm with IEEE 802.16e (2304, 1152) LDPC code ( $N_{CIB} = 4, \theta_{th} = 1.1$ )

Table II presents the simulated ANIs of the conventional NMS decoding, the schemes in [10] and [11], and the proposed decoding. For a fair comparison, the conventional NMS decoding is also terminated when the CRC succeeds. The ANIs of the CIBF decoding are included in the ANIs of the proposed decoding in Table II. For example, at $E_b/N_0 = 3.0$ dB, the ANIs of the proposed decoding, 4.46, consist of 4.33 from the NMS decoding and 0.13 converted from the ANIs of the CIBF decoding. The convergence speed gain of the proposed decoding grows as the SNR increases because the CIBF decoding success rate increases: few erroneous information bits remain in the high-SNR region, whereas many remain in the low-SNR region. Thus, the power consumption is reduced by 12.7% compared with the conventional NMS decoding at SNR = 3 dB.
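The 12.7% figure can be recovered from Table II under the assumption, implied by the text, that power consumption scales in proportion to the ANIs. The sketch below computes the relative ANI reduction of the proposed decoding over the conventional decoding at each SNR point; the variable and function names are hypothetical.

```python
# (conventional ANIs, proposed ANIs) per E_b/N_0 in dB, copied from Table II.
ANI = {
    1.0: (27.46, 27.03),
    1.5: (15.98, 14.81),
    2.0: (9.41, 8.50),
    2.5: (6.71, 5.89),
    3.0: (5.11, 4.46),
}


def reduction_percent(conv, prop):
    """Relative ANI reduction of the proposed decoding, in percent."""
    return 100.0 * (conv - prop) / conv


reductions = {snr: round(reduction_percent(c, p), 1) for snr, (c, p) in ANI.items()}
```

The reduction grows from about 1.6% at 1.0 dB to 12.7% at 3.0 dB, consistent with the trend described above.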





#### 5 Hardware cost analysis

In this section, the hardware cost of implementing the proposed algorithm is addressed. Implementing the computation module of the CIBF decoding is straightforward when a conventional LDPC decoder already has the CRC calculator and the minimum operators at the CNs [14].

The selection of the information bits whose magnitudes are lower than the threshold $\theta_{th}$ and the sorting of $\hat{\mathbf{k}}_{th}^{(i)}$ by LLR magnitude are implemented using subtractors and minimum operators, respectively. The information bit flipping for the CIBs is implemented with modulo2 adders. To evaluate the hardware cost of the proposed decoding, the CN unit, VN unit, modulo2 adder unit, and subtractor unit for the IEEE 802.16e decoder were synthesized using a 0.18-µm CMOS cell library.

**Table III.** Area comparison

|                               | CN unit | VN unit | Subtractor | Modulo2 adder |
|-------------------------------|---------|---------|------------|---------------|
| Area<br>(in NAND2 gate count) | 2728.25 | 241.94  | 30.23      | 4.21          |

Table III compares the areas of the CN and VN units with those of the subtractor and modulo2 adder. For a fair comparison, the area is given per unit. The total area of the subtractors is determined by the number of VNs of the LDPC decoder. Although the overall hardware cost is slightly increased, the CIBF decoder requires only 1.26% of the hardware cost of the CN unit. Thus, the hardware cost of the CIBF decoder is negligible.
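The 1.26% figure appears to correspond to the combined area of one subtractor and one modulo2 adder relative to one CN unit, using the gate counts in Table III (the symbols $A_{sub}$, $A_{mod2}$, and $A_{CN}$ are introduced here only for this check):

$$\frac{A_{sub} + A_{mod2}}{A_{CN}} = \frac{30.23 + 4.21}{2728.25} = \frac{34.44}{2728.25} \approx 1.26\%.$$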

#### 6 Conclusion

In this letter, a novel bit flipping decoding scheme for systematic LDPC codes is presented. If the NMS decoder fails, the CIBF decoder tries to re-decode the codeword using the CRC at the end of each NMS decoding iteration. At a slightly increased hardware cost, the CIBF decoder can successfully decode most of the codewords on which the NMS decoder fails because it identifies the most unreliable bits and efficiently re-decodes them using bit flipping and the CRC. Thus, the CIBF decoder reduces the average number of iterations of the decoder, and consequently the power consumption related to the average number of iterations is reduced by up to 12.7%. Moreover, the proposed decoding can be applied easily to existing communication standards and data-storage systems.

#### Acknowledgments

This work was supported by the research fund of Signal Intelligence Research Center supervised by Defense Acquisition Program Administration and Agency for Defense Development of Korea.

