

# An implementation of belief propagation decoder with combinational logic reduced for polar codes

Yongli Yan<sup>1,2a)</sup>, Xuanxuan Zhang<sup>1,2</sup>, and Bin Wu<sup>1</sup>

**Abstract** In this letter, a belief propagation (BP) decoder for polar codes with reduced combinational logic is designed in 55 nm CMOS technology. The BP decoding algorithm for polar codes is first introduced, and the architectures of conventional BP decoders are then analyzed. Finally, a hardware implementation based on the proposed multiplexed process element architecture is presented. Synthesis results show that hardware resource consumption is reduced by 36%. The architecture and circuit techniques reduce the power to 398 mW, for an energy efficiency of 292 pJ/b. By applying the G-matrix early stopping criterion, the throughput is further improved to 4.36 Gbps.

Keywords: polar codes, belief-propagation, scaled min-sum, muxed-pe

Classification: Integrated circuits

## 1. Introduction

Polar codes, invented by E. Arıkan, have been proven to achieve the capacity of symmetric binary-input discrete memoryless channels (B-DMCs) with low encoding and decoding complexity [1]. The basic decoding algorithm for polar codes is the successive-cancellation (SC) algorithm, also proposed by E. Arıkan [1]. In addition to SC-based decoders [2, 3, 4, 5, 6, 7], the belief-propagation (BP) decoding algorithm has also been applied to polar codes [8, 9, 10, 11, 12]. SC-based decoders are sequential in nature and therefore suffer from high decoding latency. In contrast, BP processes data in parallel, which improves the decoding throughput [13, 14, 15, 16, 17]. However, a major problem with BP is that its parallel computation consumes a large amount of hardware resources.

Several BP decoders have been designed for polar codes in [15], [16] and [17] with the aim of reducing hardware complexity and decoding latency. In [15], an overlapped-scheduling approach at the iteration level was proposed to reduce the overall decoding latency. This iteration-level overlapped schedule achieves high hardware utilization and effectively reduces the decoding latency, but its hardware complexity is relatively high.

<sup>1</sup>Institute of Microelectronics of Chinese Academy of Sciences, Beijing 100029, China

<sup>2</sup>University of Chinese Academy of Sciences, Beijing 100049, China

DOI: 10.1587/elex.16.20190382; Received June 13, 2019; Accepted July 1, 2019; Publicized July 12, 2019; Copyedited August 10, 2019

To further reduce the decoding latency, a double-column architecture was designed in [16] to make better use of each clock period. In [16], the operations of two adjacent columns are merged into one clock cycle, which lengthens the critical path in exchange for lower decoding latency. In [17], a stage-combined BP decoding algorithm was proposed to reduce memory usage: two adjacent columns are combined into one, so the intermediate messages between them do not need to be stored. However, none of these works exploit the fact that message propagation in BP is unidirectional, i.e., only left-to-right or only right-to-left messages are propagated at any given time, which means that some hardware resources can be reused in the time domain.

In this letter, a belief propagation decoder for polar codes with reduced combinational logic is presented. The conventional factor graph of polar BP decoding is built from bidirectional process elements (PEs). Each PE updates both left-to-right and right-to-left messages, whereas message propagation is unidirectional: only left-to-right or only right-to-left messages are propagated at a time. Therefore, some hardware resources can be reused in the time domain. Based on this observation, a multiplexed process element (muxed-pe) architecture is introduced to reduce the consumption of combinational logic. Synthesized in 55 nm CMOS technology, the muxed-pe decoder provides a decoding throughput of 1.37 Gbps at 400 MHz with 15 iterations under worst-case conditions (1.08 V, 125°C). Compared with the conventional scaled min-sum (SMS) decoder [13], the proposed muxed-pe decoder reduces hardware resource consumption by 36%.

The remainder of the letter is organized as follows. Section 2 briefly reviews the principle of the belief propagation decoding algorithm for polar codes. Section 3 proposes a new muxed-pe architecture for BP-based polar decoders. Section 4 presents the performance analysis and comparisons. Finally, Section 5 concludes the letter.

## 2. Review of belief propagation polar decoders

Polar codes are a class of linear block error-correcting codes, and a polar code can be specified by the parameter pair (N, K), where N is the block length and K is the number of information bits.

The polar code BP decoder was proposed in [8] based on the factor graph representation. Fig. 1 shows the basic process elements (PEs) of the polar code BP decoder. For a polar code with a block length of $N = 2^n$, the factor graph contains $n$ stages, and each stage has $N/2$ PEs [18]. Here, $R_{i,j}$ and $L_{i,j}$ denote the log-likelihood ratios (LLRs) of the left-to-right message and the right-to-left message of node $v(i, j)$, respectively.

a) yanyongli@ime.ac.cn

Fig. 1. Basic process elements of the BP decoder.

An example of the BP factor graph for the (8, 4) polar code is given in Fig. 2. Here the factor graph has a total of $n = \log_2 8 = 3$ stages, and each stage consists of $N/2 = 4$ PEs that update the propagated messages.



Fig. 2. Factor graph of (8, 4) polar codes.
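The PE layout of such a factor graph follows from the node pairing $(i, i + 2^{n-j})$ in stage $j$ that appears in the update rules, Eq. (1) and Eq. (2). The Python sketch below enumerates that layout as an illustration only; it assumes 0-based node indices and that the upper node of each PE has bit $n-j$ of its index equal to 0, and the helper names `pe_stages` and `encode` are hypothetical. As a consistency check, pushing hard bits through the same PEs computes $x = uG$ over GF(2), assuming $G = F^{\otimes n}$ with $F = \begin{bmatrix}1&0\\1&1\end{bmatrix}$ and no bit-reversal permutation.

```python
def pe_stages(n):
    """Map each stage j = 1..n to the (upper, lower) node pairs of its N/2 PEs.

    Assumes 0-based node indices; the PE of stage j joins nodes i and i + 2**(n - j),
    where the upper node i has bit (n - j) of its index equal to 0.
    """
    N = 1 << n
    stages = {}
    for j in range(1, n + 1):
        span = 1 << (n - j)
        stages[j] = [(i, i + span) for i in range(N) if (i // span) % 2 == 0]
    return stages


def encode(u, n):
    """Push hard bits left to right through the same PEs: x = u G over GF(2)."""
    x = list(u)
    stages = pe_stages(n)
    for j in range(1, n + 1):
        for up, low in stages[j]:
            x[up] ^= x[low]  # upper output = upper XOR lower; lower passes through
    return x


for j, pairs in pe_stages(3).items():  # the (8, 4) example: n = 3, N = 8
    print("stage", j, pairs)
print(encode([0, 0, 0, 1, 0, 1, 1, 1], 3))
```

For $n = 3$ each stage has four PEs (12 in total); stage 1, for example, pairs nodes (0, 4), (1, 5), (2, 6) and (3, 7).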

In general, the BP algorithm can be simplified using the min-sum (MS) approximation [19, 20, 21, 22], which greatly reduces the complexity of the hardware implementation but degrades the decoding performance. In [13], a scaled min-sum (SMS) belief-propagation decoder was proposed that performs a linear message-updating process without losing error-correcting performance. An SMS-BP decoder computes in two steps. The first step propagates the LLRs through the factor graph from the right-most nodes to the left-most nodes. For node $i$ in stage $j$ at the $t$-th iteration, the update rules given in [13] are

$$\begin{cases} L_{i,j}^{t} = \lambda \times g(L_{i,j+1}^{t-1}, R_{i+2^{n-j},j}^{t} + L_{i+2^{n-j},j+1}^{t-1}) \\ L_{i+2^{n-j},j}^{t} = \lambda \times g(L_{i,j+1}^{t-1}, R_{i,j}^{t}) + L_{i+2^{n-j},j+1}^{t-1} \end{cases} \tag{1}$$

Similarly, the second step delivers the LLRs in the factor graph from the left-most nodes to the right-most nodes. The update rules are

$$\begin{cases} R_{i,j+1}^{t} = \lambda \times g(R_{i,j}^{t}, R_{i+2^{n-j},j}^{t} + L_{i+2^{n-j},j+1}^{t-1}) \\ R_{i+2^{n-j},j+1}^{t} = \lambda \times g(R_{i,j}^{t}, L_{i,j+1}^{t-1}) + R_{i+2^{n-j},j}^{t} \end{cases} \tag{2}$$

where $g(\alpha, \beta) = \mathrm{sign}(\alpha)\,\mathrm{sign}(\beta) \min(|\alpha|, |\beta|)$ is the propagation function that updates the LLR messages, and $\lambda$ is the scale factor that reduces the approximation error.
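A direct floating-point transcription of Eq. (1) and Eq. (2) for a single PE may help when reading the update rules. In this sketch the argument names are hypothetical shorthands: `L_up_r` = $L_{i,j+1}$, `L_low_r` = $L_{i+2^{n-j},j+1}$, `R_up_l` = $R_{i,j}$ and `R_low_l` = $R_{i+2^{n-j},j}$. The iteration superscripts are dropped, so the caller decides which iteration's values to pass in. LLRs are assumed positive for bit 0, and $\lambda = 0.9375$ is the value used in Section 4.

```python
import math

LAMBDA = 0.9375  # scale factor lambda; the value used in Section 4 (from [13])


def g(alpha, beta):
    """g(alpha, beta) = sign(alpha) sign(beta) min(|alpha|, |beta|).

    copysign treats 0 as +/-1, which is harmless: min(...) is then 0.
    """
    return math.copysign(1.0, alpha) * math.copysign(1.0, beta) * min(abs(alpha), abs(beta))


def pe_right_to_left(L_up_r, L_low_r, R_up_l, R_low_l, lam=LAMBDA):
    """Eq. (1): returns (L_{i,j}, L_{i+2^{n-j},j})."""
    L_up_l = lam * g(L_up_r, R_low_l + L_low_r)
    L_low_l = lam * g(L_up_r, R_up_l) + L_low_r
    return L_up_l, L_low_l


def pe_left_to_right(L_up_r, L_low_r, R_up_l, R_low_l, lam=LAMBDA):
    """Eq. (2): returns (R_{i,j+1}, R_{i+2^{n-j},j+1})."""
    R_up_r = lam * g(R_up_l, R_low_l + L_low_r)
    R_low_r = lam * g(R_up_l, L_up_r) + R_low_l
    return R_up_r, R_low_r


print(pe_right_to_left(2.0, -1.0, 0.0, 0.0))   # (-0.9375, -1.0)
print(pe_left_to_right(2.0, -4.0, 3.0, 1.0))   # (-2.8125, 2.875)
```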

After the left-to-right LLRs of all nodes have been updated, one iteration is complete. In general, a sufficiently large number of iterations provides good error performance. However, experience shows that when the channel noise level is low (high-SNR scenario), BP decoders can usually produce a valid output before reaching the maximum number of iterations [23, 24, 25, 26]. Some efforts have been made in [27] to reduce the redundant iterations of BP-based polar decoders. Inspired by the H-matrix (parity-check matrix) of low-density parity-check (LDPC) decoders [28], the G-matrix early stopping criterion was proposed in [27]; it is built on the generator matrix of polar codes. After each iteration, the detected left-most bits ($\hat{u}$) are re-encoded with the generator matrix ($G$) of the polar code. The re-encoded bits ($\hat{u}G$) and the decoded right-most bits ($\hat{x}$) are then substituted into Eq. (3). If the result of Eq. (3) is zero, decoding terminates according to the G-matrix criterion.

$$\sum (\hat{u}G \oplus \hat{x}) \tag{3}$$
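Eq. (3) can be sketched in a few lines of Python. The sketch assumes $G = F^{\otimes n}$ with $F = \begin{bmatrix}1&0\\1&1\end{bmatrix}$ and no bit-reversal permutation, and hard decisions given as 0/1 lists; the function names are hypothetical. The sum in Eq. (3) is simply the number of positions where $\hat{u}G$ and $\hat{x}$ disagree.

```python
def generator_matrix(n):
    """G = F kron F kron ... (n times), F = [[1, 0], [1, 1]]; no bit-reversal (assumption)."""
    G = [[1]]
    for _ in range(n):
        size = len(G)
        # F kron G = [[G, 0], [G, G]]
        G = [row + [0] * size for row in G] + [row + row for row in G]
    return G


def g_matrix_check(u_hat, x_hat, G):
    """Eq. (3): sum over all positions of (u_hat G mod 2) XOR x_hat."""
    N = len(G)
    reencoded = [sum(u_hat[r] & G[r][c] for r in range(N)) % 2 for c in range(N)]
    return sum(a ^ b for a, b in zip(reencoded, x_hat))


G = generator_matrix(3)
u_hat = [0, 0, 0, 1, 0, 1, 1, 1]
x_hat = [0, 1, 1, 0, 1, 0, 0, 1]                 # equal to u_hat * G
x_bad = x_hat[:2] + [1 - x_hat[2]] + x_hat[3:]   # same word with one bit flipped
print(g_matrix_check(u_hat, x_hat, G))   # 0: stop decoding
print(g_matrix_check(u_hat, x_bad, G))   # 1: keep iterating
```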

By applying the G-matrix early stopping criterion to BP-based polar decoders, the number of iterations can be effectively reduced and the decoding throughput greatly improved [27, 29, 30].
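Combining Eq. (1), Eq. (2) and Eq. (3), the following Python sketch decodes a short polar code with SMS-BP and stops as soon as the G-matrix test passes. It is an illustration under several assumptions that the letter does not fix: each sweep simply reuses the most recently computed messages (which may not match the exact $t$/$t-1$ timing written in Eq. (1) and Eq. (2)); node columns are numbered 0 to $n$, with $\hat{u}$ in column 0 and $\hat{x}$ in column $n$; a large finite prior (`BIG`) stands in for the infinite LLR of frozen bits; $G = F^{\otimes n}$ without bit-reversal; and the frozen set {0, 1, 2, 4} is used for the (8, 4) example. All names are hypothetical.

```python
import math

LAMBDA = 0.9375
BIG = 1e9  # large finite stand-in for the infinite prior LLR of a frozen (known-zero) bit


def g(a, b):
    """Min-sum propagation function used in Eq. (1) and Eq. (2)."""
    return math.copysign(1.0, a) * math.copysign(1.0, b) * min(abs(a), abs(b))


def encode(u):
    """x = u G over GF(2), computed with the factor-graph butterflies (no bit-reversal)."""
    x, N = list(u), len(u)
    s = N // 2
    while s >= 1:
        for i in range(N):
            if (i // s) % 2 == 0:
                x[i] ^= x[i + s]
        s //= 2
    return x


def sms_bp_decode(llr, frozen, max_iter=15, lam=LAMBDA):
    """SMS-BP decoding with G-matrix early stopping; returns (u_hat, iterations used).

    Node columns are 0..n: column 0 holds u, column n holds x (channel side).
    Stage j = 1..n joins columns j-1 and j and pairs nodes i and i + 2**(n-j).
    """
    N = len(llr)
    n = N.bit_length() - 1
    L = [[0.0] * N for _ in range(n + 1)]
    R = [[0.0] * N for _ in range(n + 1)]
    L[n] = [float(v) for v in llr]              # channel LLRs, positive favours 0
    R[0] = [BIG if f else 0.0 for f in frozen]  # frozen-bit priors

    def uppers(s):
        return [i for i in range(N) if (i // s) % 2 == 0]

    u_hat = [0] * N
    for it in range(1, max_iter + 1):
        for j in range(n, 0, -1):               # step 1: right to left, Eq. (1)
            a, b, s = j - 1, j, 1 << (n - j)
            for i in uppers(s):
                L[a][i] = lam * g(L[b][i], R[a][i + s] + L[b][i + s])
                L[a][i + s] = lam * g(L[b][i], R[a][i]) + L[b][i + s]
        for j in range(1, n + 1):               # step 2: left to right, Eq. (2)
            a, b, s = j - 1, j, 1 << (n - j)
            for i in uppers(s):
                R[b][i] = lam * g(R[a][i], R[a][i + s] + L[b][i + s])
                R[b][i + s] = lam * g(R[a][i], L[b][i]) + R[a][i + s]
        u_hat = [0 if f else int(L[0][i] + R[0][i] < 0) for i, f in enumerate(frozen)]
        x_hat = [int(L[n][i] + R[n][i] < 0) for i in range(N)]
        if encode(u_hat) == x_hat:              # Eq. (3) is zero: stop early
            return u_hat, it
    return u_hat, max_iter


frozen = [i in (0, 1, 2, 4) for i in range(8)]  # assumed frozen set for (8, 4)
u = [0, 0, 0, 1, 0, 1, 1, 1]
llr = [4.0 if bit == 0 else -4.0 for bit in encode(u)]  # noiseless channel
print(sms_bp_decode(llr, frozen))
```

On this noiseless input the sketch returns `([0, 0, 0, 1, 0, 1, 1, 1], 1)`, i.e. Eq. (3) is already zero after the first iteration. With noisy LLRs, the returned iteration count is the quantity that a Monte Carlo run such as the one behind Fig. 8 would average.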

## 3. Proposed muxed-pe architecture

Recall that the LLR calculations of the SMS algorithm are described by Eq. (1) and Eq. (2). These four equations can be categorized into two types: Type-I, $d = \lambda \times \mathrm{sign}(a)\,\mathrm{sign}(b+c) \min(|a|, |b+c|)$, and Type-II, $d = a + \lambda \times \mathrm{sign}(b)\,\mathrm{sign}(c) \min(|b|, |c|)$. The high-level architectures of these two types of computation are given in Fig. 3 and Fig. 4, respectively. Here, the S2C unit converts a number from sign-magnitude form into two's complement form, the C2S unit performs the inverse conversion, and the scale unit implements the scaling function. Both types of computation involve a large amount of combinational logic. The high-level architecture of the conventional PE, built from these two types of computation, is given in Fig. 5. As illustrated, the Type-I and Type-II blocks are used independently for the computation of the left-to-right and right-to-left LLR messages [15, 16, 17].
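A bit-accurate behavioural model of the two blocks can help when reasoning about the datapaths in Fig. 3 and Fig. 4. The Python sketch below is not taken from the letter: it assumes 7-bit two's-complement messages (matching the 7-bit quantization of Table I) saturated symmetrically to $\pm 63$, saturation of the intermediate sum $b + c$, and that $\lambda = 0.9375 = 1 - 1/16$ is realized as a shift-and-subtract on the magnitude. `c2s` and `s2c` only mimic the roles of the C2S and S2C units; their exact placement in Fig. 3 and Fig. 4 may differ.

```python
Q_BITS = 7                        # 7-bit quantization, as in Table I
Q_MAX = (1 << (Q_BITS - 1)) - 1   # assumed symmetric saturation range [-63, 63]


def saturate(x):
    """Clip an integer LLR to the assumed 7-bit range."""
    return max(-Q_MAX, min(Q_MAX, x))


def c2s(x):
    """Two's-complement value -> (sign bit, magnitude): the role of a C2S unit."""
    return (1 if x < 0 else 0), abs(x)


def s2c(sign, mag):
    """(sign bit, magnitude) -> two's-complement value: the role of an S2C unit."""
    return -mag if sign else mag


def scale(mag):
    """lambda = 0.9375 = 1 - 1/16 as mag - (mag >> 4); truncating (assumption)."""
    return mag - (mag >> 4)


def type_one(a, b, c):
    """Type-I: d = lambda * sign(a) sign(b + c) min(|a|, |b + c|)."""
    sa, ma = c2s(a)
    sbc, mbc = c2s(saturate(b + c))
    return s2c(sa ^ sbc, scale(min(ma, mbc)))


def type_two(a, b, c):
    """Type-II: d = a + lambda * sign(b) sign(c) min(|b|, |c|)."""
    sb, mb = c2s(b)
    sc, mc = c2s(c)
    return saturate(a + s2c(sb ^ sc, scale(min(mb, mc))))


print(type_one(40, 30, 20), type_two(10, -30, 50), type_two(60, 20, 20))
```

The truncating shift makes the scaling approximate: `scale(40)` gives 38, whereas $0.9375 \times 40 = 37.5$.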



Fig. 3. High-level architecture of the Type-I block.

Since the BP decoder contains a large number of PEs, reducing the resource usage of the Type-I and Type-II blocks effectively reduces the hardware resource consumption of the whole decoder. For BP-based polar decoders, message propagation is unidirectional, i.e., only left-to-right or only right-to-left messages are propagated at a time [8], which means that some hardware resources can be reused in the time domain.



Fig. 4. High-level architecture of the Type-II block.



Fig. 5. High-level architecture of the conventional PE.



Fig. 6. High-level architecture of the proposed muxed-pe.

Based on this fact, we designed a multiplexed process element (muxed-pe) architecture that reuses the Type-I and Type-II blocks to reduce combinational logic, as shown in Fig. 6. The proposed muxed-pe architecture introduces a control signal *dir* that selects the input LLR messages. When *dir* is 0, the LLR messages propagating from right to left are selected; when *dir* is 1, the LLR messages propagating from left to right are selected. Eq. (1) and Eq. (2) are therefore evaluated in these two cases, respectively.
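Comparing Eq. (1) with Eq. (2) shows why the sharing works: because $g$ is symmetric, the $b$ and $c$ operands of the Type-I block ($R_{i+2^{n-j},j}$ and $L_{i+2^{n-j},j+1}$) and of the Type-II block ($R_{i,j}$ and $L_{i,j+1}$) are the same nodes in both directions, and only the $a$ operand changes. The behavioural Python sketch below uses that observation, with one Type-I block, one Type-II block and a selection on each $a$ input driven by `direction` (playing the role of *dir*). It is a reading of the equations, not necessarily the exact multiplexer placement of Fig. 6. The argument names are hypothetical: `L_up_r` = $L_{i,j+1}$, `L_low_r` = $L_{i+2^{n-j},j+1}$, `R_up_l` = $R_{i,j}$, `R_low_l` = $R_{i+2^{n-j},j}$.

```python
import math

LAMBDA = 0.9375


def g(a, b):
    """sign(a) sign(b) min(|a|, |b|); symmetric in a and b."""
    return math.copysign(1.0, a) * math.copysign(1.0, b) * min(abs(a), abs(b))


def type_one(a, b, c, lam=LAMBDA):
    """Type-I block: lambda * g(a, b + c)."""
    return lam * g(a, b + c)


def type_two(a, b, c, lam=LAMBDA):
    """Type-II block: a + lambda * g(b, c)."""
    return a + lam * g(b, c)


def muxed_pe(direction, L_up_r, L_low_r, R_up_l, R_low_l):
    """One shared Type-I and one shared Type-II block.

    direction = 0: Eq. (1), returns (L_{i,j}, L_{i+2^{n-j},j})
    direction = 1: Eq. (2), returns (R_{i,j+1}, R_{i+2^{n-j},j+1})
    """
    a_type_one = R_up_l if direction else L_up_r    # selector on the Type-I 'a' input
    a_type_two = R_low_l if direction else L_low_r  # selector on the Type-II 'a' input
    # The b and c operands are the same nodes in both directions.
    upper = type_one(a_type_one, R_low_l, L_low_r)
    lower = type_two(a_type_two, R_up_l, L_up_r)
    return upper, lower


print(muxed_pe(0, 2.0, -1.0, 0.0, 0.0))   # right-to-left update, Eq. (1)
print(muxed_pe(1, 2.0, -4.0, 3.0, 1.0))   # left-to-right update, Eq. (2)
```

Compared with a PE that instantiates one Type-I and one Type-II block per direction, this sharing would save one block of each type per PE at the cost of two input selectors and of steering the outputs to the L or R storage, which is the kind of combinational-logic saving the muxed-pe architecture targets.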

Since the proposed muxed-pe architecture is a general solution for optimizing MS-based operations, it can also be applied to other types of BP-based polar decoders.

## 4. Performance analysis and comparisons

In this section, the performance of different BP-based polar decoding architectures is analyzed. Polar codes with a block length of $N = 1024$ and a code rate of $R = 0.5$ are used. The SMS-BP decoder uses a scale factor of $\lambda = 0.9375$ [13]. The maximum number of iterations is set to 15, and Monte Carlo simulation is used to estimate the average number of iterations.

Fig. 7 shows the frame error rate (FER) performance of BP decoding with the G-matrix early stopping method and of the original BP decoding without early stopping. As illustrated, the early stopping method causes no performance loss.



Fig. 7. FER comparisons of the (1024, 512) polar codes.

Fig. 8 shows the average number of iterations of the G-matrix early stopping criterion under various SNRs. It can be seen that the number of iterations of the BP decoder can be effectively reduced by using the G-matrix method, which helps to reduce the decoding latency and improve the throughput.



Fig. 8. Average number of iterations with the G-matrix method for (1024, 512) polar BP decoding.

The register-transfer level (RTL) models of the optimized SMS-BP polar decoder with the proposed muxed-pe architecture are developed in Verilog HDL (Verilog Hardware Description Language). The designs are synthesized with Synopsys Design Compiler using the SMIC (Semiconductor Manufacturing International Corporation) 55 nm CMOS library. The supply voltage is 1.08 V with the worst-case timing model at 125°C. For a fair comparison, the same parameters are used for the conventional SMS-BP polar decoder. The comparison results in terms of hardware efficiency and energy efficiency are shown in Table I.

Table I. Performance of different (1024, 512) polar decoders.

| Design (7-bit quantization) | SMS decoder | SMS decoder with muxed-pe | SMS decoder with muxed-pe and G-matrix |
|---|---|---|---|
| CMOS technology | 55 nm | 55 nm | 55 nm |
| Maximum clock frequency (MHz) | 400 | 400 | 400 |
| Total gate count | 1986419 | 1278299 | 1563359 |
| Power (mW) | 496 | 398 | 431 |
| Average number of iterations @ 3.5 dB | 15 | 15 | 4.5 |
| Average latency (cycles) @ 3.5 dB | 300 | 300 | 94 |
| Average throughput (Gbps) @ 3.5 dB | 1.37 | 1.37 | 4.36 |
| Hardware efficiency (normalized) | 1 | 1.55 | 4.06 |
| Energy per bit (pJ/bit) @ 3.5 dB | 363 | 292 | 99 |

Here, the hardware efficiency (HE) is the ratio of throughput to total gate count, and the energy per bit (EPB) is the ratio of power to throughput. HE and EPB are calculated with Eq. (4) and Eq. (5), respectively.

$$HE = \frac{\text{Clock Frequency} \times N}{\text{Decoding Latency} \times \text{Total Gate Count}} \tag{4}$$

$$EPB = \frac{\text{Power} \times \text{Decoding Latency}}{\text{Clock Frequency} \times N} \tag{5}$$

where the power is reported by Synopsys Design Compiler, N is the block length and the unit of decoding latency is clock cycle.
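Eq. (4) and Eq. (5) can be checked against Table I with a few lines of Python. The sketch uses throughput $= \text{Clock Frequency} \times N / \text{Decoding Latency}$, which is what Eq. (4) and Eq. (5) imply, and normalizes HE to the conventional SMS decoder; the dictionary layout and function names are hypothetical, and the input values are copied from Table I.

```python
F_CLK_HZ = 400e6   # maximum clock frequency from Table I
N = 1024           # block length

# Values copied from Table I: (total gate count, power in W, average latency in cycles)
DESIGNS = {
    "SMS": (1986419, 0.496, 300),
    "SMS + muxed-pe": (1278299, 0.398, 300),
    "SMS + muxed-pe + G-matrix": (1563359, 0.431, 94),
}


def throughput_bps(latency_cycles):
    """Decoded bits per second implied by Eq. (4) and Eq. (5)."""
    return F_CLK_HZ * N / latency_cycles


def hardware_efficiency(gates, latency_cycles):
    """Eq. (4): throughput per gate."""
    return throughput_bps(latency_cycles) / gates


def energy_per_bit_pj(power_w, latency_cycles):
    """Eq. (5): power / throughput, converted from J/bit to pJ/bit."""
    return power_w / throughput_bps(latency_cycles) * 1e12


base_gates, _, base_latency = DESIGNS["SMS"]
base_he = hardware_efficiency(base_gates, base_latency)

results = {}
for name, (gates, power_w, latency) in DESIGNS.items():
    results[name] = (
        throughput_bps(latency) / 1e9,                   # Gbps
        hardware_efficiency(gates, latency) / base_he,   # normalized HE
        energy_per_bit_pj(power_w, latency),             # pJ/bit
    )
    print(f"{name:26s} {results[name][0]:.2f} Gbps  HE {results[name][1]:.2f}  "
          f"{results[name][2]:.0f} pJ/bit")
```

After rounding, this reproduces the Table I values of 1.37/1.37/4.36 Gbps, normalized HE of 1/1.55/4.06, and 363/292/99 pJ/bit.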

From Table I, the proposed muxed-pe decoder achieves a 36% reduction in hardware resources and a 55% increase in hardware efficiency compared with the conventional SMS decoder. Since the conventional SMS decoder and the proposed muxed-pe decoder have the same throughput, the normalized hardware efficiency is simply the inverse of their gate-count ratio (1986419/1278299 ≈ 1.55). In addition, the power consumption of the SMS decoder with muxed-pe is lower than that of the conventional SMS decoder thanks to the reduced combinational logic. Therefore, the EPB of the proposed muxed-pe decoder is reduced to 292 pJ/b, and it is further reduced to 99 pJ/b by applying the G-matrix method. With the simple G-matrix early termination scheme, the average number of iterations is lowered to 4.5 at an SNR of 3.5 dB with no loss in error-correcting performance, while the G-matrix check adds four clock cycles of latency. Early termination enables a higher throughput of 4.36 Gbps at 431 mW. Therefore, the proposed muxed-pe architecture is a good candidate for low-complexity, high-performance polar decoder designs.
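The latency figures in Table I are consistent with a simple model; this is an inference from the table, not something stated in the letter. The conventional decoders need 300 cycles for 15 iterations, i.e. 20 cycles per iteration, which would match one cycle per stage over the $2n = 20$ stage updates of a right-to-left plus left-to-right sweep for $n = 10$. Adding the four cycles of the G-matrix check to 4.5 average iterations then gives the 94-cycle average latency and the reported throughput:

$$\frac{300\ \text{cycles}}{15\ \text{iterations}} = 20\ \frac{\text{cycles}}{\text{iteration}}, \qquad 4.5 \times 20 + 4 = 94\ \text{cycles}$$

$$\text{Throughput} = \frac{400\ \text{MHz} \times 1024\ \text{bits}}{94\ \text{cycles}} \approx 4.36\ \text{Gbps}$$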

## 5. Conclusion

In this letter, a belief propagation decoder for polar codes with reduced combinational logic is designed in 55 nm CMOS technology. Using the proposed muxed-pe architecture, an optimized SMS-BP polar decoder is developed. Synthesis results show that the proposed architecture reduces hardware resource consumption by 36% compared with the conventional SMS decoder. Since the proposed muxed-pe architecture is a general solution for optimizing MS-based operations, it can also be applied to other types of BP-based polar decoders.

#### Acknowledgments

The authors would like to thank the editors and anonymous reviewers for their helpful and constructive comments. This work was supported in part by the National Science and Technology Major Project of the Ministry of Science and Technology of the People's Republic of China under Grant Y4GZ342001.

#### References

- [1] E. Arıkan: "Channel polarization: A method for constructing capacity-achieving codes for symmetric binary-input memoryless channels," IEEE Trans. Inf. Theory 55 (2009) 3051 (DOI: 10.1109/TIT.2009.2021379).
- [2] A. Alamdar-Yazdi and F. R. Kschischang: "A simplified successive-cancellation decoder for polar codes," IEEE Commun. Lett. 15 (2011) 1378 (DOI: 10.1109/LCOMM.2011.101811.111480).
- [3] C. Leroux, et al.: "Hardware architectures for successive cancellation decoding of polar codes," 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (2011) 1665 (DOI: 10.1109/ICASSP.2011.5946819).
- [4] K. Niu and K. Chen: "CRC-aided decoding of polar codes," IEEE Commun. Lett. 16 (2012) 1668 (DOI: 10.1109/LCOMM.2012.090312.121501).
- [5] K. Niu and K. Chen: "Stack decoding of polar codes," Electron. Lett. 48 (2012) 695 (DOI: 10.1049/el.2012.1459).
- [6] C. Leroux, et al.: "A semi-parallel successive-cancellation decoder for polar codes," IEEE Trans. Signal Process. 61 (2013) 289 (DOI: 10.1109/TSP.2012.2223693).
- [7] I. Tal and A. Vardy: "List decoding of polar codes," IEEE Trans. Inf. Theory 61 (2015) 2213 (DOI: 10.1109/TIT.2015.2410251).
- [8] E. Arıkan: "A performance comparison of polar codes and Reed-Muller codes," IEEE Commun. Lett. 12 (2008) 447 (DOI: 10.1109/LCOMM.2008.080017).
- [9] A. Pamuk: "An FPGA implementation architecture for decoding of polar codes," 2011 8th International Symposium on Wireless Communication Systems (2011) 437 (DOI: 10.1109/ISWCS.2011.6125398).
- [10] J. Xu, et al.: "XJ-BP: Express journey belief propagation decoding for polar codes," 2015 IEEE Global Communications Conference (2015) 1 (DOI: 10.1109/GLOCOM.2015.7417316).
- [11] J. Lin, et al.: "Reduced complexity belief propagation decoders for polar codes," 2015 IEEE Workshop on Signal Processing Systems (2015) 1 (DOI: 10.1109/SiPS.2015.7344984).
- [12] J. Liu and J. Sha: "Frozen bits selection for polar codes based on simulation and BP decoding," IEICE Electron. Express 14 (2017) 20170026 (DOI: 10.1587/elex.14.20170026).
- [13] B. Yuan and K. K. Parhi: "Architecture optimizations for BP polar decoders," 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (2013) 2654 (DOI: 10.1109/ICASSP.2013.6638137).
- [14] B. Yuan and K. K. Parhi: "Early stopping criteria for energy-efficient low-latency belief-propagation polar code decoders," IEEE Trans. Signal Process. 62 (2014) 6496 (DOI: 10.1109/TSP.2014.2366712).
- [15] B. Yuan and K. K. Parhi: "Architectures for polar BP decoders using folding," 2014 IEEE International Symposium on Circuits and Systems (2014) 205 (DOI: 10.1109/ISCAS.2014.6865101).
- [16] Y. S. Park, et al.: "A 4.68 Gb/s belief propagation polar decoder with bit-splitting register file," 2014 Symposium on VLSI Circuits Digest of Technical Papers (2014) 1 (DOI: 10.1109/VLSIC.2014.6858413).
- [17] J. Sha, et al.: "A memory efficient belief propagation decoder for polar codes," China Commun. 12 (2015) 34 (DOI: 10.1109/CC.2015.7112042).
- [18] E. Arıkan: "Polar codes: A pipelined implementation," International Symposium on Broadband Communication (2010) 11.
- [19] M. P. C. Fossorier, et al.: "Reduced complexity iterative decoding of low density parity check codes based on belief propagation," IEEE Trans. Commun. 47 (1999) 673 (DOI: 10.1109/26.768759).
- [20] J. Zhao, et al.: "On implementation of min-sum algorithm and its modifications for decoding low-density parity-check (LDPC) codes," IEEE Trans. Commun. 53 (2005) 549 (DOI: 10.1109/TCOMM.2004.836563).
- [21] A. Darabiha, et al.: "A bit-serial approximate min-sum LDPC decoder and FPGA implementation," 2006 IEEE International Symposium on Circuits and Systems (2006) 149 (DOI: 10.1109/ISCAS.2006.1692544).
- [22] V. Savin: "Min-max decoding for non binary LDPC codes," 2008 IEEE International Symposium on Information Theory (2008) 960 (DOI: 10.1109/ISIT.2008.4595129).
- [23] K. Niu, et al.: "Beyond turbo codes: Rate-compatible punctured polar codes," 2013 IEEE International Conference on Communications (2013) 3423 (DOI: 10.1109/ICC.2013.6655078).
- [24] S. M. Abbas, et al.: "Low complexity belief propagation polar code decoder," 2015 IEEE Workshop on Signal Processing Systems (2015) 1 (DOI: 10.1109/SiPS.2015.7344986).
- [25] C. Condo, et al.: "Blind detection with polar codes," IEEE Commun. Lett. 21 (2017) 2550 (DOI: 10.1109/LCOMM.2017.2748940).
- [26] S. M. Abbas, et al.: "High-throughput and energy-efficient belief propagation polar code decoder," IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 25 (2017) 1098 (DOI: 10.1109/TVLSI.2016.2620998).
- [27] B. Yuan and K. K. Parhi: "Early stopping criteria for energy-efficient low-latency belief-propagation polar code decoders," IEEE Trans. Signal Process. 62 (2014) 6496 (DOI: 10.1109/TSP.2014.2366712).
- [28] R. Gallager: "Low-density parity-check codes," IRE Trans. Inf. Theory 8 (1962) 21 (DOI: 10.1109/TIT.1962.1057683).
- [29] C. Simsek and K. Turk: "Simplified early stopping criterion for belief-propagation polar code decoders," IEEE Commun. Lett. 20 (2016) 1515 (DOI: 10.1109/LCOMM.2016.2580514).
- [30] Q. Zhang, et al.: "Early stopping criterion for belief propagation polar decoder based on frozen bits," Electron. Lett. 53 (2017) 1576 (DOI: 10.1049/el.2017.3316).