Parallel access by butterfly networks for any degree permutation polynomial and ARP interleavers
Introduction
Communication systems increasingly require high processing speed at the receiver side. Turbo codes have attracted much interest in the area of error-correcting codes because of their very good performance on noisy communication channels. To speed up turbo decoding, several processors are used in parallel. In a parallel turbo decoding implementation, the extrinsic values are stored in several memory banks. We denote by L the number of processors used. Because of the interleaver used in turbo codes, this approach can lead to collisions when accessing the corresponding L memory banks, since the extrinsic values required in turbo decoding are read and written in two different orders. Different interleavers have been designed to resolve the problem of collisions, but this method imposes constraints on the overall turbo code design.
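The collision problem described above can be illustrated with a minimal Python sketch. It assumes the classical mapping in which processor j handles the subblock of length W = N/L starting at j·W and an address a is stored in bank ⌊a/W⌋; this is the disjoint-subblock model, not the more general storage scheme studied in this paper. The function name `contention_free` and the toy permutation are hypothetical; the QPP 3x + 10x² (mod 40) is the length-40 LTE interleaver.

```python
def contention_free(perm, num_banks):
    """Return True if, at every step, the processors hit distinct memory banks."""
    n = len(perm)
    if num_banks < 1 or n % num_banks != 0:
        raise ValueError("interleaver length must be a multiple of num_banks")
    window = n // num_banks  # subblock length handled by each processor
    for i in range(window):
        # Processor j needs the interleaved address perm[i + j * window].
        # Assumed bank mapping: bank index = address // window.
        banks = {perm[i + j * window] // window for j in range(num_banks)}
        if len(banks) < num_banks:
            return False  # two processors want the same bank at step i
    return True


# QPP interleaver of length 40 (LTE parameters f1 = 3, f2 = 10).
qpp = [(3 * x + 10 * x * x) % 40 for x in range(40)]
# Hypothetical toy permutation that collides for two processors.
toy = [0, 2, 1, 3]

print(contention_free(qpp, 4))
print(contention_free(toy, 2))
```

For the toy permutation, both processors request bank 0 at the first step, which is exactly the kind of collision that a suitable address mapping or interleaver design has to avoid.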
In [1] the problem of avoiding collisions in accessing the memory was solved for an arbitrary interleaver by a suitable mapping of the variables read/written in the memory. In [2] the problem of avoiding collisions in accessing the memory was solved for L a power of two and for an arbitrary interleaver by using butterfly networks to map the addresses of extrinsic values. Butterfly networks offer a simpler and less complex solution for parallel access to the memory.
However, for an arbitrary interleaver, determining the control bits required for routing as in [2] is rather cumbersome. Therefore, in [3], the author has particularized the determination of the control bits in a butterfly network for quadratic permutation polynomial (QPP) interleavers [4], obtaining an easy way to determine them for these particular interleavers. The control bits can be computed on-the-fly for QPP interleavers, and thus the separate initial processing and lookup tables required in [2] are not needed. This allows parallel turbo decoding with lower complexity and high speed. These are the main advantages of using butterfly networks in parallel decoding of turbo codes with QPP interleavers.
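To make the role of the control bits concrete, the following Python sketch routes values through a generic butterfly network of 2×2 switches. The wiring (stage s pairs position i with position i XOR 2^s) and the names `butterfly_route` and `control_bits` are assumptions made for illustration; the on-the-fly computation of the control bits from [3] is not reproduced here.

```python
def butterfly_route(values, control_bits):
    """Route 2**m values through m stages of 2x2 switches.

    control_bits[s][k] drives the k-th switch of stage s:
    0 passes the pair straight through, 1 swaps it.
    """
    n = len(values)
    m = n.bit_length() - 1
    if n < 1 or n != 1 << m:
        raise ValueError("number of values must be a power of two")
    if len(control_bits) != m or any(len(c) != n // 2 for c in control_bits):
        raise ValueError("need m stages with n/2 control bits each")
    out = list(values)
    for s in range(m):
        stride = 1 << s
        k = 0  # index of the switch inside stage s
        for i in range(n):
            if i & stride == 0:  # i is the upper input of a switch
                if control_bits[s][k]:
                    out[i], out[i | stride] = out[i | stride], out[i]
                k += 1
    return out


routed = butterfly_route(["a", "b", "c", "d"], [[1, 0], [0, 1]])
print(routed)  # ['b', 'd', 'c', 'a']
```

With L inputs the network has log2(L) stages of L/2 switches, so one routing step needs only (L/2)·log2(L) control bits.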
When implementing a parallel turbo decoder for turbo codes with QPP interleavers in an application-specific integrated circuit (ASIC), the main target is to obtain high throughput with the smallest possible chip area and low power consumption. For a given number of processors, the main challenge in parallel turbo decoder design is to find an efficient solution for routing the extrinsic values computed by the component soft-input soft-output (SISO) maximum a-posteriori probability (MAP) decoders. The two main parts used in the implementation of a QPP interleaver for a parallel turbo decoder are the circuit that generates the physical addresses at which the extrinsic values are stored in, or read from, the memory, and the interconnection network. The interconnection network deals with the appropriate routing of the extrinsic values. To our knowledge, the best-known interconnection networks proposed in the literature are the crossbar network [5], the master-slave Batcher network [6], the Benes network [7], and the barrel shifter network [8]. In Section 5.1 we will show that when routing the same number of extrinsic values (equal to a power of two) with butterfly networks, the number of 2-input multiplexers and the number of full adders required for hardware implementation is smaller than or equal to the number required for the previous solutions. We note that in [7] it is mentioned that butterfly networks can be used with QPP interleavers. However, the whole turbo codeword block is assumed to be processed at the decoder in separate equally-sized subblocks, for which the maximum contention-free property is proved in [9]. The solution proved in [3] for QPP interleavers, and in this paper for any degree PP interleavers and ARP interleavers with some constraints, is more general and offers more flexibility.
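As a rough illustration of why the choice of interconnection network matters, the sketch below counts 2-input multiplexers in the switching fabric only, using textbook constructions: a 2×2 switch is taken as two multiplexers, and an L-to-1 selector as a tree of L − 1 multiplexers. These are assumptions of this sketch, not the figures of Section 5.1, which also account for address generation and full adders. The function name `mux_counts` is hypothetical.

```python
def mux_counts(num_ports):
    """Count 2-input multiplexers in the switching fabric for L = num_ports."""
    m = num_ports.bit_length() - 1
    if num_ports < 2 or num_ports != 1 << m:
        raise ValueError("num_ports must be a power of two, at least 2")
    return {
        # log2(L) stages, L/2 switches per stage, 2 multiplexers per switch
        "butterfly": num_ports * m,
        # 2*log2(L) - 1 stages, L/2 switches per stage, 2 multiplexers per switch
        "benes": num_ports * (2 * m - 1),
        # one L-to-1 multiplexer tree (L - 1 multiplexers) per output
        "crossbar": num_ports * (num_ports - 1),
    }


for ports in (4, 8, 16):
    print(ports, mux_counts(ports))
```

Under these assumptions the butterfly fabric is never larger than the other two, and the gap to the crossbar grows quickly with the number of processors.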
As it is stated in [3], when using several pipelined units to compute the state metrics it is possible to process the whole trellis continuously, while with the proof from Takeshita [9] the processing has to be performed over some disjoint subblocks, and thus metrics initializations for each subblock must be done.
An efficient and general solution for the implementation of a QPP interleaver for a parallel turbo decoder is given in [10]. It is shown that the proposed solution for the memory reading circuit is more efficient, in terms of the number of 2-input multiplexers and the number of full adders, than the previously known solutions. Specifically, the solution proposed in [10] uses a butterfly-network-based structure matched to four types of parallelization of turbo decoders. These are the serial MAP (SMAP) or cross MAP (XMAP) strategies to compute the state metrics, the symbol-based radix-2^ν algorithm to compute the state metrics by merging ν trellis sections, the pipeline decomposition of the recursion computing the forward and backward metrics required in MAP decoders, and, finally, the classical use of several processors which work as MAP decoders over subblocks of the original turbo codeword block. This solution uses several blocks of butterfly networks, appropriately arranged so that the complexity is reduced. When the parallelization is made only by several MAP decoders that work on disjoint subblocks as in [9], the interconnection network reduces to one butterfly network. Since in the present paper we analyze the possibility of using only one butterfly network for all extrinsic values, it is not fair to compare the implementation complexity with that from Wang et al. [10]. The possibility of using any degree PP and ARP interleavers with the solution from Wang et al. [10] is left for future work.
QPP interleavers have been intensively studied in the last fifteen years. The topic of PP interleavers of degree higher than two has gained interest in recent years (see [9], [11], [12], [13], [14], [15], [16], [17], [18], [19], [20] for some results). Although QPP interleavers lead to very good performance when they are carefully chosen, PP interleavers of degree higher than two can outperform them. First, in [13] the author provides a fifth-degree PP interleaver of length 512 with performance similar to that of a dithered relative prime (DRP) interleaver [31], which has the best known performance. Then, some better cubic PP (CPP) interleavers of short to medium lengths are provided in [15], [16], [19]. Recently, in [21] a partial upper bound for any degree PP interleavers has been established. In [20], PP interleavers of degree up to five with optimum minimum distance were found, and some PP interleavers which reach the partial upper bound in [21] were identified.
Another class of interleavers with very good performance and simple implementation is that of the almost regular permutation (ARP) interleavers [22]. ARP interleavers are used in the Digital Video Broadcasting Return Channel via Satellite (DVB-RCS) [23] and WiMax [24] standards. ARP interleavers were also proposed as an alternative to QPP interleavers in the LTE standard [25].
In [3], the use of butterfly networks with on-the-fly determination of control bits is proved only for QPP interleavers. Our concern in this paper is to show that any degree PP interleavers, and ARP interleavers with some constraints, allow the same on-the-fly determination of control bits for routing the extrinsic values with butterfly networks in parallel turbo decoding. As a consequence, the advantages of using butterfly networks with on-the-fly determination of control bits hold for these interleavers.
The 5G wireless network is proposed to be used in 2020. Turbo codes used in the previous 3G and 4G networks are under study to be replaced with two other channel code classes which approach the channel capacity [26], [27], namely low density parity check (LDPC) codes [28] and polar codes [29]. The reason for this change would be that turbo codes cannot achieve sufficiently high throughputs and that they have higher complexity than LDPC and polar codes. However, in [30], these two claims were not confirmed. AccelerComm has demonstrated that turbo codes can achieve decoded throughputs exceeding the 5G target of 20 Gbps. In [30] it was shown that the overall implementation complexity of a channel code depends not only on its computation complexity, but also on its interconnect complexity and its inherent flexibility. Of the three near-optimal channel code classes, turbo codes have higher degrees of flexibility at the lowest complexity. These features are due to the regular and flexible structure of turbo codes, in contrast to the structures of LDPC and polar codes. Additionally, turbo codes can offer the advantage of backwards compatibility with 3G and 4G. The interconnect complexity of turbo codes can be the lowest with butterfly networks.
Using any degree PP and ARP interleavers in parallel turbo decoding with butterfly networks, we can achieve good error correction, high flexibility, low latency and low complexity, as required in 5G communication systems. In addition, these interleavers can be designed without other constraints (except their length) when parallel turbo decoding is used. The issues presented above have motivated our work.
The main contributions of this paper are:
- 1) we show that not only QPP interleavers, but any degree permutation polynomial (PP) interleavers allow the same easy way of computing the control bits required in butterfly networks as in [3].
- 2) we prove some properties of ARP parameters and give a way to construct ARP interleavers with component LPPs from an ARP with R′ LPPs, where R, kR, and R′ are positive integers.
- 3) we show that ARP interleavers consisting of R component LPPs, with R a power of two, allow the same easy way to compute the control bits required in the butterfly networks as in [3] when the number of processors used in parallel turbo decoding is a power of two that divides the interleaver length and is greater than or equal to R. If the ARP parameters fulfill some more constraints, then the number of processors used can be smaller than R.
- 4) we make a theoretical comparison, from the point of view of implementation complexity, of butterfly networks and other previously known interconnection networks when using any degree PP interleavers and ARP interleavers with some constraints. The implementation complexity is assessed in terms of the number of 2-input multiplexers and the number of full adders required to implement the interconnection network and to generate the interleaved addresses. We show that the implementation complexity of butterfly networks with the interleavers analyzed in this paper is lower than or equal to that of the previously known interconnection networks. Additionally, butterfly networks with the possibilities of storing the extrinsic values proved in the paper offer more flexibility than the classical solution with disjoint equally-sized subblocks.
Recently, in [32], it has been shown that ARP interleavers represent a more general model, because DRP and QPP interleavers can be described by the ARP model. The representation of QPP interleavers by the ARP model was extended to cubic permutation polynomial (CPP) interleavers in [19]. Thus, all these high-performing interleavers can be used with the same ease when parallel turbo decoding uses butterfly networks.
The paper is structured as follows. Section 2.2 presents the background for parallel access by butterfly networks. In Sections 3 and 4 it is proved that any degree PP and ARP interleavers, respectively, can be used with butterfly networks with the same ease of “on the fly” control bit computation as was proved for QPP interleavers in [3]. In Section 4, the condition on the parameters of an ARP interleaver is also proved, and two ways to choose the parameters for these interleavers are provided. In Section 5.1 we make a theoretical comparison of the best-known interconnection networks from the point of view of implementation complexity. In Section 5.2 we give two examples for a fifth-degree PP interleaver and two examples for an ARP interleaver, showing for each of them the physical interleaved addresses and the control bits when using these interleavers with butterfly networks. Finally, Section 6 concludes the paper.
Notations
In the paper, the following notations will be used:
- $\mathbb{N}$ is the set of natural numbers;
- $\mathbb{N}^*$ is the set of natural numbers greater than zero;
- $W_N$, with $N \in \mathbb{N}^*$, is the set $\{0, 1, \ldots, N-1\}$;
- $\mathbb{P}$ is the set of prime numbers;
- $\binom{n}{k}$, with $n, k \in \mathbb{N}$ and $n \geq k$, is the binomial coefficient (i.e. n choose k);
- $p \mid N$, with $p, N \in \mathbb{N}^*$, stands for p divides N;
- $\gcd(a, b)$ stands for the greatest common divisor of non-negative integers a and b;
- (mod n), with $n \in \mathbb{N}^*$, stands for the modulo n operation;
- ⌈x⌉, with x a real number, stands for the
Previous results on permutation polynomials
We begin this section with the definition of a PP. Then we give three theorems and a lemma useful for obtaining the results in this section.
Definition 3.1. The polynomial of degree d, modulo N,
$$\pi(x) = q_0 + q_1 x + q_2 x^2 + \cdots + q_d x^d \pmod{N},$$
where N is a positive integer, is a PP if the coefficients $q_k$, $k = 0, 1, \ldots, d$, are chosen so that the set $\{\pi(0), \pi(1), \ldots, \pi(N-1)\}$, modulo N, is a permutation of the set $W_N$.
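The definition of a PP can be checked by brute force for small lengths. The Python sketch below is only an illustration under that approach: it evaluates a polynomial modulo N with Horner's rule and tests whether its values cover {0, 1, ..., N − 1}. The function names are hypothetical, and the coefficient list is ordered from the free term to the leading coefficient; the coefficient tests from the literature are not implemented.

```python
def poly_mod(coeffs, x, n):
    """Evaluate q_0 + q_1*x + ... + q_d*x**d modulo n using Horner's rule."""
    acc = 0
    for q in reversed(coeffs):
        acc = (acc * x + q) % n
    return acc


def is_permutation_polynomial(coeffs, n):
    """Brute-force test: the n values must be a permutation of 0..n-1."""
    return sorted(poly_mod(coeffs, x, n) for x in range(n)) == list(range(n))


print(is_permutation_polynomial([0, 3, 10], 40))  # LTE QPP of length 40
print(is_permutation_polynomial([0, 0, 1], 8))    # x**2 mod 8 repeats values
print(is_permutation_polynomial([7, 3, 10], 40))  # same QPP with a free term
```

The last call shows that adding a free term to a PP leaves it a PP, since every value is shifted by the same amount modulo N.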
The free term $q_0$ only determines a cyclic shift of the permutation elements. Thus, we may and we will assume that $q_0 = 0$. Theorem 3.2 For any [4], [11]
Generating ARP interleavers
The definition of an ARP interleaver is given below. Definition 4.1. An ARP interleaver modulo N is defined as in Eq. (30), where R ∈ WN, so that R∣N.
π(x) from Eq. (30) is an interleaver modulo N only if . In the following we obtain the conditions for free terms P0, P1, ..., so that π(x) from Eq. (30) is an interleaver.
We note that Definition 4.1 of an ARP is not exactly the one given in [22], but it was used in [32]
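As an illustration of why the free terms of an ARP must satisfy some conditions, the following Python sketch assumes the ARP form commonly used in the literature, π(x) = (P·x + S(x mod R)) mod N, in which each residue class x mod R is mapped by one linear component. The symbol S for the list of free terms and the function names are assumptions of this sketch and may differ from the notation of Definition 4.1.

```python
def arp_interleaver(n, p, shifts):
    """Build pi(x) = (p*x + shifts[x mod R]) mod n, with R = len(shifts)."""
    r = len(shifts)
    if r == 0 or n % r != 0:
        raise ValueError("the number of shifts R must divide n")
    return [(p * x + shifts[x % r]) % n for x in range(n)]


def is_permutation(seq):
    """True if seq contains each of 0..len(seq)-1 exactly once."""
    return sorted(seq) == list(range(len(seq)))


# Hypothetical parameters for N = 16 and R = 4.
good = arp_interleaver(16, 3, [0, 4, 8, 12])   # shifts are multiples of R
bad_shift = arp_interleaver(16, 3, [0, 1, 0, 0])  # pi(5) collides with pi(0)
bad_slope = arp_interleaver(16, 2, [0, 0, 0, 0])  # gcd(2, 16) > 1

print(is_permutation(good), is_permutation(bad_shift), is_permutation(bad_slope))
```

In this toy case, choosing the free terms as multiples of R together with gcd(P, N) = 1 yields a valid interleaver, while an arbitrary free term or a slope sharing a factor with N does not.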
Implementation complexity analysis
The ASIC implementation of a parallel turbo decoder using butterfly networks is beyond the scope of the present paper. However, in this section we make a theoretical analysis of the implementation complexity of the memory reading circuit when using butterfly networks for PP interleavers of degree d or ARP interleavers with R LPPs. The implementation complexity is assessed in terms of the number of 2-input multiplexers and the number of full adders. In this analysis we compare the
Conclusions
In this paper we prove that parallel decoding of turbo codes with any degree PP and ARP interleavers can be performed using butterfly networks, allowing the same easy way to compute the control bits as it was shown for QPP interleavers in [3]. The usefulness of the results in the paper consists in the possibility of computing “on the fly” the control bits when using turbo codes with these performant algebraic interleavers.
The result for any degree PP interleavers is general, in the sense that
Acknowledgement
We thank the editor and the reviewers for their helpful comments and suggestions which greatly improved the quality and the presentation of this paper.
References (43)
- et al. A coefficient test for fourth degree permutation polynomials. AEU Int. J. Electron. Commun. (2016)
- et al. The limitation of permutation polynomial interleavers for turbo codes and a scheme for dithering permutation polynomials. AEU Int. J. Electron. Commun. (2015)
- Permutation polynomials modulo 2^w. Finite Fields Appl. (2001)
- et al. Mapping interleaving laws to parallel turbo and LDPC decoder architectures. IEEE Trans. Inf. Theory (2004)
- A contention-free parallel access by butterfly networks for turbo interleavers. IEEE Trans. Inf. Theory (2014)
- On quadratic permutation polynomials, turbo codes, and butterfly networks. IEEE Trans. Inf. Theory (2017)
- et al. Interleavers for turbo codes using permutation polynomials over integer rings. IEEE Trans. Inf. Theory (2005)
- et al. High-throughput turbo decoder using pipelined parallel architecture and collision-free interleaver. IET Commun. (2012)
- et al. Design and implementation of a parallel turbo-decoder ASIC for 3GPP-LTE. IEEE J. Solid State Circuits (2011)
- et al. Efficient VLSI architecture of QPP interleavers for LTE turbo decoders. Proceedings of the IEEE International Symposium on System-on-Chip (2012)
- Reconfigurable turbo decoder with parallel architecture for 3GPP LTE system. IEEE Trans. Circuits Syst. II Exp. Briefs
- On maximum contention-free interleavers and permutation polynomials over integer rings. IEEE Trans. Inf. Theory
- Design of QPP interleavers for the parallel turbo decoding architecture. IEEE Trans. Circuits Syst. I Reg. Papers
- A simple coefficient test for cubic permutation polynomials over integer rings. IEEE Commun. Lett.
- A note on “A simple coefficient test for cubic permutation polynomials over integer rings”. IEEE Commun. Lett.
- Permutation polynomial interleavers: an algebraic-geometric perspective. IEEE Trans. Inf. Theory
- A note on permutation polynomials over. IEEE Trans. Inf. Theory
- Permutation polynomials of higher degrees for turbo code interleavers. IEICE Trans. Commun.
- Analysis of cubic permutation polynomials for turbo codes. Wirel. Person. Commun.
- A coefficient test for quintic permutation polynomials over integer rings. IEEE Access
- On the equivalence between cubic permutation polynomial interleavers and ARP interleavers for turbo codes. IEEE Trans. Commun.