

# Flexible Multilevel Coding with Concatenated Polar-Staircase Codes for M-QAM

Mehmood, Tayyab; Yankov, Metodi Plamenov; Iqbal, Shajeel; Forchhammer, Søren

Published in: IEEE Transactions on Communications

Link to article, DOI: 10.1109/TCOMM.2020.3038185

Publication date: 2021

Document Version Peer reviewed version


*Citation (APA):* Mehmood, T., Yankov, M. P., Iqbal, S., & Forchhammer, S. (2021). Flexible Multilevel Coding with Concatenated Polar-Staircase Codes for M-QAM. *IEEE Transactions on Communications*, *69*(2), 728-739. https://doi.org/10.1109/TCOMM.2020.3038185



Tayyab Mehmood, Student Member, IEEE, Metodi P. Yankov, Member, IEEE, Shajeel Iqbal, and Søren Forchhammer, Member, IEEE

Abstract-In this work, a multilevel coding (MLC) based coded modulation scheme with two degrees of freedom in rate flexibility is proposed and compared with a bit-interleaved coded modulation (BICM) scheme in terms of performance versus complexity. The proposed MLC scheme is based on a rate-flexible inner soft-decision polar code and an outer hard-decision staircase code, following the structure of the 400ZR concatenated forward error-correcting code. The performance of the MLC scheme is investigated for a range of inner code lengths and inner decoder list sizes, with 16- and 64-ary quadrature amplitude modulation. The MLC is designed such that a portion of the staircase-encoded bits bypasses the inner code. The number of required inner soft-decision decoders can thus be reduced, saving computational complexity. Compared with a similar BICM approach, the proposed MLC scheme simultaneously offers up to a 53.7% reduction in the number of inner decoders and up to 0.55 dB of performance improvement.

Index Terms—Polar codes, staircase codes, multilevel coding, rate flexibility, data center interconnects.

# I. INTRODUCTION

THE proliferation of bandwidth-hungry Internet applications and cloud-based services is causing explosive growth in the traffic exchanged between data centers. Hyperscale cloud providers have necessitated dedicated data center interconnects, which must be optimized in terms of data rate, latency, and power consumption [1]. Driven by the needs of the data center interconnect market, the Optical Internetworking Forum proposed an implementation agreement, called 400ZR, for interoperable single-carrier 400 Gb/s pluggable coherent modules. In 400ZR, particular focus is placed on forward error-correction (FEC) codes and digital signal processing algorithms that cope with the latency and power constraints of distributed in-region data center interconnects [2].

The pursuit of high net coding gain from FEC with soft-decision decoding (SDD), such as low-density parity-check (LDPC) [3] and polar codes [4], has led to a significant increase in decoding complexity. Compared to high-performance soft-decision codes, FEC solutions based on hard-decision decoding (HDD), such as product codes and staircase codes, require decoders of lower complexity [5]. The complexity of SDD-based coarsely quantized LDPC decoders is investigated and reduced in [6], [7], whereas the performance of HDD-based product-like codes is improved by exploiting soft information from the channel in [8], [9].

Parts of this paper have been presented at the Optical Fiber Communication Conference (OFC), San Diego, California, USA, March 2020. [21]

This work was supported by the Innovation Fund Denmark (INCOM project No. 8057-00059B) and DNRF Research Center of Excellence, SPOC, ref. DNRF123. The work of S. Iqbal was supported by IFD, ref. 9066-00027A.

T. Mehmood, M. P. Yankov, and S. Forchhammer are with the Department of Photonics Engineering, Technical University of Denmark, 2800 Kongens Lyngby, Denmark (e-mail: tayy@fotonik.dtu.dk; meya@fotonik.dtu.dk; sofo@fotonik.dtu.dk).

S. Iqbal is with Comcores ApS, 2800 Lyngby, Denmark, and also with the Department of Photonics Engineering, Technical University of Denmark, 2800 Kongens Lyngby, Denmark (e-mail: shaip@fotonik.dtu.dk).


Polar codes [4] are a class of linear block codes, typically decoded with SDD, that are capacity-achieving on basic channels such as the binary symmetric channel (BSC) [4] and the additive white Gaussian noise (AWGN) channel [10]. The successive cancellation list (SCL) decoder for polar codes [11] has shown highly competitive error-correction performance, at the expense of higher decoding complexity and latency. In contrast to long polar codes with their high decoding complexity, short polar codes have recently gained considerable attention in the optical communications community, because as component codes they can be decoded in parallel with low complexity. Systematic and non-systematic polar codes have also been studied as component codes for product codes in [12]–[14] and for staircase codes in [15]–[17].

To reduce the complexity of FEC decoders, the concatenated FEC (CFEC) approach, which combines an inner soft-decision code with an outer hard-decision code, is gaining traction [2], [18]–[21].

To reduce the SDD complexity and to eliminate the error floor in CFEC solutions, the LDPC-staircase code was proposed in [19] for a fixed rate with quadrature phase-shift keying modulation. The key idea is to design a complexity-reduced inner LDPC code with SDD that reduces the bit-error rate (BER) below the threshold of an outer staircase code, while the outer code corrects the remaining errors and achieves an output BER below  $10^{-15}$  (i.e., any error floor lies below this level). The complexity reduction in the inner LDPC code is achieved both by optimizing the degree distribution of the LDPC code ensemble and by leaving a fraction of the inner codeword bits uncoded. The concatenated Hamming-staircase code, whose outer decoder has an error floor below  $10^{-15}$, was adopted as the 400ZR FEC algorithm because of its well-balanced FEC performance and its low-complexity soft-decision Hamming code [22]. In [20], the performance of the 400ZR FEC was investigated down to a BER of  $10^{-15}$  using FPGAs.

Future optical networks (data center interconnects, edge, metro, long-haul, and subsea) require transceivers that operate at different signal-to-noise ratios (SNRs). Therefore, these networks need a coded modulation (CM) scheme whose spectral efficiency can adapt to the varying SNR. Rate flexibility is not considered in the 400ZR implementation agreement of the Optical Internetworking Forum [2], but as an extension it can be achieved either by employing a set of FEC codes with different code rates [3] or by shortening or puncturing, e.g., shortening the staircase code [23]. Multiple FEC codes require an increased chip area, which is already a challenge for coherent transceivers, while shortening and puncturing the staircase code come at the cost of performance degradation [23]. Hence, for future 800ZR+ data center interconnects, it is highly desirable to have a low-power, high-performance, and/or rate-flexible solution with the same CFEC structure and an error floor below  $10^{-15}$ [24]. In [21], a rate-flexible CFEC solution was proposed with the same structure as the 400ZR FEC of [22], but with the inner Hamming code replaced by a rate-flexible polar code. This concatenated polar-staircase code (CPSC) offers rate flexibility with near-continuous granularity and 0.35 dB of extra gain compared with the 400ZR FEC design [21].

This paper extends the work of [21] on the bit-interleaved coded modulation (BICM) based rate-flexible CPSC for optical communication. It further addresses the complexity of concatenated schemes in terms of the number of inner decoders used in a multilevel coding (MLC) architecture [25]. The performance and complexity of the proposed scheme are compared to those of classical BICM. Early work on fixed-rate MLC schemes dates back to 1977 [26], but their application to optical fiber communication systems with concatenated codes has only recently been reported in [27]–[29], where the MLC schemes are designed to offer performance gains or complexity savings over BICM.

The novel contributions of this paper are as follows:

- A two-level MLC scheme with CPSC codes is proposed for optical communication systems with high spectral efficiency.
- The code structure of an inner rate flexible polar code is redesigned in such a way that some of the bits bypass the SDD based inner code and are only protected by the HDD based outer code.
- The proposed MLC scheme offers rate flexibility with two degrees of freedom and near-continuous granularity.
- The proposed MLC scheme results both in net coding gain benefits and direct complexity savings due to the reduced number of SDD based inner decoders when compared with the similar BICM scheme [21].

The remainder of this paper is organized as follows. In Section II, we introduce the outer staircase code, the inner polar codes, and the MLC scheme. Section III describes the principles of the BICM- and MLC-based CM schemes. In Section IV, a design example of a complexity-reduced MLC-based CM scheme is compared with the BICM scheme, and simulation results are presented for various inner code blocklengths, inner decoder list sizes, and quadrature amplitude modulation (QAM) formats, showing the effectiveness of the proposed MLC scheme in terms of performance improvement and reduction in the number of inner decoders compared to the BICM scheme.

# II. PRELIMINARIES


Because of the complexity and power constraints of practical systems, a CM scheme always exhibits a gap to the constellation constrained capacity (CCC). We define the CCC as the coded modulation capacity of the AWGN channel with input constrained to a uniform M-QAM constellation [30]. The performance of CM systems is often characterized by the gap to the CCC, $\Delta CCC$ (in dB), at a given rate over a given channel using M-QAM, defined as:

$$\Delta CCC\,(\mathrm{dB}) = (SNR)_{CM,\,dB} - (SNR)_{MI,\,dB}, \tag{1}$$

where  $(SNR)_{CM,dB}$  is the minimum SNR (in dB) required by the practical CM system to achieve a target BER at a target data rate for a given modulation format, and  $(SNR)_{MI,dB}$  is the SNR at which the mutual information (MI) between the input and output of the M-QAM channel equals that data rate. The target BER is  $1 \times 10^{-15}$, corresponding to the output BER of the outer code. The CCC is a limit that can only be approached with infinitely long codes and unbounded decoding complexity. In general,  $\Delta CCC$  increases with the suboptimality of the FEC code, e.g., due to finite code length and limitations of the decoder. Throughout the paper, we consider a memoryless AWGN channel and assume uniform signaling using a binary-reflected Gray (BRG) labeled M-QAM constellation.
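To make (1) concrete, the sketch below estimates  $(SNR)_{MI,dB}$  by Monte Carlo: it computes the MI of a uniform square M-QAM constellation over a complex AWGN channel and finds by bisection the SNR at which the MI equals the target rate;  $\Delta CCC$  then follows from a measured  $(SNR)_{CM,dB}$. It is a minimal illustration, not the authors' evaluation code, and it assumes unit average symbol energy, SNR defined as $E_s/N_0$, and a fixed random seed so that the same samples are reused at every SNR. All function names are hypothetical.

```python
import numpy as np

def qam_points(M):
    """Square M-QAM with unit average energy (the labeling does not affect the CCC)."""
    m = int(round(np.sqrt(M)))
    pam = np.arange(-(m - 1), m, 2, dtype=float)
    pts = (pam[:, None] + 1j * pam[None, :]).ravel()
    return pts / np.sqrt(np.mean(np.abs(pts) ** 2))

def mi_uniform_qam(snr_db, M, n_samples=20000, seed=1):
    """Monte Carlo MI (bits/symbol) of uniform M-QAM over complex AWGN, SNR = Es/N0."""
    rng = np.random.default_rng(seed)  # same samples at every SNR -> smooth in SNR
    A = qam_points(M)
    x = A[rng.integers(M, size=n_samples)]
    w = (rng.standard_normal(n_samples) + 1j * rng.standard_normal(n_samples)) / np.sqrt(2)
    n0 = 10 ** (-snr_db / 10)
    y = x + np.sqrt(n0) * w
    # log of sum_j p(y|a_j) / p(y|x), computed stably with a log-sum-exp
    metric = -(np.abs(y[:, None] - A[None, :]) ** 2 - np.abs(y - x)[:, None] ** 2) / n0
    mx = metric.max(axis=1)
    lse = mx + np.log(np.exp(metric - mx[:, None]).sum(axis=1))
    return np.log2(M) - lse.mean() / np.log(2)

def snr_mi_db(rate, M, lo=-10.0, hi=40.0, iters=50):
    """Bisection for (SNR)_MI,dB: the SNR at which the MI equals `rate`."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mi_uniform_qam(mid, M) < rate:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def delta_ccc_db(snr_cm_db, rate, M):
    """Eq. (1): SNR required by the CM system minus the MI limit."""
    return snr_cm_db - snr_mi_db(rate, M)
```

For example, `delta_ccc_db(snr_cm_db, 3.49, 16)` would give the gap for a measured `snr_cm_db` at the 400ZR rate on 16-QAM. A larger `n_samples` tightens the estimate at the cost of run time.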

## A. The Outer Staircase Codes

Staircase codes are a class of high-rate FEC codes that combine ideas from recursive convolutional codes and linear block codes [31]. Let  $n_c$,  $k_c$,  $r_c$, and  $t_c$  denote the code length, information dimension, redundancy, and unique decoding radius of the component code, respectively. We use a binary primitive  $(n_c, k_c, r_c, t_c)$  BCH code as the component code $C$ to construct staircase codes. A staircase code as recommended in ITU-T G.709.2 [32] can be defined as a semi-infinite sequence of  $D \times D$  blocks  $\mathbf{B}_i$, $i = 1, 2, \ldots$, where $D = n_c/2$ and each row of the concatenated blocks  $[\mathbf{B}_{i-1}^T, \mathbf{B}_i]$  is a valid codeword of $C$. The rate  $R_{sc}$  of a staircase code with BCH component code  $(n_c, k_c, r_c, t_c)$  is

$$R_{sc} = 1 - \frac{2(n_c - k_c)}{n_c}. \tag{2}$$
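The closed form (2) follows from counting bits per staircase block: each of the $D$ rows of $\mathbf{B}_i$ carries $r_c = n_c - k_c$ parity bits, so a block holds $D(D - r_c)$ information bits out of $D^2$, and $(D - r_c)/D = 1 - 2r_c/n_c$. The short check below (function name hypothetical) computes the rate both ways with exact fractions.

```python
from fractions import Fraction

def staircase_rate(n_c, k_c):
    """Rate of a staircase code with an (n_c, k_c) component code and D = n_c / 2."""
    if n_c % 2:
        raise ValueError("n_c must be even so that D = n_c / 2 is an integer")
    D, r_c = n_c // 2, n_c - k_c
    info_bits_per_block = D * (D - r_c)   # each of the D rows adds D - r_c information bits
    by_counting = Fraction(info_bits_per_block, D * D)
    closed_form = 1 - Fraction(2 * r_c, n_c)   # Eq. (2)
    if by_counting != closed_form:
        raise AssertionError("counting argument and Eq. (2) disagree")
    return by_counting

print(staircase_rate(1022, 990))   # component code of Sec. III -> 479/511
```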

Staircase codes are naturally unterminated, which allows a broad range of decoding methods with different latencies. Staircase decoding is performed iteratively in a sliding-window fashion. Iterative decoding halts when the maximum number of iterations  $I_{sc}$  has been reached. The decoder then slides the window by shifting out the estimate of the decoded block  $\mathbf{B}_i$  and shifting in a new block  $\mathbf{B}_{i+W}$, and the operation repeats indefinitely. The total latency of the staircase decoder is a function of the decoding window size, $W$, and of $I_{sc}$. Generally, the performance of the decoder improves as the number of blocks, $W$, in the decoding window increases. The number of lookups and binary operations per information bit for a sliding-window decoder with syndrome-based decoding of $C$ can be loosely upper bounded as [18]

$$\frac{I_{sc}(W-1)t_c}{(D-r_c)}$$
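The helper below evaluates this bound to show how it scales with the window size. The parameter values are illustrative only: $W = 5$ and $I_{sc} = 14$ are the 400ZR values quoted in Sec. II-C, $D = 511$ and $r_c = 32$ assume the (1022, 990) component code of Sec. III, and $t_c = 3$ is an assumed decoding radius that is not stated in the text.

```python
def staircase_ops_per_info_bit(I_sc, W, t_c, D, r_c):
    """Loose upper bound on lookups/binary operations per information bit for
    sliding-window staircase decoding with syndrome-based component decoding."""
    return I_sc * (W - 1) * t_c / (D - r_c)

# Illustrative sweep over the window size (t_c = 3 is an assumption)
for W in (3, 5, 7):
    print(W, round(staircase_ops_per_info_bit(I_sc=14, W=W, t_c=3, D=511, r_c=32), 3))
```

The bound grows linearly in both $W$ and $I_{sc}$, which is the same pair of parameters that sets the decoder latency.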

The threshold  $P_{sc}$  (pre-outer decoding BER) of the outer code is the maximum error probability (assuming uniform distribution of errors) for which the outer decoder can attain an output BER of  $10^{-15}$ .

## B. The Inner Polar Codes

Polar codes [4], [33], based on the *channel polarization* method, are the first family of FEC codes that provably achieve the capacity of binary-input discrete memoryless symmetric channels. As the code length  $N_{pc}$  goes to infinity, channel polarization splits the bit-channels into either good (very high reliability) or bad (very low reliability) channels [4]. However, for practical blocklengths  $N_{pc}$, polarization is incomplete, so partially noisy channels, known as mediocre channels, remain between the good and bad channels. In the proposed concatenated scheme, the polar codes act as error-reducing codes: correcting every bit error in a decoded block is not required; instead, the average BER of the decoded block must be reduced.

The reliability order of the bit-channels depends on the channel SNR, and thus the ordering is non-universal [34]. This non-universality makes the construction of rate-flexible polar codes with optimum performance non-trivial. Techniques to design polar codes on-the-fly, with optimum performance for each data rate, have been investigated in [35]. However, these on-the-fly techniques increase the complexity, power consumption, and latency of the polar code. An approach that optimizes the design SNR of the error-reducing inner polar code to achieve rate flexibility over a given range with a single CFEC code was proposed in [21] and is detailed in the following.

Design SNR (Frozen Set) Optimization: In the proposed inner code, the channel reliabilities associated with each information bit to be encoded (and thereby the frozen set) are efficiently calculated using the Bhattacharyya parameter $Z(U)$ [4]:

$$Z(U) = \sum_{y \in \mathcal{Y}} \sqrt{U(y|0)\, U(y|1)}, \tag{3}$$

where  $\{U(y|x) : x \in \mathcal{X}, y \in \mathcal{Y}\}$  denotes the transition probabilities of the memoryless binary symmetric channels, with input alphabet  $\mathcal{X} = \{0, 1\}$  and output alphabet  $\mathcal{Y}$ .
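A common way to propagate (3) across the $N_{pc}$ bit-channels is the Bhattacharyya-bound construction sketched below. It rests on illustrative assumptions: (i) a BPSK-equivalent AWGN channel at the design SNR, for which $Z = \exp(-E_s/N_0)$; (ii) the update $Z^- \le 2Z - Z^2$ (exact for the erasure channel) and $Z^+ = Z^2$; and (iii) encoding as $\mathbf{x} = \mathbf{u}\mathbf{F}^{\otimes n}$ without bit reversal. The design SNR in [21] may be defined differently (e.g., per M-QAM bit-level), and all function names are hypothetical.

```python
import numpy as np

def log_bhattacharyya_bit_channels(n, z0):
    """log Z of the N = 2**n bit-channels of x = u F^{(x)n} (natural order, no bit reversal).
    Working in the log domain avoids ties caused by Z underflowing to 0 at large N."""
    lz = np.array([np.log(z0)], dtype=float)
    for _ in range(n):
        nxt = np.empty(2 * lz.size)
        nxt[0::2] = lz + np.log(2.0 - np.exp(lz))  # 'minus' channel: log(2Z - Z^2), upper bound
        nxt[1::2] = 2.0 * lz                       # 'plus' channel: log(Z^2), exact
        lz = nxt
    return lz

def z0_from_design_snr(design_snr_db):
    """Z of a BPSK-input AWGN channel at Es/N0 = design SNR: Z = exp(-Es/N0)."""
    return float(np.exp(-10 ** (design_snr_db / 10)))

def information_set(n, k_total, design_snr_db=None, z0=None):
    """Indices of the k_total bit-channels with the smallest Z; all others are frozen."""
    if z0 is None:
        z0 = z0_from_design_snr(design_snr_db)
    lz = log_bhattacharyya_bit_channels(n, z0)
    return np.sort(np.argsort(lz, kind="stable")[:k_total])

# Toy check on N = 8 with Z0 = 0.5 (erasure channel with erasure probability 0.5)
print(information_set(3, 4, z0=0.5))   # [3 5 6 7]
```

For an inner code with $N_{pc} = 1024$ and $K_{pc} + l_{crc}$ information positions, one would call `information_set(10, K_pc + l_crc, design_snr_db=...)` once for the chosen design SNR and reuse the resulting frozen set for all rates.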

Recently, a rate-flexible error-reducing inner polar code was proposed in [21], where, for a fixed code length  $N_{pc}$, a single frozen set is chosen for which the  $\Delta CCC$  value remains relatively constant over the range of code rates. Furthermore, in [21], fixed-rate inner polar codes were designed on-the-fly for each data rate and optimized w.r.t. the design SNR. The penalty for using rate-flexible codes w.r.t. the on-the-fly calculated optimum code is discussed in Sec. IV-A. Note that throughout the paper, the frozen set of the inner polar codes is designed according to the desired threshold  $P_{sc}$  of the outer code, and not for the point where the output frame error rate (FER) of the polar decoder is minimum. After fixing the CFEC parameters, the number of inner code information bits,  $K_{pc}$, can be tuned according to the channel conditions to obtain a BER below  $P_{sc}$  at the input of the outer decoder at flexible rates. With a single frozen set chosen, a cyclic redundancy check (CRC) of length  $l_{crc}$  is computed over the  $K_{pc}$  information bits of the inner code, and this overhead is included in the simulation results. The resulting block of  $K_{pc}+l_{crc}$  bits is encoded by the  $(N_{pc}, K_{pc}+l_{crc})$  systematic polar encoder. The rate of the inner code is defined as:

$$R_{pc} = \frac{K_{pc}}{N_{pc}}. \tag{4}$$
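The encoding chain just described (CRC over the $K_{pc}$ information bits, then an $(N_{pc}, K_{pc}+l_{crc})$ systematic polar encoder) can be sketched as follows. The systematic encoder uses a well-known two-pass method (encode, zero the frozen positions, encode again), which places the data bits unchanged at the information positions provided the information set is domination contiguous, as reliability-based constructions typically are. The generator $x^6 + x^5 + 1$ is an illustrative choice and not necessarily the CRC-6 of [44]; all function names are hypothetical.

```python
import numpy as np

def crc_remainder(bits, poly):
    """CRC bits: remainder of bits(x) * x^r divided by g(x) over GF(2).
    `poly` lists g(x) coefficients from x^r down to x^0 (leading 1 included)."""
    r = len(poly) - 1
    reg = [int(b) for b in bits] + [0] * r
    for i in range(len(bits)):
        if reg[i]:
            for j, g in enumerate(poly):
                reg[i + j] ^= g
    return np.array(reg[-r:], dtype=np.uint8)

def polar_transform(u):
    """x = u F^{(x)n} over GF(2) with F = [[1, 0], [1, 1]] (natural order)."""
    x = np.array(u, dtype=np.uint8)
    step = 1
    while step < x.size:
        for i in range(0, x.size, 2 * step):
            x[i:i + step] ^= x[i + step:i + 2 * step]
        step *= 2
    return x

def systematic_polar_encode(data, info_set, N):
    """Two-pass systematic encoding: the data bits appear unchanged at info_set."""
    frozen = np.setdiff1d(np.arange(N), info_set)
    v = np.zeros(N, dtype=np.uint8)
    v[info_set] = data
    v = polar_transform(v)
    v[frozen] = 0            # re-impose the frozen constraint on u
    return polar_transform(v)

# Illustrative CRC-6 generator x^6 + x^5 + 1 (hypothetical choice)
CRC6 = [1, 1, 0, 0, 0, 0, 1]
msg = [1, 0, 1, 1, 0, 1]                                   # K_pc information bits
block = np.concatenate([msg, crc_remainder(msg, CRC6)])    # K_pc + l_crc bits

# Separate toy systematic encoding: N = 8 with information set {3, 5, 6, 7}
x = systematic_polar_encode([1, 0, 1, 1], np.array([3, 5, 6, 7]), 8)
print(x)   # [0 0 1 1 0 0 1 1]
```

Note that (4) counts only the $K_{pc}$ message bits, so the $l_{crc}$ CRC bits are charged to the code rate.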

It has been observed that concatenating an outer CRC code (acting as a genie) improves the performance of the successive-cancellation list (SCL) decoder [11]. In this work, a CRC-aided SCL (CA-SCL) decoder is used for the polar codes.

## C. 400ZR FEC

The 400ZR FEC algorithm is a concatenated FEC that combines an inner SDD-based Hamming code with an outer HDD-based staircase code. In the 400ZR FEC [2], a systematic double-extended (128, 119) Hamming code is used as the inner code; its SDD may be performed with, e.g., a Chase decoder [36]. In this paper, we adopt the Chase implementation from [37], which selects the six least reliable positions as flipping candidates and corrects up to four errors. A rate-239/255 staircase code with a sliding-window decoder with W = 5,  $I_{sc} = 14$, and BER threshold  $P_{sc} = 5 \times 10^{-3}$  is used as the outer code. The 400ZR FEC has a code rate of  $(239/255) \times (119/128)$, which, multiplied by  $\log_2(M)$, yields 3.49 and 5.23 bits/QAM symbol for 16- and 64-QAM, respectively.
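The rate arithmetic above can be reproduced in a few lines; the helper name is hypothetical and the rates are those quoted in the paragraph.

```python
import math
from fractions import Fraction

def bits_per_qam_symbol(r_outer, r_inner, M):
    """Net rate (bits per QAM symbol) of an inner/outer concatenated FEC over uniform M-QAM."""
    return float(r_outer * r_inner) * math.log2(M)

R_STAIRCASE = Fraction(239, 255)   # outer staircase code
R_HAMMING = Fraction(119, 128)     # inner double-extended Hamming code
for M in (16, 64):
    print(f"{M}-QAM: {bits_per_qam_symbol(R_STAIRCASE, R_HAMMING, M):.2f} bits/symbol")
```

This prints 3.49 and 5.23 bits/symbol, matching the values stated for 16- and 64-QAM.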

## D. Multilevel Coding

CM addresses the question of how to jointly optimize coding and modulation for non-binary transmission symbols, as a means of improving the performance of the transmission system. Examples of powerful CM methods include trellis-coded modulation introduced by Ungerboeck [38], MLC presented by Imai [26], and BICM proposed in [30], [39]. Let  $\mathbf{b} = (b^0, b^1, \ldots, b^{l-1})$, $b^i \in \{0, 1\}$, be the binary address of a point of the  $M = 2^l$-ary modulation scheme, whose signal set (constellation)  $\mathcal{A} = \{a_0, a_1, \ldots, a_{M-1}\}$  is defined by the bijective mapping  $a = \mathcal{M}(\mathbf{b})$  of the binary addresses $\mathbf{b}$.
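A minimal sketch of a BRG-labeled square M-QAM mapping $a = \mathcal{M}(\mathbf{b})$ is given below, built as the product of two BRG-labeled PAM constellations. It assumes that $b^0$ is the most significant label bit, that the first $l/2$ bits select the in-phase amplitude and the last $l/2$ bits the quadrature amplitude, and that the constellation is normalized to unit average energy; other BRG conventions simply permute the bit-levels. Function names are hypothetical.

```python
import numpy as np

def brg_qam(M):
    """points[k] = M(b) for the label b whose integer value is k (b^0 = MSB)."""
    l = int(round(np.log2(M)))
    if 2 ** l != M or l % 2:
        raise ValueError("M must be an even power of two (square QAM)")
    h = l // 2
    m = 2 ** h
    pam = np.arange(-(m - 1), m, 2, dtype=float)
    gray = np.arange(m) ^ (np.arange(m) >> 1)     # BRG label of the i-th PAM amplitude
    amp_of_label = pam[np.argsort(gray)]          # amplitude that carries each Gray label
    k = np.arange(M)
    points = amp_of_label[k >> h] + 1j * amp_of_label[k & (m - 1)]
    return points / np.sqrt(np.mean(np.abs(points) ** 2))

def map_bits(bits, points):
    """Map a bit sequence (length a multiple of l) to constellation symbols."""
    l = int(round(np.log2(points.size)))
    b = np.asarray(bits, dtype=int).reshape(-1, l)
    weights = 1 << np.arange(l - 1, -1, -1)       # b^0 is the most significant bit
    return points[b @ weights]

pts = brg_qam(16)
print(map_bits([0, 0, 0, 0, 1, 0, 1, 0], pts))
```

The Gray property is what makes the mapping attractive here: nearest-neighbor symbol errors flip exactly one label bit.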

In the set-partitioning based trellis-coded modulation scheme, the more reliable binary bit-levels stay uncoded and the least reliable binary bit-levels are protected by convolutional codes [38]. In general, in MLC, the M-ary input channel, U, is divided into l equivalent bit-levels (bit channels), where each bit channel  $b^i$  is protected by an individual binary component code  $C_i$  of rate  $R_i$. Multi-stage decoding (MSD), which is based on the principle of successive interference cancellation, can be used to decode MLC [26]. In MSD, each bit-level  $b^i$, protected by its component code  $C_i$  of rate  $R_i$, is decoded given the results of the previously decoded bit-levels. Alternatively, parallel decoding of individual levels (PDIL) can be achieved in MLC by removing the interaction between the decoders. Compared to MSD, PDIL results in shorter decoding time and avoids error propagation during decoding [40].

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TCOMM.2020.3038185, IEEE Transactions on Communications

The application of the MLC to optical communication systems is a rather new area of research. In [41], Ungerboeck's set-partitioning based MLC scheme was applied to reduce the non-linear phase noise, using Reed–Solomon codes with different rates as the component codes at different bit-levels of MLC, and performing MSD at the receiver to improve the performance of the fiber-optic system. A three-level MLC scheme is proposed in [42], where SDD based LDPC, HDD based BCH and uncoded bits are assigned to each bit-level of the MLC scheme with different code rates and unequal error protection. At the receiver-side, iterative MSD is performed between the BCH and LDPC codes to improve the performance of the system.

An MSD-based MLC scheme with an outer staircase code and an inner differentially encoded tail-biting convolutional code is proposed in [27] for coherent systems, where the low-reliability bit-levels of an M-QAM scheme are encoded by the CFEC code while the remaining high-reliability bit-levels are protected only by the outer code. In [28], a two-level MLC scheme with an outer product code was proposed. A data stream encoded by the inner spatially-coupled repeat-accumulate code is assigned to the low-reliability bit-levels of the constellation symbols and decoded by two-stage MSD, while the remaining data stream bypasses the inner code and is assigned to the high-reliability bit-levels of the constellation symbols. A comparison between MSD-based MLC and BICM for various M-QAM modulation formats is carried out in [29], where a complexity-optimized soft-decision LDPC code is concatenated with a hard-decision staircase code, as in [19]. At a data rate of 4.68 bits/symbol with 64-QAM, the MLC scheme of [29] yields a coding gain of up to 0.4 dB or complexity savings of 60% compared with BICM.

The key idea of our proposed MLC scheme is to spend more computation and error-correcting capability on the bits at low-reliability bit-levels and less on the  $L_0$  bypassing bits, which are assigned to the high-reliability bit-levels. We propose a two-level PDIL-based MLC scheme with an outer staircase code, an inner polar code, and M-QAM modulation. A portion  $L_0$  of the outer-encoded bits bypasses the inner code and is assigned to the high-reliability bit-levels of the constellation points, while the remaining bits are protected by the inner code and assigned to the relatively low-reliability bit-levels. Unlike the MLC-based CFEC schemes proposed in [27]–[29], where the data rate is fixed for a single FEC code, our proposed scheme achieves rate flexibility with two degrees of freedom using a single FEC code, by jointly tuning  $K_{pc}$  and  $L_0$.
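Why some bit-levels are "high-reliability" can be seen from a quick Monte Carlo experiment on one dimension of BRG 64-QAM, i.e., an 8-PAM with a 3-bit BRG label. With nearest-point hard decisions, the sign bit changes at only one decision boundary and is therefore far more reliable than the least significant bit, whose label changes at every other boundary. The SNR, sample count, and bit indexing (0 = least significant label bit) are illustrative assumptions, and the function name is hypothetical.

```python
import numpy as np

def brg_pam_bit_error_rates(bits_per_dim=3, snr_db=18.0, n=200_000, seed=0):
    """Hard-decision error rate of each label bit of one BRG-PAM dimension of square QAM.
    Index 0 is the least significant label bit; SNR is Es/N0 of the QAM symbol."""
    m = 2 ** bits_per_dim
    levels = np.arange(-(m - 1), m, 2, dtype=float)
    levels /= np.sqrt(2 * np.mean(levels ** 2))   # two dimensions share Es = 1
    labels = np.arange(m) ^ (np.arange(m) >> 1)   # BRG label of each amplitude
    rng = np.random.default_rng(seed)
    tx = rng.integers(m, size=n)
    n0 = 10 ** (-snr_db / 10)
    y = levels[tx] + rng.standard_normal(n) * np.sqrt(n0 / 2)     # variance N0/2 per dimension
    rx = np.argmin(np.abs(y[:, None] - levels[None, :]), axis=1)  # nearest amplitude
    diff = labels[tx] ^ labels[rx]
    return [float(np.mean((diff >> b) & 1)) for b in range(bits_per_dim)]

print(brg_pam_bit_error_rates())   # per-bit error rates, least significant bit first
```

In the MLC scheme, bypass bits would naturally go to the most reliable of these levels, which is the design choice described above.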

# III. PROPOSED MULTILEVEL CODING WITH CONCATENATED CODES DESIGN


The proposed CM scheme comprises an outer staircase code with the extended BCH (1022, 990) code as component code. In the encoding process,  $D \times (D - r_c)$  information bits are encoded, and the generated  $D \times r_c$  parity bits are arranged into the staircase block. The outer-encoded bits are first interleaved by the interleaver  $\pi_1$  (of length  $J \times D^2$) and then either demultiplexed into two separate data paths (MLC scheme) or passed to the inner encoder (BICM scheme). We consider the BICM scheme as the special case  $L_0 = 0$  of the MLC scheme of Fig. 1. A demultiplexer, denoted DEMUX($L_0, K_{pc}$), splits the  $\pi_1$-interleaved bits into  $L_0$  bypassing bits and  $K_{pc}$  inner encoder input bits. DEMUX($L_0, K_{pc}$) allows the concatenated code to switch between two CM approaches: (1) BICM ($L_0 = 0$) and (2) MLC ($L_0 \neq 0$).

## A. BICM scheme

For the BICM scheme,  $L_0 = 0$  and there is only one data path at the output of DEMUX$(0, K_{pc})$, as shown in Fig. 1. For fixed CFEC parameters,  $K_{pc}$  can be selected at DEMUX$(0, K_{pc})$  to achieve rate flexibility with one degree of freedom. Let  $T_{BICM}$  denote the total number of inner codewords required to transmit $J$ staircase blocks over the AWGN channel. The outer-encoded interleaved bits,  $J \times D^2$  in total, are divided into  $T_{BICM}$  sub-blocks of  $K_{pc}$  bits each, so that

$$T_{BICM} = \frac{(J \times D^2)}{K_{pc}},\tag{5}$$

where  $T_{BICM}$  is inversely proportional to  $K_{pc}$. A CRC is computed over each  $K_{pc}$-bit sub-block, and the resulting block of  $K_{pc} + l_{crc}$  bits is encoded by the  $(N_{pc}, K_{pc} + l_{crc})$  systematic polar encoder. The length-$N_{pc}$ codeword carries these  $K_{pc}+l_{crc}$  bits at the most reliable bit positions, while the remaining  $F_{pc} = N_{pc} - K_{pc} - l_{crc}$  least reliable positions, known as frozen positions, carry the added parity bits. The polar-transformed vector is first permuted by the interleaver  $\pi_2$  and then passed to the mapper  $\Phi$. We assume that the mapper is responsible both for BRG labeling and for modulating the input bit sequences onto M-QAM constellation points (similar to [21]). The resulting  $N_{pc}/\log_2(M)$  modulated symbols are transmitted over the AWGN channel. At the receiver side, demodulation and demapping are performed by a soft-decision demapper  $\Phi^{-1}$  [43]. For  $L_0 = 0$, all the log-likelihood ratios (LLRs) are first de-interleaved by  $\pi_2^{-1}$  and then decoded by the CA-SCL polar decoder, which reduces the BER below the threshold  $P_{sc}$  of the outer staircase code. The rate per M-QAM symbol of the BICM-based CPSC scheme is:

$$R_{BICM} = \frac{K_{pc}}{N_{pc}} \times R_{sc} \times \log_2(M). \tag{6}$$
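Equations (5) and (6), together with $F_{pc} = N_{pc} - K_{pc} - l_{crc}$, can be evaluated as below. Apart from $J = 100$ and $D = 512$ (Sec. IV) and $R_{sc} = 239/255$, the example values are hypothetical: $N_{pc} = 1024$, $l_{crc} = 6$, $K_{pc} = 896$, and 64-QAM. $T_{BICM}$ is kept fractional exactly as written in (5); the function name is hypothetical.

```python
import math

def bicm_params(J, D, K_pc, N_pc, l_crc, R_sc, M):
    """Eqs. (5)-(6) plus the number of frozen positions F_pc."""
    T_bicm = J * D ** 2 / K_pc                    # Eq. (5): number of K_pc-bit sub-blocks
    F_pc = N_pc - K_pc - l_crc                    # frozen (parity) positions per codeword
    R_bicm = (K_pc / N_pc) * R_sc * math.log2(M)  # Eq. (6): bits per QAM symbol
    return T_bicm, F_pc, R_bicm

T, F, R = bicm_params(J=100, D=512, K_pc=896, N_pc=1024, l_crc=6, R_sc=239 / 255, M=64)
print(f"T_BICM = {T:.1f}, F_pc = {F}, R_BICM = {R:.3f} bits/symbol")
```

Increasing $K_{pc}$ raises the rate and, through (5), lowers the number of inner codewords per frame.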



Fig. 1. The system model of the MLC-based concatenated polar-staircase code for 64-QAM modulation over an AWGN channel.

## B. MLC scheme

For  $L_0 \neq 0$, there are two data paths at the output of DEMUX$(L_0, K_{pc})$, as shown in Fig. 1. For fixed outer code parameters,  $L_0$  and  $K_{pc}$  can be tuned at DEMUX$(L_0, K_{pc})$, providing two degrees of freedom in choosing the rate of the CM system. The  $\pi_1$-interleaved bits,  $J \times D^2$  in total, are split into  $T_{MLC}$  sub-blocks of  $L_0 + K_{pc}$  bits each. The number of sub-blocks is:

$$T_{MLC} = \frac{J \times D^2}{K_{pc} + L_0}, \tag{7}$$

where  $T_{MLC}$  (in our case, the number of inner decoders) decreases as  $K_{pc} + L_0$  increases. The data on the  $L_0$  branch bypass the inner encoder and are assigned to the high-reliability bit-levels of the constellation points, as shown in Fig. 1, where a BRG 64-QAM mapping is exemplified. The data on the  $K_{pc}$  branch are first concatenated with the  $l_{crc}$  CRC bits, and the result is encoded by the  $(N_{pc}, K_{pc} + l_{crc})$  polar encoder. The  $N_{pc} = K_{pc} + l_{crc} + F_{pc}$  inner-encoded bits are first permuted by  $\pi_2$  and then assigned to the remaining bit-levels of the QAM symbols, as shown in Fig. 1. In the simulations, we assume that the mapper  $\Phi$  is responsible for assigning the  $L_0$  bits to the high-reliability bit-levels and the  $N_{pc}$  bits to the remaining bit-levels of the constellation symbols. The resulting  $(N_{pc} + L_0)/\log_2(M)$  modulated symbols are transmitted over the AWGN channel.

At the receiver side, demodulation and demapping are performed by a standard soft-decision demapper  $\Phi^{-1}$  [43], complementary to the mapper. Hard decisions are made on the most reliable  $L_0$  bits to obtain the estimates  $\hat{L}_0$. The LLRs of the remaining  $\hat{N}_{pc}$  bits are first de-interleaved by  $\pi_2^{-1}$  and then fed to the CA-SCL decoder, which outputs a hard-decision data stream of length  $\hat{K}_{pc}$. The hard-decision data streams  $\hat{L}_0$  and  $\hat{K}_{pc}$  are multiplexed before the de-interleaver  $\pi_1^{-1}$. To calculate the  $\Delta CCC$  value, the average BER at the input of  $\pi_1^{-1}$  is measured and compared with the threshold  $P_{sc}$. The de-interleaved data stream is then fed to the HDD-based sliding-window decoder to attain an output BER below  $1 \times 10^{-6}$, which is the lowest BER feasible in our simulations due to computation time. Using (2), the rate per M-QAM symbol of the MLC-based CPSC scheme is

$$R_{MLC} = \frac{K_{pc} + L_0}{N_{pc} + L_0} \times R_{sc} \times \log_2(M). \tag{8}$$
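Equations (7) and (8) can be evaluated in the same way. The example is hypothetical: 64-QAM with $N_{pc} = 1024$ and $L_0 = 512$, so that each of the $(N_{pc}+L_0)/6 = 256$ symbols carries two bypassing bits on its two most reliable bit-levels and four inner-coded bits; $K_{pc} = 832$ is chosen only for illustration, and the function name is hypothetical.

```python
import math

def mlc_params(J, D, K_pc, L0, N_pc, R_sc, M):
    """Eqs. (7)-(8) and the number of QAM symbols per sub-block."""
    T_mlc = J * D ** 2 / (K_pc + L0)                         # Eq. (7)
    R_mlc = (K_pc + L0) / (N_pc + L0) * R_sc * math.log2(M)  # Eq. (8)
    n_symbols = (N_pc + L0) / math.log2(M)                   # QAM symbols per sub-block
    return T_mlc, R_mlc, n_symbols

T, R, S = mlc_params(J=100, D=512, K_pc=832, L0=512, N_pc=1024, R_sc=239 / 255, M=64)
print(f"T_MLC = {T:.1f}, R_MLC = {R:.3f} bits/symbol, symbols per sub-block = {S:.0f}")
```

Because the $L_0$ bypassing bits enter the denominator of (7), each inner decoder serves more outer-coded bits than in BICM.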


In this work, we consider two performance measures, 1) the relative complexity reduction  $T_{red}$  (%) and 2) the relative performance gain  $\eta$  (dB), which are used in Sec. IV to evaluate the effectiveness of the proposed MLC scheme with respect to BICM. At a fixed data rate, the relative complexity reduction  $T_{red}$  (%) of the inner decoder achieved by the proposed MLC scheme w.r.t. BICM is:

$$T_{red} = 100 \times \left(1 - \frac{T_{MLC}}{T_{BICM}}\right),\tag{9}$$

where, for a given frame length,  $T_{red}$  measures the reduction in the number of inner soft-decision decoders when switching from BICM to MLC. In this paper, the absolute complexity of the inner SCL decoder is not considered; rather, we consider the relative complexity savings, which are the same regardless of the complexity of the decoder, as demonstrated in Sec. IV. This complexity metric is chosen because it is agnostic to the applied FEC. In contrast, the MLC scheme of [29] measures complexity as the number of messages passed per information bit during inner soft-decision decoding, which is LDPC-specific. At a fixed data rate, the relative performance gain  $\eta$  (dB) is the difference between the  $\Delta CCC$  values of the MLC and BICM schemes:

$$\eta\,(\mathrm{dB}) = \Delta CCC_{MLC}(\mathrm{dB}) - \Delta CCC_{BICM}(\mathrm{dB}). \tag{10}$$
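At a fixed data rate, (9) depends only on the sub-block sizes, because $J \times D^2$ cancels: $T_{MLC}/T_{BICM} = K_{pc}^{BICM}/(K_{pc}^{MLC} + L_0)$. The sketch below also solves for the MLC value of $K_{pc}$ that matches a given BICM rate at the same $N_{pc}$ (by equating (6) and (8)). All numbers are hypothetical and only illustrate the arithmetic; they are not results from this paper. With (10) as written, a negative $\eta$ means that MLC operates closer to the CCC than BICM. Function names are hypothetical.

```python
def k_pc_mlc_for_equal_rate(K_pc_bicm, N_pc, L0):
    """K_pc of the MLC scheme that matches the BICM rate: equate (6) and (8)."""
    return K_pc_bicm * (N_pc + L0) / N_pc - L0

def t_red_percent(T_mlc, T_bicm):
    """Eq. (9): relative reduction in the number of inner decoders."""
    return 100 * (1 - T_mlc / T_bicm)

def eta_db(dccc_mlc_db, dccc_bicm_db):
    """Eq. (10), with the sign convention used in the text."""
    return dccc_mlc_db - dccc_bicm_db

J_D2 = 100 * 512 ** 2                  # J x D^2 outer-coded bits (Sec. IV)
K_bicm, N_pc, L0 = 896, 1024, 512      # hypothetical example values
K_mlc = k_pc_mlc_for_equal_rate(K_bicm, N_pc, L0)
T_bicm, T_mlc = J_D2 / K_bicm, J_D2 / (K_mlc + L0)
print(K_mlc, round(t_red_percent(T_mlc, T_bicm), 1))
```

In this toy case, the MLC sub-block is 1.5 times larger than the BICM one, so one third of the inner decoders can be switched off.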

In the simulations, we assume that the receiver can switch the inner soft-decision decoders on and off, resulting in power savings that scale linearly with the number of decoders turned off. For example, for a fixed inner code blocklength and  $T_{red}$  of 50%, the receiver requires 50% fewer inner decoders, or equivalently can turn off 50% of its inner decoders, and thus saves power.

# IV. SIMULATION RESULTS

In the simulations, we use the following parameters for the outer code:  $R_{sc} = 239/255$, W = 5,  $I_{sc} = 14$, and BER threshold  $P_{sc} = 5 \times 10^{-3}$. These parameters follow the 400ZR staircase code presented in Sec. II-C. All simulation results are obtained by simulating J = 100 staircase blocks, i.e.,  $100 \times 512^2$  bits. Real-valued LLRs are used in all simulations. For the polar code, a CRC of length 6 [44] is chosen for  $N_{pc}$ = 128 and 1024, and a CRC of length 24 [34] is chosen for  $N_{pc}$ = 8192. As discussed in Sec. II-B, for each of the inner codeword lengths 128, 1024, and 8192, the design SNR is optimized for the average  $\Delta CCC$  performance over all considered rates. A list size of 32 is used in the simulations except in Sec. IV-D, where we compare CA-SCL decoders with list sizes 8 and 32.

The BER threshold  $P_{sc}$  is used to estimate the performance of both the MLC- and BICM-based CM schemes, as depicted in Fig. 1. The outer staircase code of the proposed CPSC scheme has an error floor of around  $4 \times 10^{-21}$  [31]. Due to limited computational resources, all simulation results were obtained at the SNR at which the output BER is  $1 \times 10^{-6}$.

The staircase decoding is performed for completeness and as a sanity check; it is not part of estimating $\Delta CCC$. Irrespective of the CM scheme (BICM or MLC), the latency of the proposed CFEC scheme is dominated by the length of the outer de-interleaver $\pi_1^{-1}$, which is $J \times D^2$. If a shorter interleaver is chosen, e.g., interleaving one block at a time as in [19], the latency is instead dominated by the window size of the outer staircase code. Without loss of generality, we assume that all $\hat{N}_{pc}$ blocks can be decoded in parallel, so the number of inner decoders equals the number of inner blocks per staircase codeword, and the complexity reduction is the reduction in this number.

#### A. BICM-based rate flexible concatenated codes

In this section, we evaluate the BICM-based CPSC scheme with fixed- and flexible-rate inner polar codes. In Fig. 2, example results are presented for fixed- and flexible-rate inner polar codes of lengths $N_{pc} = 128$, 1024, and 8192 with 16-QAM modulation. For both kinds of inner polar codes, the $\Delta CCC$ performance improves with increasing code length. However, the decoder complexity also increases with code length, so there is a trade-off between performance and complexity.

The penalty for using a rate flexible polar code w.r.t. the optimum code calculated on the fly for the simulated data rates in Fig. 2 is less than 0.03 dB in most cases (the exception is the $N_{pc} = 8192$, data rate = 2.88 (bits/QAM symbol) case, with a 0.14 dB penalty). As in [21], all results in Fig. 2 are obtained using a CRC of length $l_{crc} = 6$. It can be seen from the solid lines in Fig. 2 that at data rate = 3.48 (bits/QAM symbol), the rate flexible CPSC scheme with $N_{pc}$ of 128, 1024, and 8192 gives performance gains of 0.13, 0.32, and 0.34 dB, respectively, compared to the 400ZR FEC [2].



Fig. 2. The gap to constellation constrained capacity ( $\Delta CCC$ ) performance of various inner polar code blocklengths ( $N_{pc}$ ) of the BICM-based CPSC scheme with an outer code and 16-QAM modulation at different data rates.

#### B. Design Example of the Proposed MLC Scheme

As a proof of concept, we present an example of our proposed complexity-reduced MLC scheme with a single FEC code designed at data rates in the range of 2.87 to 3.57 (bits/QAM symbol) and in the range of 4.18 to 5.36 (bits/QAM symbol) for 16- and 64-QAM modulation, respectively. The inner code length of 1024 is chosen in the example.

The rule for tuning at the DEMUX $(L_0, K_{pc})$: In practical CM systems, it is usually essential to find a balanced operating point in terms of both error-correcting performance ($\Delta CCC$ in our case) and complexity. Hence, for a good balance, we select only those $L_0$ and $K_{pc}$ values for which a positive relative complexity reduction is achieved and the $\Delta CCC$ performance is similar to or better than that of the pragmatic, sub-optimal BICM approach.

The rate flexible BICM scheme $(L_0 = 0)$ with 16- and 64-QAM modulation is simulated for values of $K_{pc}$ in the range of 783 to 976 bits and 760 to 976 bits, respectively, which corresponds to data rates in the range of 2.87 to 3.57 (bits/QAM symbol) and 4.18 to 5.36 (bits/QAM symbol) for the two modulation formats. From (5), the total number of required inner decoders $T_{BICM}$ for the BICM scheme with 16- and 64-QAM modulation is in the range of 33480 to 26860 and 34493 to 26860, respectively. When all other parameters, such as $N_{pc}$, $l_{crc}$, $R_{sc}$, and the constellation size, are constant, the data rate of a rate flexible BICM scheme depends only on $K_{pc}$ (6). Hence, increasing $K_{pc}$ also increases the data rate of the CM system. It can be observed from (5) that the number of inner decoders, $T_{BICM}$, is inversely proportional to $K_{pc}$. Under the assumption that the receiver can switch the inner decoders on and off, lowering $T_{BICM}$ effectively lowers the processing power required by a chip at a given receiver throughput. For 16- and 64-QAM modulation, the observations from (5) and (6) can be graphically verified by the circled solid and dashed lines of Fig. 3, respectively.

Fig. 3. The total number of inner polar decoders required to decode 100 staircase blocks received from the AWGN channel with 16- and 64-QAM modulation. As opposed to the MLC lines, the BICM lines are obtained by changing the number of $K_{pc}$ inner information bits in the range of 783 to 976 bits and 760 to 976 bits for 16- and 64-QAM modulation, respectively.

To see the effect of the $L_0$ bypassing bits, we simulated the MLC scheme with 16- and 64-QAM for four different values of $K_{pc}$ (840, 880, 900, and 920 bits). For a fixed $K_{pc}$, the data rate (8) can be changed by changing the value of $L_0$. Hence, for a fixed $K_{pc}$, we can gradually increase or decrease $L_0$ according to the rule for tuning at the DEMUX $(L_0, K_{pc})$. For example, for $K_{pc}$ of 840 bits, $L_0 = 0$, and 16-QAM modulation, $R_{MLC} = R_{BICM} = 3.08$ (bits/QAM symbol) and $T_{MLC} = T_{BICM} = 31208$ inner decoders. As we increase $L_0$ to 2, 64, 128, and 256, the data rate of the MLC scheme becomes 3.08, 3.12, 3.15, and 3.21 (bits/QAM symbol), respectively, and the number of inner decoders required to decode $J = 100$ staircase blocks goes down to 31134, 28999, 27081, and 23919, respectively. The relation between $R_{MLC}$ and $T_{MLC}$ for $K_{pc} = 840$ bits and 16-QAM modulation is shown by the squared solid line in Fig. 3.
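The decoder counts and data rates quoted above can be reproduced with a short sketch. Equations (5)–(8) are not restated in this section, so the formulas below are an assumption inferred from the quoted numbers: each inner frame carries $K_{pc}$ polar-coded staircase bits plus $L_0$ bypassing bits, $T = \lceil J D^2/(K_{pc}+L_0) \rceil$ with $D = 512$, and $R = R_{sc}(K_{pc}+L_0)/(N_{pc}+L_0)\log_2 M$, where $L_0 = 0$ gives BICM. The function names are hypothetical.

```python
# Sketch under assumed forms inferred from the numbers in Sec. IV-B (not the
# paper's equations (5)-(8) verbatim): decoder count and data rate of the
# BICM (L0 = 0) and MLC (L0 > 0) CPSC schemes.
J = 100           # simulated staircase blocks
D = 512           # staircase block dimension, J * D^2 bits in total
R_SC = 239 / 255  # outer staircase code rate
N_PC = 1024       # inner polar code length


def n_inner_decoders(k_pc: int, l0: int = 0) -> int:
    """Inner frames needed to carry J*D^2 staircase bits (ceiling division)."""
    bits_per_frame = k_pc + l0  # polar-coded bits + bypassing bits
    return (J * D * D + bits_per_frame - 1) // bits_per_frame


def data_rate(k_pc: int, l0: int = 0, bits_per_symbol: int = 4) -> float:
    """Data rate in bits/QAM symbol; bits_per_symbol = log2(M)."""
    return R_SC * (k_pc + l0) / (N_PC + l0) * bits_per_symbol


if __name__ == "__main__":
    # BICM end points for 16-QAM (K_pc = 783 ... 976)
    print(n_inner_decoders(783), n_inner_decoders(976))
    # MLC with K_pc = 840 and 16-QAM for the L0 values in the text
    for l0 in (0, 2, 64, 128, 256):
        print(l0, round(data_rate(840, l0), 2), n_inner_decoders(840, l0))
```

Under these assumptions, the sketch matches every decoder count quoted above exactly and the data rates to within 0.01 bits/QAM symbol (e.g., about 3.115 for $L_0 = 64$, quoted as 3.12).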

Similarly, for 16-QAM modulation, the values of $L_0$ for $K_{pc}$ of 880, 900, and 920 bits are swept in the ranges [2; 512], [2; 1200], and [2; 1200], respectively. For 64-QAM modulation, the values of $L_0$ for $K_{pc}$ of 840, 880, 900, and 920 bits are swept in the ranges [2; 640], [2; 1200], [2; 1560], and [2; 1120], respectively. The reduction in the number of required polar decoders for 16- and 64-QAM modulation and the various values of $K_{pc}$ with the corresponding $L_0$ values is shown by the solid and dashed lines in Fig. 3, respectively.

Fig. 4. The required average output BER of the inner polar decoders for 16- and 64-QAM modulation over the AWGN channel.

Furthermore, for $K_{pc}$ of 840 bits and 16-QAM modulation, it can be seen from the squared solid line in Fig. 4 that as we increase $L_0$, the average output BER of the inner polar decoders required to achieve a fixed BER of $P_{sc}$ decreases. This is because more $L_0$ bits are assigned to the high-reliability bit-levels: as $L_0$ increases, the portion of erroneous bits contributed by the uncoded $L_0$ bits increases, so to keep the BER at the fixed threshold $P_{sc}$, this must be compensated by a lower average BER of the inner polar decoders. Similarly, for the MLC scheme with 16- and 64-QAM modulation, the average output BER of the inner polar decoders required to achieve $P_{sc} = 5 \times 10^{-3}$ for various values of $K_{pc}$ is shown by the solid and dashed lines in Fig. 4, respectively.
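The compensation argument can be made concrete with a simple bit-counting model, which is an assumption for illustration and not the paper's analysis: per inner frame, the staircase decoder sees $K_{pc}$ inner-decoded bits with BER $p_{in}$ and $L_0$ uncoded bypassing bits with raw BER $p_{byp}$, and the BER it sees is their bit-weighted average. Holding that average at $P_{sc}$ gives $p_{in} = (P_{sc}(K_{pc}+L_0) - L_0\,p_{byp})/K_{pc}$. The function name and the value $p_{byp} = 10^{-2}$ below are hypothetical.

```python
def required_inner_ber(p_sc: float, k_pc: int, l0: int, p_bypass: float) -> float:
    """Inner-decoder output BER that keeps the bit-weighted average BER seen
    by the staircase decoder equal to p_sc (simple counting model)."""
    p_inner = (p_sc * (k_pc + l0) - l0 * p_bypass) / k_pc
    if p_inner <= 0:
        # The uncoded bypassing bits alone already exceed the P_sc budget.
        raise ValueError("no inner BER can meet p_sc for this l0")
    return p_inner


if __name__ == "__main__":
    P_SC = 5e-3      # staircase BER threshold from Sec. IV
    P_BYPASS = 1e-2  # hypothetical raw BER of the bypassing bits
    for l0 in (0, 128, 256):
        print(l0, f"{required_inner_ber(P_SC, 840, l0, P_BYPASS):.3e}")
```

With these hypothetical inputs, the required inner BER drops from $5 \times 10^{-3}$ at $L_0 = 0$ to about $4.2 \times 10^{-3}$ and $3.5 \times 10^{-3}$ for $L_0 = 128$ and 256. The model keeps $p_{byp}$ fixed; it ignores that $p_{byp}$ itself grows once bypassing bits spill onto less reliable bit-levels, which would steepen the decrease.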

The $\Delta CCC$ performance of the BICM scheme with 16- and 64-QAM modulation is shown by the circled solid lines in Figs. 5 and 6, respectively. The $\Delta CCC$ performance and the relative complexity reduction $T_{red}$ for 16-QAM modulation, $K_{pc}$ of 840 bits, and various values of $L_0$ are shown by the squared solid and dashed lines in Fig. 5, respectively. The $\Delta CCC$ value is calculated using (1), and the relative complexity reduction of the MLC scheme w.r.t. BICM is calculated using (9). Similarly, for the MLC scheme with 16- and 64-QAM modulation, the $\Delta CCC$ performance and the relative complexity reduction $T_{red}$ for $K_{pc}$ of 840, 880, 900, and 920 bits and various values of $L_0$ are shown by the solid and dashed lines, respectively, in Figs. 5 and 6. Table I summarizes the relative performance gain $\eta$ (10) and the complexity savings $T_{red}$ that can be achieved for 16- and 64-QAM modulation with four different values of $K_{pc}$. It can be seen from Table I that for $R_{MLC} = 5.35$ (bits/QAM symbol), the proposed MLC scheme offers rate flexibility with two degrees of freedom, each choice giving a different performance-complexity trade-off.
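Equation (9) is not restated here, but if it has the form $T_{red} = 1 - T_{MLC}/T_{BICM}$ at equal data rate (an assumption), ceiling effects are ignored, and the rate and decoder-count forms that reproduce the numbers of Sec. IV-B (also assumed) are used, a closed form follows. Equal rates require a BICM information length $K_b = N_{pc}(K_{pc}+L_0)/(N_{pc}+L_0)$, so $T_{red} \approx 1 - K_b/(K_{pc}+L_0) = L_0/(N_{pc}+L_0)$, independent of $K_{pc}$ and of the modulation order. The sketch checks this against the Table I entries; the function name is hypothetical.

```python
def t_red_percent(l0: int, n_pc: int = 1024) -> float:
    """Relative reduction (%) in inner decoders of MLC w.r.t. BICM at equal
    data rate, under the assumed closed form L0 / (N_pc + L0)."""
    return 100.0 * l0 / (n_pc + l0)


# (L0, reported T_red in %) pairs taken from Table I (N_pc = 1024)
TABLE_I = [(128, 11.1), (384, 27.3), (1200, 53.9), (1200, 53.9),
           (512, 33.3), (2000, 66.1), (1560, 60.4), (1120, 52.2)]

if __name__ == "__main__":
    for l0, reported in TABLE_I:
        print(f"L0 = {l0:4d}: model {t_red_percent(l0):5.2f} %, Table I {reported} %")
```

All eight Table I entries agree with this model to within 0.1 percentage points, which suggests that, for a given $N_{pc}$, the complexity saving is set by $L_0$ alone.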

Fig. 5. The gap to constellation constrained capacity ($\Delta CCC$) performance (dB) and the relative complexity reduction ($T_{red}$) with $N_{pc}$ of 1024 at different data rates over the AWGN channel with 16-QAM modulation.

Fig. 6. The gap to constellation constrained capacity ($\Delta CCC$) performance (dB) and the relative complexity reduction ($T_{red}$) with $N_{pc}$ of 1024 at different data rates over the AWGN channel with 64-QAM modulation.

TABLE I. Summary of the simulated results for $N_{pc} = 1024$ with 16- and 64-QAM over the AWGN channel.

| Modulation | $K_{pc}$ (bits) | $L_0$ (bits) | Data rate (bits/QAM symbol) | $T_{red}$ (%) | $\eta$ (dB) |
|------------|-----------------|--------------|-----------------------------|---------------|-------------|
| 16-QAM     | 840             | 128          | 3.15                        | 11.1          | 0.015       |
| 16-QAM     | 880             | 384          | 3.36                        | 27.3          | 0.02        |
| 16-QAM     | 900             | 1200         | 3.54                        | 53.9          | $\sim 0$    |
| 16-QAM     | 920             | 1200         | 3.57                        | 53.9          | 0.17        |
| 64-QAM     | 840             | 512          | 4.95                        | 33.3          | 0.06        |
| 64-QAM     | 880             | 2000         | 5.35                        | 66.1          | 0.06        |
| 64-QAM     | 900             | 1560         | 5.35                        | 60.4          | 0.20        |
| 64-QAM     | 920             | 1120         | 5.35                        | 52.2          | 0.29        |

It is evident from Figs. 5 and 6 that the $\Delta CCC$ performance of the MLC scheme improves up to a certain value of $L_0$, after which it starts degrading rapidly. The performance improvement of the MLC is due to the lower-rate inner polar codes (more protection for the $K_{pc}$ information bits) and the assignment of the bypassing bits to the high-reliability bit-levels. Beyond a certain value of $L_0$, the $\Delta CCC$ performance starts degrading because, as the number of bypassing bits increases, the portion of $L_0$ bits in error increases, both because these bits are unprotected and because, once the highest-reliability bit-levels are filled, the remaining bypassing bits are assigned to the second-highest reliability bit-levels. Additionally, the portion of inner-coded bits in error also increases, since these bits are shifted further towards the lower-reliability bit-levels. For example, for data rates above 3.2 and 4.95 (bits/QAM symbol) for 16- and 64-QAM, respectively, the $\Delta CCC$ performance degrades rapidly for the lower values of $K_{pc}$ (e.g., $K_{pc} = 840$ bits), as can be seen from the squared solid lines compared to the asterisk solid lines of $K_{pc} = 920$ bits in Figs. 5 and 6. This observation suggests that, compared to BICM over the range of data rates, the proposed MLC offers noticeable performance and complexity benefits when the inner code rate $R_{pc}$ is high (> 0.8). It is also observed that when going from a lower-order (16-QAM) to a higher-order (64-QAM) modulation format with relatively constant complexity savings, the relative performance gain $\eta$ increases.
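Why some bit-levels are "high-reliability" can be illustrated with a standard textbook calculation, not taken from the paper: with Gray labeling, each quadrature of square 16-QAM is a Gray-labeled 4-PAM, whose sign bit is decided by the threshold at 0 and whose magnitude bit is decided by the thresholds at ±2. The sketch computes the hard-decision BER of both bit-levels over real AWGN with noise standard deviation `sigma` per dimension; the noise level is arbitrary, the function names are hypothetical, and the paper's actual bit-level ordering is not reproduced here.

```python
import math


def q_func(x: float) -> float:
    """Gaussian tail probability Q(x) = P(N(0,1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def gray_4pam_bit_ber(sigma: float) -> tuple[float, float]:
    """Hard-decision BER of the sign and magnitude bits of Gray 4-PAM.

    Points -3, -1, +1, +3 carry labels 00, 01, 11, 10; the first bit is the
    sign (threshold 0), the second tells inner from outer points (|y| < 2).
    """
    q1, q3, q5 = (q_func(d / sigma) for d in (1.0, 3.0, 5.0))
    ber_sign = (q1 + q3) / 2  # inner points err at distance 1, outer at 3
    # Inner points err with prob. q1 + q3, outer points with q1 - q5; average.
    ber_magnitude = q1 + (q3 - q5) / 2
    return ber_sign, ber_magnitude


if __name__ == "__main__":
    sign, mag = gray_4pam_bit_ber(sigma=0.5)
    print(f"sign bit BER {sign:.3e}, magnitude bit BER {mag:.3e}")
```

At this noise level, the sign bit is about twice as reliable as the magnitude bit (BER of roughly $1.1 \times 10^{-2}$ versus $2.3 \times 10^{-2}$), which is consistent with filling the most reliable bit-levels with bypassing bits first.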

#### C. The effect of the inner code blocklengths

For various lengths of the inner code, the performance and relative complexity reduction of the BICM (dashed lines) and MLC (solid lines) schemes with 16- and 64-QAM modulation are shown in Figs. 7a and 7b, respectively.

For 16- and 64-QAM modulation and $N_{pc}$ of 1024, the parameters for the BICM and MLC schemes are the same as in Sec. IV-B. For $N_{pc}$ of 128 with 16- and 64-QAM modulation, we simulated the BICM scheme for $R_{BICM}$ in the range of 2.85 to 3.58 (bits/QAM symbol) and 4.28 to 5.37 (bits/QAM symbol), respectively. For the inner code blocklength of 8192, we simulated the BICM scheme for $R_{BICM}$ in the range of 2.87 to 3.57 (bits/QAM symbol) and 4.31 to 5.36 (bits/QAM symbol) for 16- and 64-QAM, respectively.

System design rule: As shown in Table I, for 64-QAM modulation and a data rate of 5.35 (bits/QAM symbol), different combinations of the two tuning parameters, $K_{pc}$ and $L_0$, can be chosen, each with different relative performance and complexity benefits. At each data rate in Figs. 7a and 7b, we select the combination of $K_{pc}$ and $L_0$ for which the $\Delta CCC$ value of the MLC scheme is minimum. For example, for $N_{pc} = 1024$ and 16-QAM modulation, we take the envelope, or Pareto front [19], of the MLC performance curves of Fig. 5.
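The selection rule can be sketched as a lower envelope over the simulated $(K_{pc}, L_0)$ operating points: group the points by data rate (here rounded to 0.01 bits/QAM symbol, an assumption) and keep the one with the smallest $\Delta CCC$. The function name, the operating points, and the gap values below are hypothetical placeholders, not results from the paper.

```python
from typing import Iterable


def lower_envelope(points: Iterable[tuple[int, int, float, float]]):
    """Keep, for each data rate, the operating point with the smallest gap.

    points: (k_pc, l0, rate_bits_per_symbol, gap_db) tuples.
    Returns {rounded_rate: (k_pc, l0, rate, gap_db)} ordered by rate.
    """
    best: dict[float, tuple[int, int, float, float]] = {}
    for k_pc, l0, rate, gap_db in points:
        key = round(rate, 2)  # bin the rate to 0.01 bits/QAM symbol
        if key not in best or gap_db < best[key][3]:
            best[key] = (k_pc, l0, rate, gap_db)
    return dict(sorted(best.items()))


if __name__ == "__main__":
    # Hypothetical 16-QAM, N_pc = 1024 operating points (gap values invented)
    pts = [(840, 128, 3.15, 1.10), (860, 0, 3.15, 1.12),
           (900, 1200, 3.54, 1.30), (967, 0, 3.54, 1.28)]
    for rate, (k_pc, l0, _, gap) in lower_envelope(pts).items():
        print(f"R = {rate:.2f}: K_pc = {k_pc}, L0 = {l0}, gap = {gap} dB")
```

This selects on $\Delta CCC$ alone, as the design rule above states; a two-objective Pareto front over ($\Delta CCC$, $T_{red}$) would need a dominance check instead.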

In Fig. 7a, the $\Delta CCC$ performance and relative complexity reduction can be seen for the MLC scheme with 16-QAM modulation and $N_{pc}$ of 128, 1024, and 8192, which corresponds to $R_{MLC}$ in the range of 3.20 to 3.57, 3.08 to 3.57, and 3.12 to 3.57 (bits/QAM symbol), respectively. Similarly, for 64-QAM modulation and the various lengths of $N_{pc}$ (128, 1024, and 8192), the corresponding $R_{MLC}$ in Fig. 7b is in the range of 4.80 to 5.35, 4.62 to 5.36, and 4.67 to 5.36 (bits/QAM symbol), respectively. As shown in Fig. 7a, for 16-QAM modulation and a data rate of 3.57 (bits/QAM symbol), the proposed MLC scheme with $N_{pc}$ of 128, 1024, and 8192 achieves relative performance gains of 0.03 dB, 0.17 dB, and 0.41 dB and relative complexity reductions of 57.5%, 53.9%, and 53.7%, respectively. Likewise, for 64-QAM modulation and a data rate of 5.35 (bits/QAM symbol), the MLC scheme with $N_{pc}$ of 128, 1024, and 8192 offers $\eta$ of 0.09 dB, 0.29 dB, and 0.55 dB, respectively, while requiring 57.2%, 52.2%, and 53.7% fewer inner decoders than BICM, as shown in Fig. 7b.

Fig. 7. The optimized simulation results for the gap to constellation constrained capacity ($\Delta CCC$) performance (dB) and relative complexity reduction ($T_{red}$) with different inner code lengths $N_{pc}$ at different data rates over the AWGN channel.


#### D. The effect of the inner decoder list size

To quantify the performance penalty of using lower-complexity inner CA-SCL decoders (smaller list size), we simulated the cases of Sec. IV-C for 64-QAM modulation with a list size of 8 and compared them with the list-size-32 results. For the MLC scheme with $N_{pc}$ of 128, 1024, and 8192, the maximum performance penalty of using the lower-complexity CA-SCL decoder with list size 8 is less than 0.25 dB, 0.08 dB, and 0.09 dB, respectively, compared with the MLC curves of Fig. 7b obtained with list size 32. Similarly, for the BICM scheme with $N_{pc}$ of 128, 1024, and 8192, the maximum performance penalty of switching from the higher-complexity (list size 32) to the lower-complexity (list size 8) decoder is less than 0.24 dB, 0.08 dB, and 0.12 dB, respectively.

#### E. Comparison to other CFEC schemes

To compare our proposed CM scheme with the current standard [2], we simulated the 400ZR FEC over the AWGN channel with 16- and 64-QAM modulation, which corresponds to data rates of 3.49 and 5.23 (bits/QAM symbol), respectively. Compared to 400ZR FEC, the rate flexible BICM scheme with the  $N_{pc}$  of 128, 1024, and 8192 offers performance gains of 0.13 dB, 0.32 dB, and 0.31 dB for 16-QAM and gains of 0.1 dB, 0.27 dB, and 0.26 dB for 64-QAM, as shown in Figs. 7a and 7b, respectively.

It can be seen from Fig. 7a that, for 16-QAM modulation, the proposed MLC scheme with $N_{pc}$ of 128, 1024, and 8192 offers performance gains of up to 0.15 dB, 0.46 dB, and 0.55 dB, respectively, compared with the 400ZR FEC. Compared to BICM, the proposed MLC scheme requires 38%, 31%, and 36% fewer inner polar decoders, respectively, as shown in Fig. 7a. Similarly, for 64-QAM modulation, compared to the 400ZR FEC, the proposed MLC scheme with $N_{pc}$ of 128, 1024, and 8192 provides gains of 0.15 dB, 0.45 dB, and 0.58 dB, respectively, and simultaneously reduces the relative complexity by 36%, 42%, and 43%, as shown in Fig. 7b. For $N_{pc} = 8192$, 64-QAM modulation, and the same relative complexity savings, there is a penalty of 0.06 dB when switching from the higher-complexity (list size 32) to the lower-complexity (list size 8) CA-SCL decoder.

The proposed MLC-based CPSC scheme combines the same outer code structure as the 400ZR FEC algorithm [22] with an inner polar code, and can be adopted in future 800ZR+ FEC applications without changing the frame format and other specifications.

Fig. 8. The gap to constellation constrained capacity ($\Delta CCC$) performance (dB) and the relative complexity reduction ($T_{red}$) with different concatenated codes at different data rates over the AWGN channel with 16-QAM.

The inner polar codes provide a good combination of seamless rate flexibility and good performance at high rates, but the proposed MLC scheme is not restricted to the inner code being a polar code. For example, for 16-QAM modulation, the data rate of the BICM-based 400ZR FEC is 3.49 (bits/QAM symbol). As an add-on, rate flexibility can be achieved by varying the number of $L_0$ bypassing bits, and the data rate of this MLC-based Hamming-staircase code becomes $(239/255) \times ((119 + L_0)/(128 + L_0)) \times \log_2(M)$, although with only a single degree of rate flexibility. We also simulated the Hamming-staircase code of the 400ZR FEC [2] with our proposed MLC scheme over the AWGN channel with 16-QAM modulation, sweeping $L_0$ over the range [4; 1024]. The resulting $\Delta CCC$ performance is shown by a solid line in Fig. 8. This would be a straightforward add-on to the 400ZR FEC. For the BICM- and MLC-based 400ZR FEC, the total number of inner decoders required to decode $J$ staircase blocks can be calculated using (5) and (7), respectively. The relative complexity reduction is not considered here because the Hamming code rate is fixed, and for different values of $L_0$ the system operates at a different rate.
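The Hamming-staircase rate expression above can be evaluated directly. The sketch below (function name hypothetical) uses only that formula; with $L_0 = 0$ it recovers the 400ZR rates of 3.49 and 5.23 (bits/QAM symbol) for 16- and 64-QAM, and as $L_0$ grows, the rate rises monotonically toward the staircase-only limit $R_{sc}\log_2 M$, which shows why this add-on offers only one degree of rate flexibility.

```python
import math

R_SC = 239 / 255  # 400ZR outer staircase code rate


def hamming_staircase_mlc_rate(l0: int, m: int = 16) -> float:
    """Data rate (bits/QAM symbol) with l0 bypassing bits per (128, 119)
    Hamming frame; l0 = 0 is the BICM-based 400ZR FEC."""
    return R_SC * (119 + l0) / (128 + l0) * math.log2(m)


if __name__ == "__main__":
    print(round(hamming_staircase_mlc_rate(0, 16), 2))  # 3.49
    print(round(hamming_staircase_mlc_rate(0, 64), 2))  # 5.23
    for l0 in (4, 64, 256, 1024):  # points within the swept range [4; 1024]
        print(l0, round(hamming_staircase_mlc_rate(l0), 3))
```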

The proposed MLC scheme can also be combined with other SDD-based rate flexible codes as the inner code, potentially leading to other efficient codes with two degrees of freedom in rate flexibility. For example, as shown by the dashed and solid lines in Fig. 8, we simulated the BICM- and MLC-based LDPC-staircase codes for data rates in the range of 2.80 to 3.37 (bits/QAM symbol) and 2.82 to 3.56 (bits/QAM symbol), respectively. The outer code parameters for the BICM- and MLC-based LDPC-staircase codes are the same as above. The inner LDPC code length is 64800 bits, which is much longer than that of the polar codes. The number of iterations of the belief-propagation-based LDPC decoder is 32 [3]. In Fig. 8, the dashed line of the BICM-based rate flexible LDPC-staircase code is obtained by simulating a set of LDPC codes of different rates (3/4, 4/5, 5/6, 8/9, and 9/10) [3]. Similar to our proposed MLC scheme of Fig. 1, we considered an SDD-based rate flexible LDPC code as the inner code and simulated it for five different inner code rates. At each data rate, we select the combination of inner code rate (number of inner information bits, in our case) and $L_0$ for which the $\Delta CCC$ value of the MLC-based LDPC-staircase code is minimum. The performance and relative complexity savings of the MLC scheme with the inner LDPC code and 16-QAM modulation are shown by solid lines in Fig. 8. Although the MLC-based LDPC-staircase code performs better than the proposed MLC with the inner polar code, owing to the higher complexity of its inner decoder, the relative complexity savings are similar in both cases.

Our proposed CM scheme utilizes a single CFEC code for various rates, inner code blocklengths, and modulation formats, and for switching between the MLC and BICM CM schemes. In contrast, in [27]–[29], the CFEC codes were optimized for each data rate, modulation format, and CM scheme. For example, in [29], the inner codes are optimized for each data rate, modulation format, and CM scheme to obtain the performance improvement or complexity savings. We instead utilize a single set of ordered reliabilities defining the frozen set for different data rates, for fixed modulation formats and CM schemes.

Although a detailed complexity comparison of the proposed MLC scheme with the current standard [2] and the other MLC schemes of [27]–[29] is beyond the scope of this work, we compare their performance in Fig. 8. For 16-QAM modulation, $N_{pc} = 8192$ bits, list size 32, and data rates of 3.12, 3.28, and 3.43 (bits/QAM symbol), our proposed MLC scheme exhibits performance gains of around 0.51 dB, 0.65 dB, and 0.76 dB, respectively, over the MLC scheme of [27]. For 16-QAM modulation and a data rate of 3.2 (bits/QAM symbol), the performance results of the MLC schemes of [28], [29] are shown in Fig. 8, where the $\Delta CCC$ performance improves as the decoder complexity increases. In contrast, our proposed MLC-based polar- and LDPC-staircase codes in Fig. 8 are designed to offer notable combined performance and complexity gains at data rates above 3.2 (bits/QAM symbol) compared with the BICM-based polar- and LDPC-staircase codes of Fig. 8.

# **V. CONCLUSION**

In this paper, an MLC-based concatenated polar-staircase coded modulation scheme with two degrees of freedom in rate flexibility is proposed. It is compared with rate flexible polar-coded BICM from a performance-complexity standpoint. To achieve multilevel coded modulation, the inner code structure is redesigned such that some of the bits bypass the inner polar code and are assigned to the high-reliability bit-levels of the higher-order modulation formats.

The proposed MLC scheme with different polar code blocklengths, inner decoder list sizes, and M-QAM modulation formats can be used as a flexible FEC solution for different types of flexible optical networks (data center interconnects, edge, metro, long-haul, and subsea) with different speed, latency, and power requirements. We also showed that the proposed MLC scheme is applicable to other fixed- or flexible-rate inner codes.

The proposed MLC scheme simultaneously provides up to 0.55 dB of performance gain and a 53.7% relative complexity reduction compared to BICM (Fig. 7b). The proposed scheme allows two degrees of freedom to choose any operating point without changing the channel code or the underlying modulation format, which is achieved by tuning the number of inner code information bits and the number of bypassing bits. We showed that the MLC scheme, integrated with an outer staircase code and an inner polar code, performs up to 0.58 dB better than the 400ZR FEC and at the same time reduces the relative complexity by 43% compared to polar-coded BICM (Fig. 7b).

# ACKNOWLEDGMENT

The authors would like to thank Zeuxion ApS for discussion and providing the code for staircase coding simulations and Jakob D. Andersen for providing the Hamming Chase decoder.

# REFERENCES

- [1] L. Paraschis and K. Raj, "Innovations in DCI transport networks," in *Optical Fiber Telecommunications VII*, pp. 673–718, Elsevier, 2020.
- [2] "Implementation agreement 400ZR," in Optical Internetworking Forum 0.10-Draft, 2018.
- [3] A. Morello and V. Mignone, "DVB-S2: The second generation standard for satellite broad-band services," *Proceedings of the IEEE*, vol. 94, no. 1, pp. 210–227, 2006.
- [4] E. Arikan, "Channel polarization: A method for constructing capacity-achieving codes for symmetric binary-input memoryless channels," *IEEE Transactions on Information Theory*, vol. 55, no. 7, pp. 3051–3073, 2009.
- [5] B. S. G. Pillai, B. Sedighi, K. Guan, N. P. Anthapadmanabhan, W. Shieh, K. J. Hinton, and R. S. Tucker, "End-to-end energy modeling and analysis of long-haul coherent transmission systems," *Journal of Lightwave Technology*, vol. 32, no. 18, pp. 3093–3111, 2014.
- [6] G. Lechner, T. Pedersen, and G. Kramer, "Analysis and design of binary message passing decoders," *IEEE Transactions on Communications*, vol. 60, no. 3, pp. 601–607, 2011.
- [7] F. Steiner, E. B. Yacoub, B. Matuz, G. Liva, and A. G. i Amat, "One and two bit message passing for SC-LDPC codes with higher-order modulation," *Journal of Lightwave Technology*, vol. 37, no. 23, pp. 5914–5925, 2019.
- [8] A. Sheikh, A. G. i Amat, and G. Liva, "Binary message passing decoding of product-like codes," *IEEE Transactions on Communications*, 2019.
- [9] G. Liga, A. Sheikh, and A. Alvarado, "A novel soft-aided bit-marking decoder for product codes," *arXiv preprint arXiv:1906.09792*, 2019.
- [10] E. Abbe and A. Barron, "Polar coding schemes for the AWGN channel," in 2011 IEEE International Symposium on Information Theory Proceedings, pp. 194–198, IEEE, 2011.
- [11] I. Tal and A. Vardy, "List decoding of polar codes," *IEEE Transactions on Information Theory*, vol. 61, no. 5, pp. 2213–2226, 2015.
- [12] D. Wu, A. Liu, Y. Zhang, and Q. Zhang, "Parallel concatenated systematic polar codes," *Electronics Letters*, vol. 52, no. 1, pp. 43–45, 2015.
- [13] T. Koike-Akino, C. Cao, Y. Wang, K. Kojima, D. S. Millar, and K. Parsons, "Irregular polar turbo product coding for high-throughput optical interface," in 2018 Optical Fiber Communications Conference and Exposition (OFC), pp. 1–3, IEEE, 2018.
- [14] C. Condo, V. Bioglio, H. Hafermann, and I. Land, "Practical product code construction of polar codes," *IEEE Transactions on Signal Processing*, vol. 68, pp. 2004–2014, 2020.
- [15] B. Feng, J. Jiao, L. Zhou, S. Wu, B. Cao, and Q. Zhang, "A novel high-rate polar-staircase coding scheme," in 2018 IEEE 88th Vehicular Technology Conference (VTC-Fall), pp. 1–5, IEEE, 2018.
- [16] L. Zhou, B. Feng, J. Jiao, K. Liang, S. Wu, and Q. Zhang, "Performance analysis of soft decoding algorithms for polar-staircase coding scheme," in 2018 10th International Conference on Wireless Communications and Signal Processing (WCSP), pp. 1–6, IEEE, 2018.
- [17] C. Condo, V. Bioglio, and I. Land, "Staircase construction with nonsystematic polar codes," in *Optical Fiber Communication Conference*, pp. Th1G–6, Optical Society of America, 2020.
- [18] L. M. Zhang and F. R. Kschischang, "Low-complexity soft-decision concatenated LDGM-staircase FEC for high-bit-rate fiber-optic communication," *Journal of Lightwave Technology*, vol. 35, no. 18, pp. 3991–3999, 2017.
- [19] M. Barakatain and F. R. Kschischang, "Low-complexity concatenated LDPC-staircase codes," *Journal of Lightwave Technology*, vol. 36, no. 12, pp. 2443–2449, 2018.
- [20] Y. Cai, W. Wang, W. Qian, J. Xing, K. Tao, J. Yin, S. Zhang, M. Lei, E. Sun, H.-C. Chien, *et al.*, "FPGA investigation on error-flare performance of a concatenated staircase and Hamming FEC code for 400G inter-data center interconnect," *Journal of Lightwave Technology*, vol. 37, no. 1, pp. 188–195, 2019.
- [21] T. Mehmood, M. P. Yankov, A. Fisker, K. Gormsen, and S. Forchhammer, "Rate-adaptive concatenated polar-staircase codes for data center interconnects," in *Optical Fiber Communication Conference*, pp. Th11–6, Optical Society of America, 2020.
- [22] B. Smith, I. Lyubomirsky, and S. Bhoja, "Leveraging 400G ZR FEC technology," in *IEEE 802.3 Beyond 10 km Optical PHYs Study Group*, 2017.
- [23] L. Schmalen, L. M. Zhang, and U. Gebhard, "Distributed rate-adaptive staircase codes for connectionless optical metro networks," in 2017 Optical Fiber Communications Conference and Exhibition (OFC), pp. 1–3, IEEE, 2017.
- [24] A. Napoli, B. Spinnler, R. Clemens, M. Mocker, and B. Sommerkorn-Krombholz, "800G: Reach, symbol rate and frame format considerations," in *Optical Internetworking Forum 2019.210.01 draft*, 2019.
- [25] U. Wachsmann, R. F. Fischer, and J. B. Huber, "Multilevel codes: Theoretical concepts and practical design rules," *IEEE Transactions on Information Theory*, vol. 45, no. 5, pp. 1361–1391, 1999.
- [26] H. Imai and S. Hirakawa, "A new multilevel coding method using error-correcting codes," *IEEE Transactions on Information Theory*, vol. 23, no. 3, pp. 371–377, 1977.
- [27] A. Bisplinghoff, S. Langenbach, and T. Kupfer, "Low-power, phase-slip tolerant, multilevel coding for M-QAM," *Journal of Lightwave Technology*, vol. 35, no. 4, pp. 1006–1014, 2016.
- [28] Y. Koganei, T. Oyama, K. Sugitani, H. Nakashima, and T. Hoshida, "Multilevel coding with spatially coupled repeat-accumulate codes for high-order QAM optical transmission," *Journal of Lightwave Technology*, vol. 37, no. 2, pp. 486–492, 2019.
- [29] M. Barakatain, D. Lentner, G. Böcherer, and F. R. Kschischang, "Performance-complexity tradeoffs of concatenated FEC for higher-order modulation," *Journal of Lightwave Technology*, accepted for 2020.
- [30] G. Caire, G. Taricco, and E. Biglieri, "Bit-interleaved coded modulation," *IEEE Transactions on Information Theory*, vol. 44, no. 3, pp. 927–946, 1998.
- [31] B. P. Smith, A. Farhood, A. Hunt, F. R. Kschischang, and J. Lodge, "Staircase codes: FEC for 100 Gb/s OTN," *Journal of Lightwave Technology*, vol. 30, no. 1, pp. 110–117, 2011.
- [32] "OTU4 long-reach interface," in *document ITU-T Rec. G.709.2/Y.1331.2*, *ITU*, July 2018.
- [33] N. Stolte, *Recursive codes with the Plotkin-Construction and their Decoding*. PhD thesis, University of Technology Darmstadt, Germany, 2002.
- [34] V. Bioglio, C. Condo, and I. Land, "Design of polar codes in 5G new radio," *IEEE Communications Surveys & Tutorials*, 2020.
- [35] H. Vangala, E. Viterbo, and Y. Hong, "A comparative study of polar code constructions for the AWGN channel," *arXiv preprint arXiv:1501.02473*, 2015.
- [36] D. Chase, "Class of algorithms for decoding block codes with channel measurement information," *IEEE Transactions on Information Theory*, vol. 18, no. 1, pp. 170–182, 1972.
- [37] C. Zhang, S. Forchhammer, M. P. Yankov, T. Mehmood, and K. J. Larsen, "Fast SD-Hamming decoding in FPGA for high-speed concatenated FEC for optical communication," in 2020 IEEE Global Communications Conference (GLOBECOM), pp. 1–6, IEEE, 2020.
- [38] G. Ungerboeck, "Channel coding with multilevel/phase signals," *IEEE Transactions on Information Theory*, vol. 28, no. 1, pp. 55–67, 1982.
- [39] E. Zehavi, "8-PSK trellis codes for a Rayleigh channel," *IEEE Transactions on Communications*, vol. 40, no. 5, pp. 873–884, 1992.
- [40] L. Szczecinski and A. Alvarado, Bit-interleaved coded modulation: fundamentals, analysis and design. John Wiley & Sons, 2015.
- [41] L. Beygi, E. Agrell, P. Johannisson, and M. Karlsson, "A novel multilevel coded modulation scheme for fiber optical channel with nonlinear phase noise," in 2010 IEEE Global Telecommunications Conference GLOBECOM 2010, pp. 1–6, IEEE, 2010.
- [42] F. Yu, D. Chang, N. Stojanovic, C. Xie, M. Li, L. Jin, Z. Xiao, X. Shi, and L. Li, "Hybrid soft/hard decision multilevel coded modulation for beyond 100Gbps optical transmission," in *39th European Conference and Exhibition on Optical Communication (ECOC 2013)*, pp. 1–3, IET, 2013.
- [43] S. Ten Brink, J. Speidel, and R.-H. Yan, "Iterative demapping and decoding for multilevel modulation," in *IEEE GLOBECOM 1998 (Cat. NO. 98CH36250)*, vol. 1, pp. 579–584, IEEE, 1998.
- [44] P. Koopman and T. Chakravarty, "Cyclic redundancy code (CRC) polynomial selection for embedded networks," in *International Conference on Dependable Systems and Networks, 2004*, pp. 145–154, IEEE, 2004.



Søren Forchhammer (M'04) received the M.S. degree in engineering and the Ph.D. degree from the Technical University of Denmark, Lyngby, in 1984 and 1988, respectively. Currently, he is a Professor with DTU Fotonik, Technical University of Denmark, where he has been since 1988. He is Head of the Coding and Visual Communication Technology Group at DTU Fotonik. His main interests include signal and image processing for communication, source coding, information theory, signal processing and coding for optical communication, distributed coding, and visual communication technology.



Tayyab Mehmood (S'13) was born in Muzaffargarh, Pakistan. He received a B.Sc. degree (with high distinction) in telecommunication engineering from Islamia University Bahawalpur (IUB), Pakistan, in 2014, and an M.Sc. degree in the area of optical signal processing for radio over fiber systems from the National University of Sciences & Technology (SEECS, NUST), Islamabad, Pakistan, in 2016. He is currently working toward the Ph.D. degree with the Department of Photonics Engineering, DTU, supervised by Prof. Søren Forchhammer, Prof. Gerhard Kramer, and Dr. Metodi P. Yankov. His current research interests include coding, modulation, and optical communication systems. He held a research associate position at the Department of Optical Engineering, Sejong University, Seoul (2017-2018) and a lecturer position at the GIK Institute of Engineering Sciences & Technology, Pakistan (2018).



**Metodi Plamenov Yankov** (M'13) received a B.Eng. degree in radio communications from the Technical University of Sofia, Bulgaria, in 2010, an M.Sc. degree in signals and transmission technology for telecommunications from the Technical University of Denmark (DTU), Lyngby, in 2012, and a Ph.D. degree from the Technical University of Denmark on the topic of constellation constrained capacity estimation near capacity achieving digital methods. He held a postdoc position at the DNRF Centre of Excellence SPOC (2015-2017) and an industrial postdoc position at Fingerprint Cards A/S, Denmark (2017-2019). Since then, he has been employed as a researcher in the Coding and Visual Communications group at DTU Fotonik. His research interests include, among others, information and communication theory of digital transceivers, digital signal processing techniques in general and for wireless and optical fiber communications in particular, machine learning, forward error correction codes, and information theory of biometrics. He is currently a member of the IEEE Photonics Society.



Shajeel Iqbal received the B.E. degree in electronics from the National University of Sciences and Technology (NUST), Islamabad, Pakistan, in 2012, the M.E. degree in information and communication engineering from Chosun University, Gwangju, South Korea, in 2016, and the Ph.D. degree from the Department of Photonics Engineering (DTU Fotonik), Technical University of Denmark (DTU), Kongens Lyngby, Denmark, in 2019. He is a Postdoctoral Researcher with DTU and an Industrial Postdoctoral Researcher with Comcores ApS, Lyngby, Denmark. His research interests include channel coding, information theory, communications theory, and FPGA design for wireless and optical communication systems.