# **DNN-aided Read-voltage Threshold Optimization** for MLC Flash Memory with Finite Block Length

Cheng Wang, Kang Wei, Lingjun Kong, Long Shi, Zhen Mei, Jun Li, and Kui Cai

Abstract—The error correcting performance of multi-level-cell (MLC) NAND flash memory is closely related to the block length of error correcting codes (ECCs) and log-likelihood-ratios (LLRs) of the read-voltage thresholds. Driven by this issue, this paper optimizes the read-voltage thresholds for MLC flash memory to improve the decoding performance of ECCs with finite block length. First, through the analysis of channel coding rate (CCR) and decoding error probability under finite block length, we formulate the optimization problem of read-voltage thresholds to minimize the maximum decoding error probability. Second, we develop a cross iterative search (CIS) algorithm to optimize read-voltage thresholds under perfect knowledge of the flash memory channel. However, it is challenging to analytically characterize the voltage distribution under the effect of data retention noise (DRN), since the data retention time (DRT) is hard to record for flash memory in practice. To address this problem, we develop a deep neural network (DNN) aided optimization strategy to optimize the read-voltage thresholds, where a multi-layer perceptron (MLP) network is employed to learn the relationship between the voltage distribution and the read-voltage thresholds. Simulation results show that, compared with the existing schemes, the proposed DNN-aided read-voltage threshold optimization strategy with a well-designed LDPC code can not only improve the program-and-erase (PE) endurance but also reduce the read latency.

Index Terms-MLC NAND flash memory, read-voltage threshold, finite block length, LDPC codes, deep neural network.

#### I. INTRODUCTION

NAND flash memory has been widely used over the past decade due to its low power consumption. The original NAND flash memory cell can only store one bit with two levels, which is called single-level-cell (SLC). Using the multi-level-cell (MLC) or triple-level-cell (TLC) technique [1]–[3], the flash memory can store multiple bits in a single memory cell. However, as the number of levels in each memory cell increases, serious scaling challenges emerge in the NAND flash memory, resulting in a negative effect on the reliability. These challenges originate from the characteristics of flash devices, which can be described by several noise models, such as programming noise (PN), cell-to-cell interference (CCI), random telegraph noise (RTN), and data retention noise (DRN) [4].

Among various noises in flash memory, the DRN is caused by the charge leakage at the floating-gate of flash memory cells [5]. The charge leakage starts when a flash memory cell is programmed. The overall period of this process is called the data retention time (DRT). As the size of memory chip decreases, the floating-gate of a flash memory cell stores much fewer electrons, which degrades the performance of flash memory. This is due to the fact that a small amount of charge leakage has remarkable influence on the floating-gate transistor. Compared with SLC, the MLC technology intensifies the decoding errors caused by the DRN, as the reduced interval of write voltage at each storage state distorts the voltage distribution of flash memory. As a result, the increasing number of program-and-erase (PE) cycles and the DRT limit the operational lifetime of flash memory.


To improve the reliability of flash memory, hard-decision error correcting codes (ECCs), such as Bose-Chaudhuri-Hocquenghem (BCH) and Reed-Solomon (RS) codes, were employed in flash memory [6], [7]. To enhance the decoding error performance of ECCs, [8]-[11] proposed the utilization of soft decision in flash memory. Later on, various soft-decision decoding algorithms were proposed to achieve desirable error correcting performance. For example, the belief-propagation (BP) algorithm is one of the probability-based iterative decoding algorithms with excellent performance [12]-[15]. It is well known that LDPC codes are decoded with soft information such as channel log-likelihood-ratios (LLRs). In order to achieve better error-correcting performance, the soft-decision decoder demands more reliable and accurate soft information that can be obtained by the read process [8], [9], [16]–[18]. For the flash memory channel, the problem of obtaining soft information can be turned into that of optimizing the read-voltage thresholds [16].

Driven by this observation, much effort has been put into the optimization of read-voltage thresholds [8], [9], [16], [19], [20]. Well-designed read-voltage thresholds can convert hard information (i.e., voltages of cells) into soft information (i.e., LLRs), which greatly improves the decoding performance of flash memory. Initially, flash memory employed hard-decision memory sensing, which utilizes the hard information generated by fixed read-voltage thresholds. However, the hard-decision method is only effective when the flash memory noise is small. To prolong the lifetime of flash memory, the soft read-voltage sensing strategy has become a prevailing solution. Prior works in [8], [16] introduced a nonuniform memory sensing strategy to reduce the memory sensing precision and read latency while maintaining good error-correction performance. These works obtain the read-voltage thresholds by utilizing the entropy value of each unreliable region. Nevertheless, the optimization of read-voltage thresholds relies on extensive simulations and the memory sensing level is limited. To solve this dilemma, the work in [9] developed an adjustable sensing strategy for multiple reads of the same flash memory cell, which selects the wordline voltages by maximizing the mutual information (MMI) between the input and output of the equivalent discrete read channel.

C. Wang, K. Wei, and J. Li are with School of Electronic and Optical Engineering, Nanjing University of Science and Technology, Nanjing, P. R. China (e-mail: {cheng.wang, kang.wei, jun.li}@njust.edu.cn). L. Kong is with College of Telecommunication and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing, P. R. China (e-mail: ljkong@njupt.edu.cn). L. Shi, Z. Mei, and K. Cai are with Science and Math Cluster, Singapore University of Technology and Design, Singapore (e-mail: slong1007@gmail.com, mei\_zhen@outlook.com, cai\_kui@sutd.edu.sg).

However, the existing works have the following issues. First, the aforementioned threshold optimization strategies did not take into account the block length of the ECCs used in flash memory [8], [9], [16]. Notably, the block length of ECCs for emerging memories is usually short due to stringent requirements on low decoding complexity and read latency. In practice, there is a significant gap between the actual channel coding rate (CCR) and the capacity of the flash memory model in [16] under finite block length [21]. Recent research has unveiled that the flash memory channel after sensing by read-voltage thresholds can be regarded as a discrete memoryless channel (DMC) [9]. Several theoretical approaches investigated the threshold optimization in DMC from the perspective of information theory [22], [23]. Following these theoretical approaches, we characterize the maximum coding rate in flash memory as a function of block length and error probability. Building upon the rate analysis, we optimize the read-voltage thresholds for flash memory.

Second, the prior works in [8], [9], [16] designed the read-voltage thresholds for flash memory assuming perfect knowledge of PE cycles and DRT. In practice, it is rather difficult to record DRT. Without the knowledge of PE cycles and DRT, the following methods were proposed to recover the soft information of flash memory channel under the effect of the DRN. A flash correct-and-refresh technique proposed in [24] reads the data stored in flash memory periodically and utilizes the ECCs to perform the decoding and reprogram the flash memory. Later on, [25] developed a decision-directed estimation (DDE) algorithm to mitigate the DRN by utilizing a Gaussian mixture model to estimate the voltage distribution of flash memory. The DDE algorithm first compares the input and output of the decoder to find the best-fit parameters of the Gaussian model, and then utilizes the Gaussian model to adjust the read-voltage thresholds. Recently, a retention-aware belief-propagation (RABP) decoding scheme was proposed to combat the DRN in MLC flash memory [26]. If the decoding fails, the RABP algorithm adjusts the input LLRs based on the decoded bits and performs another round of decoding. Furthermore, [27] proposed a RABP aided channel update algorithm to estimate the voltage distribution of MLC flash memory. It regards the voltage distribution of flash memory as a Gaussian distribution and utilizes the decoding results to update the mean and variance of the voltage distribution. However, the decoding processes in [24]-[27] result in either large energy consumption or high decoding latency, which conflicts with the practical use of flash memory. In addition, these methods are applicable only when the DRN is within a certain small range such that the decoder can still provide sufficient correct information. In this context, these methods cannot handle the errors caused by the DRN that exceed the correction capability of ECCs.

Recently, rapid development of deep learning inspires us to handle the variation of flash memory channel caused by the DRN. With an explosive increase in big data, the deep learning technologies, such as deep neural network (DNN), can distill the data effectively and extract abstract correlations from data [28], [29]. For the flash memory, in contrast to the existing methods that require a round of decoding to obtain the useful information, the DNN allows the system to train a model offline and explore the relationship between the input and output, and the well-trained DNN model can directly generate the information from the processed data. These findings motivate us to design a DNN-aided read-voltage optimization strategy that does not rely on the knowledge of DRT.

The primary goal of this paper is to optimize the read-voltage thresholds in MLC flash memory with finite ECC block length. Towards this goal, we first formulate the optimization problem of read-voltage thresholds under finite block length, and then propose the cross iterative searching (CIS) algorithm and DNN-aided optimization strategy to optimize the read-voltage thresholds, respectively. The main contributions of this paper are summarized as follows:

- *Read-voltage threshold optimization under finite block length*—We study the CCR of MLC flash memory under finite block length and optimize the read-voltage thresholds with perfect knowledge of PE cycles and DRT. Under finite block length, we first formulate the read-voltage optimization problem to maximize the CCR by minimizing the maximum error probability. Then, we develop a CIS algorithm to solve this problem. Simulation results show that, compared with MMI-based quantization and entropy-based quantization, the proposed CIS algorithm can significantly improve the lifetime of flash memory.
- *DNN-aided read-voltage threshold optimization*—We develop a DNN-aided optimization strategy to optimize the read-voltage thresholds without the knowledge of DRT. The core of the proposed DNN-aided scheme is to train a multi-layer perceptron (MLP) network to learn the relationship between the voltage distribution (i.e., input of the MLP) and the read-voltage thresholds (i.e., output of the MLP). Simulation results show that, compared with the RABP decoding scheme, the DNN-aided scheme can not only improve the PE endurance but also reduce the read latency.

The remainder of this paper is organized as follows. Section II presents the MLC flash memory channel model and investigates its CCR under finite block length. Section III formulates the read-voltage thresholds optimization problem under finite block length and proposes the CIS algorithm. Section IV proposes the DNN-aided optimization strategy. Section V shows the simulation results. Section VI concludes this paper.

## II. SYSTEM MODEL

# A. Channel Model of MLC NAND Flash Memory

Let $S = \{s_0, s_1, s_2, s_3\}$ denote the storage states of MLC flash memory. A flash memory cell must be erased before programming. Let $s_0$ denote the erased state of an MLC flash memory cell. With reference to [16], the voltage distribution of the cell at state $s_0$ is approximately modeled as a Gaussian distribution $p_e(v) = \mathcal{N}(\mu_e, \sigma_e^2)$ with mean $\mu_e$ and standard deviation $\sigma_e$. In addition, let $s_1$, $s_2$, and $s_3$ denote the programmed states. The voltages at these programmed states are generated by using an incremental step-pulse programming technique. Then, the voltage distribution of the cell at each programmed state follows a uniform distribution [30]:

$$p_{\mathbf{p}_{s_i}}(v) = \begin{cases} 1/V_{\mathbf{p}}, & v \in [V_{s_i}V_{\mathbf{p}}) \\ 0, & \text{elsewhere,} \end{cases} \quad i = 1, 2, 3, \quad (1)$$

where  $V_p$  denotes the programming step voltage and  $V_{s_i}$  denotes the target programmed voltage of  $s_i$ .

The MLC flash memory channel is generally affected by the PN, CCI, RTN, and DRN [16], [31], [32].

1) Programming Noise: Let $n_{pn}$ denote the PN. The voltage programming process is influenced by the PN, which can be approximately modeled as a Gaussian distribution $n_{pn}(v) = \mathcal{N}(0, \sigma_{pn}^2)$ with zero mean and standard deviation $\sigma_{pn}$ [33]. The programming process does not change the voltage of the erased state, but only affects the voltage distributions of states $s_1, s_2, s_3$ [16].

2) Cell-to-cell Interference: Let  $n_c$  denote the CCI. As the major noise source in the MLC flash memory [16], [31], [34], the CCI results from the scaling down of the flash memory chip, leading to a voltage shift  $V_C$  among the cells:

$$V_{\rm C} = \sum_j \Delta V_j \zeta_j,\tag{2}$$

where $\Delta V_j$ represents the voltage variation of the *j*-th interfering cell programmed after the victim cell, and $\zeta_j$ represents the coupling coefficient between the *j*-th interfering cell and the victim cell. The effect of CCI can be estimated and the pre-distortion/post-compensation technique can be employed to mitigate the influence of CCI [32]. However, this technique cannot eliminate the CCI's effect on the erased state $s_0$. Let $V_{s_0}$ denote the target voltage of the erased state. According to [16], [32], the voltage distribution of the cell for even-bit line and odd-bit line at the erased state is modeled by two Gaussian distributions, i.e., $n_c^{\text{even}} = \mathcal{N}(\tilde{\mu}_e^{\text{even}}, \sigma_e^2)$ and $n_c^{\text{odd}} = \mathcal{N}(\tilde{\mu}_e^{\text{odd}}, \sigma_e^2)$, with the same variance $\sigma_e^2$ and different means:

$$\tilde{\mu}_{\rm e}^{\rm even} = V_{s_0} + V_{\rm mean}(2K_{\rm x} + K_{\rm y} + 2K_{\rm xy}),\tag{3a}$$

$$\tilde{\mu}_{\rm e}^{\rm odd} = V_{s_0} + V_{\rm mean}(K_{\rm y} + K_{\rm xy}),\tag{3b}$$

where $V_{\text{mean}} = (V_{s_0} + V_{s_3})/2 - V_{s_0}$; $\tilde{\mu}_e^{\text{even}}$ and $\tilde{\mu}_e^{\text{odd}}$ represent the mean voltages of the even-bit line and odd-bit line cells, respectively; $K_x$, $K_y$, and $K_{xy}$ are the coupling coefficients of the floating gate in the horizontal, vertical, and diagonal directions, respectively.



Fig. 1. Voltage distribution and 6-level read quantization of an MLC flash memory.

3) Random Telegraph Noise: Let $n_{\rm rtn}$ denote the RTN. The RTN can be approximately modeled as a Gaussian distribution $n_{\rm rtn}(v) = \mathcal{N}(0, \sigma_{\rm rtn}^2)$ with zero mean and standard deviation $\sigma_{\rm rtn}$, where $\sigma_{\rm rtn}$ varies with the number of program-and-erase (PE) cycles in a power-law form [16]. From [27], $\sigma_{\rm rtn} = 0.00027 (N_{\rm PE})^{0.64}$ with $N_{\rm PE}$ being the number of PE cycles.

Fig. 1 (a) illustrates the voltage distribution of an MLC flash memory cell under the effect of PN, CCI, and RTN.

4) Data Retention Noise: Let $n_d$ denote the DRN. The DRN is approximated as a Gaussian distribution $n_{d_i}(v) = \mathcal{N}(\mu_{\mathbf{r}_{s_i}}, \sigma_{\mathbf{r}_{s_i}}^2)$, $i = 0, 1, 2, 3$, where $\mu_{\mathbf{r}_{s_i}}$ and $\sigma_{\mathbf{r}_{s_i}}$ are the data-dependent mean and standard deviation, respectively [16], [32]. Both $\mu_{\mathbf{r}_{s_i}}$ and $\sigma_{\mathbf{r}_{s_i}}$ are time-varying and voltage-dependent:

$$\mu_{\mathbf{r}_{s_i}} = \log(1+T)(V_i - V_0)[\beta_0(N_{\rm PE})^{\alpha_0} + \beta_1(N_{\rm PE})^{\alpha_1}], \quad (4a)$$

$$\sigma_{\mathbf{r}_{s_i}} = 0.4 \left| \mu_{\mathbf{r}_{s_i}} \right|,\tag{4b}$$

where T is the DRT,  $\alpha_0$ ,  $\alpha_1$ ,  $\beta_0$ , and  $\beta_1$  are constants.

Finally, the overall voltage distribution functions, calculated by the convolution integral of initial voltage distribution functions with various noise functions [27], are given by

$$p_{s_i}(v) = \frac{1}{\sigma_{s_i}\sqrt{2\pi}} e^{-\frac{(v-\mu_{s_i})^2}{2\sigma_{s_i}^2}}, \quad i = 0, 1, 2, 3, \tag{5}$$

where

$$\mu_{s_0} = V_{s_0} - \mu_{\mathbf{r}_{s_0}},\tag{6a}$$

$$\sigma_{s_0} = \sqrt{\sigma_{\rm e}^2 + \sigma_{\rm rtn}^2 + \sigma_{\rm r_{s_0}}^2},\tag{6b}$$

$$\mu_{s_{\hat{i}}} = V_{s_{\hat{i}}} - V_{\rm p}/2 - \mu_{\mathbf{r}_{s_{\hat{i}}}}, \tag{6c}$$

$$\sigma_{s_{\hat{i}}} = \sqrt{\sigma_{\rm pn}^2 + \sigma_{\rm rtn}^2 + \sigma_{\mathbf{r}_{s_{\hat{i}}}}^2}, \quad \hat{i} = 1, 2, 3. \tag{6d}$$

Fig. 2. Coding for MLC flash memory.

According to [27], the parameters of MLC flash memory are set as $V_{s_0} = 1.4$, $V_{s_1} = 2.6$, $V_{s_2} = 3.2$, $V_{s_3} = 3.93$, $V_p = 0.2$, $\sigma_e = 0.34$, $\sigma_{pn} = 0.05$, $\beta_0 = 0.00001$, $\beta_1 = 0.00008$, $\alpha_0 = 0.68$, and $\alpha_1 = 0.52$, respectively. From (5), an increase of either $N_{PE}$ or DRT changes the voltage distribution, which causes read errors and degrades the endurance of flash memory.
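As a rough illustration of how (4)-(6) turn the wear state into Gaussian parameters, the following Python sketch evaluates the mean and standard deviation of each storage state. It is only a sketch under stated assumptions: the function name `state_params` is hypothetical, $\log$ is taken to be the natural logarithm, and $V_i - V_0$ in (4a) is read as $V_{s_i} - V_{s_0}$. The CCI-dependent means in (3a) and (3b) are not included.

```python
import math

# Parameter values quoted in the text (attributed there to [27]).
V_S = [1.4, 2.6, 3.2, 3.93]        # target voltages V_{s_0}, ..., V_{s_3}
V_P = 0.2                          # programming step voltage
SIGMA_E, SIGMA_PN = 0.34, 0.05
BETA0, BETA1 = 0.00001, 0.00008
ALPHA0, ALPHA1 = 0.68, 0.52


def state_params(n_pe, t):
    """Return [(mu_{s_i}, sigma_{s_i})] for i = 0..3, following (4)-(6).

    Assumptions (not stated in the text): natural logarithm in (4a), and
    V_i - V_0 interpreted as V_{s_i} - V_{s_0}.
    """
    sigma_rtn = 0.00027 * n_pe ** 0.64
    wear = BETA0 * n_pe ** ALPHA0 + BETA1 * n_pe ** ALPHA1
    params = []
    for i, v in enumerate(V_S):
        mu_r = math.log(1 + t) * (v - V_S[0]) * wear      # (4a)
        sigma_r = 0.4 * abs(mu_r)                         # (4b)
        if i == 0:
            mu = v - mu_r                                 # (6a)
            sigma = math.sqrt(SIGMA_E ** 2 + sigma_rtn ** 2 + sigma_r ** 2)   # (6b)
        else:
            mu = v - V_P / 2 - mu_r                       # (6c)
            sigma = math.sqrt(SIGMA_PN ** 2 + sigma_rtn ** 2 + sigma_r ** 2)  # (6d)
        params.append((mu, sigma))
    return params
```

For example, comparing `state_params(5000, 0)` with `state_params(5000, 1e4)` shows the programmed-state means moving downward and the standard deviations growing as the retention time increases, which is the drift the read-voltage thresholds need to follow.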

#### B. Read-voltage Quantization

For the MLC flash memory, the relationship among the block, cell wordline/bitline, and page is summarized as follows [34]. Each memory block contains multiple rows of cells. Each cell stores K = 2 bits, i.e., the most significant bit (MSB) and least significant bit (LSB). To reduce the raw bit error rate, Gray coding is used to map the 2 bits in each cell to one of the storage states. As shown in Fig. 2, the storage states $s_0, s_1, s_2, s_3$ correspond to the information bits 11, 10, 00, 01, respectively. The MSBs of all cells on the same wordline are combined to form an MSB page, and the LSBs of all cells on the same wordline are combined to form an LSB page.

ECC is used to detect and correct the raw bit errors that occur within flash memory. In this paper, we use two independent length-*N* ECCs to encode the input sequences of the MSB and LSB pages as $X_{\rm M} = (x_{{\rm M},1}, x_{{\rm M},2}, \ldots, x_{{\rm M},N})$ and $X_{\rm L} = (x_{{\rm L},1}, x_{{\rm L},2}, \ldots, x_{{\rm L},N})$, respectively. During the write process in the *n*-th cell, every K = 2 bits, i.e., $(x_{{\rm M},n}, x_{{\rm L},n})$, are first mapped to a storage state. Then, according to the storage state of a memory cell, the programming operation shifts the voltage of this cell to a well-designed write-voltage threshold. During the read process, to transform the voltage value into soft information (i.e., LLRs) for ECC decoding, the readback voltages need to be quantized by comparing with precomputed read thresholds.

Consider a voltage quantization strategy with *J*-level reads. The read voltages of memory cells are quantized into $J+1$ regions. Let $\mathcal{D} = \{d_1, d_2, \ldots, d_J\}$ collect the *J* read-voltage thresholds, and $\mathcal{R} = \{r_0, r_1, \ldots, r_J\}$ collect the $J+1$ output regions, where $r_j = [d_j, d_{j+1})$ with $d_0 = 0$ and $d_{J+1} = +\infty$. In addition, the read-voltage thresholds of flash memory cells satisfy $0 < d_1 < d_2 < \cdots < d_J$. Fig. 1 illustrates this quantization with 6-level read. For $j = 1, 2, \cdots, J+1$ and $k = 1, 2, \cdots, K$, the initial LLR of the k-th bit in the j-th region $[d_{j-1}, d_j)$ is calculated by

$$L(j,k) = \log \frac{\int_{d_{j-1}}^{d_j} \sum_{i \in \mathcal{Q}_k} p_{s_i}(v)\, dv}{\int_{d_{j-1}}^{d_j} \sum_{i=0}^{3} p_{s_i}(v)\, dv - \int_{d_{j-1}}^{d_j} \sum_{i \in \mathcal{Q}_k} p_{s_i}(v)\, dv}, \tag{7}$$

where $\mathcal{Q}_k$ is the set of states whose k-th bit is 1. Based on (7), we can obtain the LLR of each region.
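To make (7) concrete, the sketch below computes the LLR of one bit for every quantization region from Gaussian state parameters. The function names are hypothetical. It assumes equiprobable states, uses $-\infty$ rather than $d_0 = 0$ as the lowest region edge, and clamps vanishing probabilities to a tiny constant so that the logarithm stays finite; none of these choices are prescribed by the text.

```python
import math


def gauss_cdf(x, mu, sigma):
    """CDF of N(mu, sigma^2) at x, written with erfc for accurate lower tails."""
    return 0.5 * math.erfc(-(x - mu) / (sigma * math.sqrt(2.0)))


def region_llrs(thresholds, params, bit_one_states, tiny=1e-300):
    """LLR of one bit in each of the J + 1 regions, in the spirit of (7).

    thresholds: increasing read-voltage thresholds d_1 < ... < d_J.
    params: [(mu, sigma)] of the Gaussian p_{s_i}(v), i = 0..3.
    bit_one_states: indices of the states whose bit equals 1 (the set Q_k).
    tiny: numerical guard against log(0); an assumption of this sketch.
    """
    edges = [-math.inf] + list(thresholds) + [math.inf]
    llrs = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Probability mass of every state inside the region [lo, hi).
        mass = [gauss_cdf(hi, m, s) - gauss_cdf(lo, m, s) for m, s in params]
        p_one = sum(p for i, p in enumerate(mass) if i in bit_one_states)
        p_zero = sum(p for i, p in enumerate(mass) if i not in bit_one_states)
        llrs.append(math.log(max(p_one, tiny) / max(p_zero, tiny)))
    return llrs
```

With the Gray mapping 11, 10, 00, 01 for $s_0, \ldots, s_3$, the MSB uses `bit_one_states={0, 1}` and the LSB uses `bit_one_states={0, 3}`.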

The choice of read-voltage thresholds determines the LLRs, and thus has a great impact on the ECC decoding performance. Therefore, the goal of this paper is to maximize the read reliability of MLC flash memory by optimizing the read-voltage thresholds.

# C. CCR for Flash Memory Channel under Finite Block Length

A DMC comprises an input set, an output set, and a probability transition matrix, where the probability distribution of the output depends only on the input at that time and is conditionally independent of previous channel inputs or outputs. Since the read process transforms storage states into discrete region values, the flash memory channel can be treated as a DMC.

Let  $W : S \to \mathcal{R}$  denote the DMC with transition probabilities  $W(r_j|s_i)$ ,  $s_i \in S$ ,  $r_j \in \mathcal{R}$ , where input  $s_i$  and output  $r_j$  correspond to the storage state and quantization region, respectively. The transition probability function of the voltage region  $r_j$  given input  $s_i$  is

$$\begin{aligned} W(r_j|s_i) = w_{r_j,s_i} &= \int_{d_j}^{d_{j+1}} p_{s_i}(v)\, \mathrm{d}v \\ &= Q\left(\frac{d_j - \mu_{s_i}}{\sigma_{s_i}}\right) - Q\left(\frac{d_{j+1} - \mu_{s_i}}{\sigma_{s_i}}\right), \end{aligned} \tag{8}$$

where  $p_{s_i}(v)$  is given in (5) and  $Q(\epsilon) = \int_{\epsilon}^{\infty} \frac{1}{\sqrt{2\pi}} e^{\frac{-t^2}{2}} dt$ . Moreover, the probability of output  $r_j$  is given by

$$\begin{aligned} P(r_j) = p_{r_j} &= \sum_{s_i \in \mathcal{S}} p_{s_i} w_{r_j, s_i} \\ &= \sum_{s_i \in \mathcal{S}} p_{s_i} \left[ Q\left(\frac{d_j - \mu_{s_i}}{\sigma_{s_i}}\right) - Q\left(\frac{d_{j+1} - \mu_{s_i}}{\sigma_{s_i}}\right) \right]. \end{aligned} \tag{9}$$

From (8) and (9), the mutual information between input  $s_i$  and output  $r_j$  is

$$\begin{aligned} I(P,W) &= \sum_{s_i \in \mathcal{S}} \sum_{r_j \in \mathcal{R}} P(s_i) W(r_j | s_i) \log \frac{W(r_j | s_i)}{P(r_j)} \\ &= \sum_{s_i \in \mathcal{S}} \sum_{r_j \in \mathcal{R}} p_{s_i} w_{r_j, s_i} \log \frac{w_{r_j, s_i}}{p_{r_j}}, \end{aligned} \tag{10}$$

and the unconditional information variance is

$$\begin{aligned} U(P,W) &= \sum_{s_i \in \mathcal{S}} \sum_{r_j \in \mathcal{R}} P(s_i) W(r_j | s_i) \left( \log \frac{W(r_j | s_i)}{P(r_j)} \right)^2 - [I(P,W)]^2 \\ &= \sum_{s_i \in \mathcal{S}} \sum_{r_j \in \mathcal{R}} p_{s_i} w_{r_j, s_i} \left( \log \frac{w_{r_j, s_i}}{p_{r_j}} \right)^2 - [I(P,W)]^2. \end{aligned} \tag{11}$$

As [21] unveiled, for a finite block length code and DMC, the achievable CCR with a given error probability $\epsilon$ and a code block length N satisfies

$$R(N,\epsilon,\gamma) \ge I(P,W) - \sqrt{\frac{U(P,W)}{N}}Q^{-1}(\epsilon) + \frac{\log N}{2N}, \quad (12)$$

where  $Q^{-1}$  is the inverse function of  $Q(\epsilon)$ .
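The quantities in (8)-(12) can be evaluated numerically once the state parameters and thresholds are fixed. The following sketch is one possible way to do so; the function names are hypothetical, logarithms are taken to base 2 (so rates are in bits), and the outermost region edges are set to $\pm\infty$. These are assumptions of the sketch rather than choices stated in the text.

```python
import math
from statistics import NormalDist


def q_func(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def transition_matrix(thresholds, params):
    """w[i][j] = W(r_j | s_i) as in (8), with outer edges at -inf and +inf."""
    edges = [-math.inf] + list(thresholds) + [math.inf]
    return [[q_func((lo - m) / s) - q_func((hi - m) / s)
             for lo, hi in zip(edges[:-1], edges[1:])]
            for m, s in params]


def info_and_variance(prior, w):
    """Mutual information (10) and information variance (11), in bits."""
    n_out = len(w[0])
    # Output distribution (9): sum over the input states.
    p_out = [sum(prior[i] * w[i][j] for i in range(len(w))) for j in range(n_out)]
    first = second = 0.0
    for i, p_in in enumerate(prior):
        for j in range(n_out):
            if w[i][j] > 0.0:                 # zero-probability terms contribute 0
                dens = math.log2(w[i][j] / p_out[j])
                first += p_in * w[i][j] * dens
                second += p_in * w[i][j] * dens ** 2
    return first, second - first ** 2


def normal_approx_rate(i_pw, u_pw, n, eps):
    """Right-hand side of (12) for block length n and error probability eps."""
    q_inv = -NormalDist().inv_cdf(eps)        # Q^{-1}(eps) by symmetry
    return i_pw - math.sqrt(u_pw / n) * q_inv + math.log2(n) / (2 * n)
```

Evaluating `normal_approx_rate` for increasing `n` shows the bound approaching the mutual information, which is the finite-block-length gap that motivates the optimization in the next section.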

## III. READ-VOLTAGE THRESHOLD OPTIMIZATION FOR MLC FLASH MEMORY

In this section, we give the upper bound of decoding error probability for MLC flash memory channel and formulate the read-voltage threshold optimization. Unlike conventional methods such as MMI and entropy-based quantization, our optimization problem focuses on finite block length.

#### A. Error Performance under Finite Block Length

First, we rewrite (12) as

$$Q^{-1}(\epsilon) \ge \mathcal{T}(N, \epsilon, \gamma, P, W), \tag{13}$$

where

$$\mathcal{T}(N,\epsilon,\gamma,P,W) = \left[I(P,W) - R(N,\epsilon,\gamma) + \frac{\log N}{2N}\right] \sqrt{\frac{N}{U(P,W)}}. \tag{14}$$
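As a short check on the step from (12) to (13), the rearrangement can be written out explicitly. This is only a worked restatement of the inequality above and assumes $U(P,W) > 0$, so that multiplying by $\sqrt{N/U(P,W)}$ preserves the direction of the inequality:

$$\begin{aligned} R(N,\epsilon,\gamma) &\ge I(P,W) - \sqrt{\frac{U(P,W)}{N}}\, Q^{-1}(\epsilon) + \frac{\log N}{2N} \\ \Longrightarrow \quad \sqrt{\frac{U(P,W)}{N}}\, Q^{-1}(\epsilon) &\ge I(P,W) - R(N,\epsilon,\gamma) + \frac{\log N}{2N} \\ \Longrightarrow \quad Q^{-1}(\epsilon) &\ge \left[ I(P,W) - R(N,\epsilon,\gamma) + \frac{\log N}{2N} \right] \sqrt{\frac{N}{U(P,W)}}. \end{aligned}$$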

For the flash memory, both I and U vary over different P and W, since P and W depend on the parameters of flash memory, such as number of PE cycles, DRT and read-voltage thresholds according to (5) and (8). Thus, the function  $\mathcal{T}$  in (14) can also be interpreted as a function with respect to these parameters:

$$\mathcal{T}(N, \bar{R}, \mathcal{D}, E, T) = \left[ I(\mathcal{D}, E, T) - \bar{R} + \frac{\log N}{2N} \right] \sqrt{\frac{N}{U(\mathcal{D}, E, T)}}, \tag{15}$$

where $\bar{R}$ is the code rate of ECCs used in flash memory, and $E$ and $T$ denote the number of PE cycles and the DRT, respectively.

As the Q function is monotonically decreasing, the decoding error probability is upper bounded by $\epsilon \leq Q\left(\mathcal{T}(N, \bar{R}, \mathcal{D}, E, T)\right)$. Thus the maximum error probability is $\epsilon_{\max} = Q\left(\mathcal{T}(N, \bar{R}, \mathcal{D}, E, T)\right)$. In this context, our goal is to optimize the read-voltage thresholds by minimizing the maximum decoding error probability:

$$\mathcal{D}^* = \arg\min_{\mathcal{D}} \epsilon_{\max},\tag{16}$$

where  $\mathcal{D}^*$  is the set of optimal read-voltage thresholds.

Due to the write process of MLC flash memory, the MSB and LSB have different channel conditions [9], [35]. Consequently, the error probabilities of MSB and LSB vary over different quantization regions. Taking the 6-level read in Fig. 1 as an example, the MSB errors often occur in region $r_4$, and the LSB errors often occur in regions $r_2$ and $r_6$ [35]. In addition, according to (8) and (9), the transition probabilities W of MSB and LSB, denoted by $W_M$ and $W_L$, are different. Furthermore, the decoding error probabilities of MSB and LSB are independent, since independent encoding processes are used for these two pages. In view of this independence, the average maximum error probability for MLC flash memory over the two pages is given by

$$\epsilon_{\max} = \frac{Q\left(\mathcal{T}_{\mathrm{M}}\right) + Q\left(\mathcal{T}_{\mathrm{L}}\right)}{2},\tag{17}$$

where the T functions of MSB and LSB are denoted by

$$\mathcal{T}_{\mathrm{M}} = \left[ I(P, W_{\mathrm{M}}) - \bar{R} + \frac{\log N}{2N} \right] \sqrt{\frac{N}{U(P, W_{\mathrm{M}})}}, \quad (18a)$$

$$\mathcal{T}_{\rm L} = \left[ I(P, W_{\rm L}) - \bar{R} + \frac{\log N}{2N} \right] \sqrt{\frac{N}{U(P, W_{\rm L})}}. \tag{18b}$$

Overall, we can formulate the optimization problem as

$$\mathcal{P}: \quad \min_{\mathcal{D}} \ \epsilon_{\max} \tag{19a}$$

$$\text{s.t.} \quad 0 < d_1 < d_2 < \dots < d_J. \tag{19b}$$

Due to the dimension of $\mathcal{D}$, an analytical solution of $\mathcal{P}$ is computationally intractable. In the following, we develop an efficient method to solve this problem.
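The objective in (17)-(19) is cheap to evaluate once the mutual information and information variance of the two page channels are known. The sketch below shows that evaluation only; the function names are hypothetical, the page statistics are passed in as plain numbers, and base-2 logarithms are assumed so that $I$ and $\bar{R}$ are both in bits per channel use.

```python
import math


def q_func(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def t_value(i_pw, u_pw, n, rate):
    """T for one page as in (18): [I - R + log N / (2N)] * sqrt(N / U)."""
    return (i_pw - rate + math.log2(n) / (2 * n)) * math.sqrt(n / u_pw)


def eps_max(page_stats, n, rate):
    """Average of Q(T) over the pages, as in (17).

    page_stats: [(I, U)] for the MSB and LSB page channels (illustrative inputs).
    """
    total = sum(q_func(t_value(i, u, n, rate)) for i, u in page_stats)
    return total / len(page_stats)
```

Because $Q(\cdot)$ is decreasing, any threshold choice that raises $I$ or lowers $U$ for a page lowers its contribution to `eps_max`, and a page whose mutual information is close to the code rate dominates the average.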

#### B. Cross Iterative Searching Algorithm

In this part, we utilize the genetic algorithm and the CIS algorithm to optimize the read-voltage thresholds for various read levels. In the genetic algorithm, the evolution is implemented by using a set of stochastic genetic operators to mimic the natural process of reproduction and mutation. Although the genetic algorithm can solve complex problems, high-quality solutions require massive computations to explore the entire search space for global optimization [36]. For our problem, the computation of the genetic algorithm increases dramatically as the dimension of $\mathcal{D}$ grows. To reduce the complexity, the cross iterative searching algorithm finds a locally optimal solution within a certain region, which saves considerable time. Combining the genetic algorithm and the cross iterative searching algorithm, we can escape from local optima and obtain near-optimal results.

As shown in **Algorithm** 1, we develop a CIS algorithm to solve the optimization problem given in (19a). In the read-voltage threshold optimization, all the read-voltage thresholds are constrained by (19b). Before the iterative searching process, the CIS algorithm needs to determine the initial value of the read-voltage thresholds (see line 1 of **Algorithm** 1). A well-designed initial value accelerates the convergence and helps avoid being trapped in a local optimum. Let $\mathcal{H} = \{h_1, h_2, h_3\}$ denote a set that collects the read-voltage thresholds under hard decision. We can identify the hard-decision thresholds by letting

$$\begin{aligned} p_{s_0}(v = h_1) &= p_{s_1}(v = h_1), \\ p_{s_1}(v = h_2) &= p_{s_2}(v = h_2), \\ p_{s_2}(v = h_3) &= p_{s_3}(v = h_3). \end{aligned} \tag{20}$$

Then, we initialize the *J*-level read-voltage thresholds as  $\mathcal{D}^0 = \{d_1^0, d_2^0, \ldots, d_J^0\}$ , where  $d_1^0 = h_1 - \delta$ ,  $d_j^0 = h_1 + (j-1)\delta$ , for  $j = 2, \cdots, J - 1$ ,  $d_J^0 = h_3 + \delta$ , and  $\delta = \frac{h_3 - h_1}{J - 1}$ .
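The initialization described above can be sketched in a few lines. The function names are hypothetical, and the pdf crossing in (20) is located by bisection between the two state means, which assumes the adjacent states are well enough separated that each pdf dominates at its own mean.

```python
import math


def log_pdf(v, mu, sigma):
    """Logarithm of the Gaussian pdf N(mu, sigma^2) at v."""
    return -math.log(sigma * math.sqrt(2 * math.pi)) - (v - mu) ** 2 / (2 * sigma ** 2)


def crossing(left, right, iters=100):
    """Voltage between the two means where adjacent pdfs are equal, cf. (20)."""
    (mu_a, sg_a), (mu_b, sg_b) = left, right
    lo, hi = mu_a, mu_b
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        # While the lower state's pdf is larger, the crossing lies to the right.
        if log_pdf(mid, mu_a, sg_a) > log_pdf(mid, mu_b, sg_b):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def initial_thresholds(params, j_levels):
    """D^0 built from h_1 and h_3 exactly as written in the text."""
    h1 = crossing(params[0], params[1])
    h3 = crossing(params[2], params[3])
    delta = (h3 - h1) / (j_levels - 1)
    d0 = [h1 - delta]                                          # d_1^0
    d0 += [h1 + (j - 1) * delta for j in range(2, j_levels)]   # d_2^0 .. d_{J-1}^0
    d0.append(h3 + delta)                                      # d_J^0
    return d0
```

Calling `initial_thresholds(params, 6)` with four `(mean, std)` pairs returns six strictly increasing starting thresholds for the 6-level read.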

**Algorithm 1**

**Input:** $\epsilon_{\max}$, maximum iterations $I_{\max}$, stopping criterion $\rho$, block length N, code rate $\bar{R}$.

**Output:** the read-voltage thresholds $\mathcal{D}$.

- 1: Initialization: $i \leftarrow 0$, $\mathcal{D}^0$;
- 2: **while** $|\epsilon_{\max}^{(i)} - \epsilon_{\max}^{(i-1)}| > \rho$ and $i < I_{\max}$ **do**
  - 3: $i = i + 1$, $j = 1$;
  - 4: **while** $j \leq J$ **do**
    - 5: Determine the range of $d_j^{(i)}$;
    - 6: Search for the local optimal $d_j^{(i)}$ using $\arg\min Q(\mathcal{T}(d_j^{(i)}, N, \bar{R}, E, T))$;
    - 7: $j = j + 1$;
  - 8: Calculate $\epsilon_{\max}^{(i)}$;
- 9: Output $\mathcal{D}^{(i)}$.

Lines 2-8 show the iterative searching process. First, the ranges of read-voltage thresholds are determined in order to reduce the searching space (see line 5). During the (i + 1)-th iteration, we search  $d_j^{i+1}$  over  $[d_j^i - \lambda, d_j^i + \lambda]$ , where  $\lambda$  is a well-designed constant (e.g.,  $\lambda = 0.2$  in the simulations). Second, the thresholds are updated successively, where each read-voltage threshold is optimized while keeping remaining read-voltage thresholds fixed (see line 6). Finally, the searching algorithm ends and outputs the optimized read-voltage thresholds if  $|\epsilon_{\max}^{(i)} - \epsilon_{\max}^{(i-1)}| < \rho$  or the maximum number of iterations is reached (see lines 2 and 9).
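The iterative search described above can be sketched as a coordinate-wise search. This is a minimal illustration rather than the authors' implementation: the function name `cis` is hypothetical, the one-dimensional search in each window is done by a uniform grid (the text does not specify the search method), and the objective is passed in as a callable so that any cost, such as $\epsilon_{\max}$, can be plugged in.

```python
def cis(objective, d0, lam=0.2, rho=1e-9, max_iter=50, grid=41, gap=1e-6):
    """Coordinate-wise threshold search in the spirit of Algorithm 1.

    objective: callable mapping a list of thresholds to a scalar cost.
    lam: half-width of the search window around the current threshold.
    grid, gap: grid resolution and minimum spacing (assumptions of this sketch).
    """
    d = list(d0)
    prev = objective(d)
    for _ in range(max_iter):
        for j in range(len(d)):
            # Window [d_j - lam, d_j + lam], clipped so that (19b) still holds.
            lo = max(d[j] - lam, gap if j == 0 else d[j - 1] + gap)
            hi = d[j] + lam if j + 1 == len(d) else min(d[j] + lam, d[j + 1] - gap)
            if lo > hi:
                continue
            best_v, best_c = d[j], objective(d)
            for k in range(grid):
                cand = lo + (hi - lo) * k / (grid - 1)
                cost = objective(d[:j] + [cand] + d[j + 1:])
                if cost < best_c:
                    best_v, best_c = cand, cost
            d[j] = best_v                 # update d_j, others stay fixed
        cur = objective(d)
        if abs(cur - prev) <= rho:        # stopping rule on the cost change
            break
        prev = cur
    return d
```

As a usage example, `cis(lambda d: sum((x - t) ** 2 for x, t in zip(d, [1.0, 2.0, 3.0])), [0.5, 1.5, 3.6])` walks each threshold toward its target in steps of at most `lam` per outer iteration, which mirrors how the window size $\lambda$ limits the movement of each threshold.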

## IV. DNN-AIDED READ-VOLTAGE THRESHOLD OPTIMIZATION

## A. Motivation

Fig. 1 (b) illustrates that the original voltage thresholds become outdated, since the voltage distribution is changed under the effect of DRN in MLC flash memory. Without the precise read-voltage thresholds, we cannot obtain the correct LLRs in (7) that depend on these thresholds. Finally, due to the mismatch between the new voltage distribution and the outdated read-voltage thresholds, the decoder is unable to decode correctly based on the incorrect LLRs.

From (5), the voltage distribution mainly depends on the number of PE cycles and DRT. The number of PE cycles for the memory block can be recorded in flash memory [27]. Nevertheless, we cannot analytically characterize the voltage distribution under the effect of DRN, since the DRT is hard to record. Hence, it is very challenging for existing technologies to track the voltage distribution under the effect of DRN. To address this issue, we design a DNN-aided optimization strategy to optimize the read-voltage thresholds.

# B. Data Process

The DNN is a powerful tool to extract deep information from raw data, which can build the non-linear mapping between inputs and outputs [28], [29]. However, its learning ability is limited when the input data lacks valuable information.



Fig. 3. 6-level read-voltage quantization for MLC flash memory.

For the flash memory, the input data comes from the read process. Due to the read errors and limited memory sensing precision, it is hard to obtain the accurate voltage of each cell. In the read process, the read-voltage thresholds can be used to determine the voltage locations over the quantization regions (i.e., the region where each voltage value falls into) and transform each location into a specific LLR of (7).

In this paper, we adopt the nonuniform quantization to obtain the voltage location information, since the nonuniform read-voltage quantization shows better error-correction performance than uniform quantization under the same number of quantization levels [8], [9], [16]. As an illustration, Fig. 3 shows that, under the 6-level quantization, the nonuniform quantization can better capture the characteristics of the voltage distribution, where the histogram is used to count the number of voltage values that fall into each region. This observation illustrates that the nonuniform quantization can track the variation of voltage distribution under the effect of DRN with a limited number of quantization levels. Therefore, by the nonuniform quantization, the DNN can efficiently learn the relationship between the location information and voltage distribution.
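The histogram of voltage locations described above can be computed with a binary search per cell. In the sketch below the function name is hypothetical, and the counts are normalized to fractions, which is an assumption; the text only states that the histogram counts the voltage values falling into each region.

```python
import bisect


def region_histogram(voltages, thresholds):
    """Fraction of cells whose voltage falls into each of the J + 1 regions.

    thresholds must be sorted in increasing order (d_1 < ... < d_J).
    """
    counts = [0] * (len(thresholds) + 1)
    for v in voltages:
        # bisect_right counts the thresholds <= v, which is the region index j
        # of r_j = [d_j, d_{j+1}).
        counts[bisect.bisect_right(thresholds, v)] += 1
    total = len(voltages)
    return [c / total for c in counts]
```

For a 6-level read this yields a vector of 7 values, matching the $J+1$ quantization regions.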

## C. Multi-layer Perceptron Network

To address the mismatch between a new voltage distribution and outdated read-voltage thresholds, we propose a DNN-aided decoding strategy to optimize the read-voltage thresholds over different DRTs. Before delving into the proposed scheme, we briefly introduce the DNN. The multi-layer perceptron (MLP) is a feedforward DNN that can extract valuable information from extremely complex problems. In particular, it utilizes a supervised learning technique called backpropagation for training. A typical MLP network consists of at least three layers, and each layer consists of a number of nodes. Adjacent layers are fully interconnected by weights that are chosen randomly at the beginning.



Fig. 4. The diagram of an MLP network

As shown in Fig. 4, the MLP is composed of an input layer, hidden layers, and an output layer. The input layer, which has $J+1$ nodes, receives the input data and forwards it to the hidden layer. The output layer outputs $D = f(WY + b)$, where $W$ and $b$ are the weights and biases of the hidden layer neurons, respectively, and $f(\cdot)$ is a non-linear activation function.

For each learning iteration, the MLP receives the input data (i.e., training set, validation set, or test set) and outputs some values. Based on the error between the MLP output and the expected output (i.e., label), the MLP performs backpropagation to update the weights of the hidden layers. With the gradient descent algorithm, the weights are updated by $W(i+1) = W(i) - \eta \frac{\partial E(i)}{\partial W(i)}$, where $\eta$ is the learning rate and $E(i)$ is the error at the $i$-th iteration. With backpropagation, the DNN can minimize the error between the MLP output and the expected output.
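To make the update rule concrete, the following sketch applies $W(i+1) = W(i) - \eta\,\partial E/\partial W$ to a single sigmoid layer $D = f(WY + b)$. It is only an illustration under stated assumptions: $Y$ is treated as the vector fed into the layer, the error $E$ is taken to be the mean squared error, and the sizes, targets, and learning rate are arbitrary toy values rather than settings from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def layer_forward(W, b, y):
    """D = f(W y + b) with a sigmoid activation f."""
    return sigmoid(W @ y + b)

def gradient_step(W, b, y, target, eta):
    """One update W <- W - eta * dE/dW, with E the mean squared error."""
    d = layer_forward(W, b, y)
    err = d - target
    # Chain rule: dE/dz = (2/J) * err * f'(z), and f'(z) = d * (1 - d) for a sigmoid.
    delta = (2.0 / err.size) * err * d * (1.0 - d)
    grad_W = np.outer(delta, y)   # dE/dW
    grad_b = delta                # dE/db
    return W - eta * grad_W, b - eta * grad_b

def mse(W, b, y, target):
    return float(np.mean((layer_forward(W, b, y) - target) ** 2))

# Toy problem (all values hypothetical).
rng = np.random.default_rng(1)
y = rng.random(4)
target = np.array([0.2, 0.8])
W = rng.normal(scale=0.5, size=(2, 4))
b = np.zeros(2)

loss_before = mse(W, b, y, target)
for _ in range(200):
    W, b = gradient_step(W, b, y, target, eta=0.5)
loss_after = mse(W, b, y, target)
print(loss_before, loss_after)
```

In a multi-layer network the same rule is applied to every layer, with `delta` propagated backwards through the weights; that propagation is what backpropagation refers to.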

## D. Training

1) Training Data Generation: The training data of the DNN includes the input data (i.e., histogram results of voltage values) and the expected output data (i.e., read-voltage thresholds optimized by Algorithm 1). As shown in (5), the voltage distribution of the flash memory channel depends on the number of PE cycles and the DRT. To make the DNN learn the relationship between the input data and the expected output data, the training set must include voltage values with different numbers of PE cycles and different DRTs. Specifically, the training data is generated with the set of PE cycles {4000, 5000, 6000} and DRTs in the range $[0, 10^6]$.

TABLE I DNN Hyper-Parameters

| Hyper-parameter | Value     |
|-----------------|-----------|
| Learning rate   | $10^{-5}$ |
| Epochs          | 100000    |
| Mini-batch size | 500       |
| Initializer     | Xavier    |
| Optimizer       | Adam      |
| Loss function   | MSE       |

Fig. 5. The architecture of DNN-aided MLC flash memory.

2) Loss Function: The loss function measures the error between the MLP output and the expected output. In our simulations, we employ the mean squared error (MSE) as the loss function, which is defined as

$$L_{\rm MSE} = \frac{1}{J} \sum_{j=1}^{J} \left( d_j - \hat{d}_j \right)^2, \tag{21}$$

where  $d_j$  and  $\hat{d}_j$  are the expected output and MLP output, respectively.

3) DNN Parameters: The sizes of the input layer and output layer depend on the number of read-voltage quantization levels. In the MLP network, we employ three hidden layers with 512, 256, and 128 neurons, respectively. For each hidden layer and the output layer, the activation function is the sigmoid function, i.e., $S(x) = \frac{1}{1+e^{-x}}$. The hyper-parameters are listed in Table I.
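The architecture described above can be sketched in plain NumPy as follows. This is a hedged illustration rather than the authors' code: `init_mlp`, `forward`, and `mse_loss` are hypothetical helper names, the input and output sizes (7 and 6) are placeholders, and Xavier initialization is written in its common uniform form. Because a sigmoid output lies in (0, 1), the sketch assumes the threshold labels are scaled into that range, which the text does not state.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def xavier_uniform(fan_in, fan_out, rng):
    """Xavier (Glorot) uniform initialization for a fan_out x fan_in weight matrix."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_out, fan_in))

def init_mlp(n_in, n_out, hidden=(512, 256, 128), seed=0):
    """Return a list of (W, b) pairs: three hidden layers plus the output layer."""
    rng = np.random.default_rng(seed)
    sizes = (n_in, *hidden, n_out)
    return [(xavier_uniform(a, b, rng), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    """Sigmoid activation on every hidden layer and on the output layer."""
    for W, b in params:
        x = sigmoid(W @ x + b)
    return x

def mse_loss(d, d_hat):
    """Mean squared error between expected output d and MLP output d_hat, as in (21)."""
    d, d_hat = np.asarray(d, dtype=float), np.asarray(d_hat, dtype=float)
    return float(np.mean((d - d_hat) ** 2))

# Placeholder sizes: a 7-bin histogram in, 6 normalized thresholds out.
params = init_mlp(n_in=7, n_out=6)
histogram = np.full(7, 1.0 / 7.0)
prediction = forward(params, histogram)
label = np.linspace(0.2, 0.8, 6)   # hypothetical normalized thresholds
print(mse_loss(label, prediction))
```

Training with the Adam optimizer and mini-batches from Table I would normally be delegated to a deep-learning framework; the sketch only fixes the layer shapes, activations, initializer, and loss.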

## E. DNN-aided Flash Memory

In this subsection, we develop a DNN-aided MLC flash memory structure, as shown in Fig. 5. The DNN is trained with the histogram results and the read-voltage thresholds optimized by the proposed CIS algorithm. First, the controller reads the voltage values from the flash memory chip. Second, the controller converts these voltage values into LLRs. Then, the decoder uses these LLRs to perform decoding. If the decoding fails, the DNN is activated and uses the histogram results to update the read-voltage thresholds. After that, the decoder receives the updated LLRs and performs decoding again. If the second decoding fails, the controller records this block as a bad block.
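The read flow described above can be summarized as a short control routine. The sketch below only mirrors the order of the steps; `read_block` and every callable passed to it are hypothetical placeholders, and the toy stand-ins at the bottom exist solely to make the example runnable.

```python
def read_block(voltages, thresholds, to_llrs, decode, histogram, dnn_predict):
    """Return (decoded_bits or None, thresholds_used, is_bad_block)."""
    # First attempt with the current read-voltage thresholds.
    ok, bits = decode(to_llrs(voltages, thresholds))
    if ok:
        return bits, thresholds, False
    # Decoding failed: let the DNN propose new thresholds from the histogram.
    new_thresholds = dnn_predict(histogram(voltages, thresholds))
    ok, bits = decode(to_llrs(voltages, new_thresholds))
    if ok:
        return bits, new_thresholds, False
    # Second failure: the controller records the block as bad.
    return None, new_thresholds, True

# Toy stand-ins (hypothetical): one threshold, hard "LLRs", fixed expected word.
def toy_to_llrs(voltages, thresholds):
    return [1.0 if v > thresholds[0] else -1.0 for v in voltages]

def toy_decode(llrs):
    bits = [int(l > 0) for l in llrs]
    return bits == [0, 1, 1], bits

def toy_histogram(voltages, thresholds):
    above = sum(v > thresholds[0] for v in voltages)
    return [len(voltages) - above, above]

def toy_dnn(hist):
    return (2.0,)

voltages = [1.5, 2.4, 2.8]
bits, used, bad = read_block(voltages, (2.5,), toy_to_llrs, toy_decode,
                             toy_histogram, toy_dnn)
print(bits, used, bad)
```

With the outdated threshold 2.5 the first decoding attempt fails, the stand-in DNN returns 2.0, and the second attempt succeeds, so the block is not marked as bad.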

#### V. SIMULATION RESULTS

In the simulations, we use the sum-product algorithm as the decoding algorithm, where the maximum number of decoding iterations is $I_{\text{max}}$. The simulations use three binary LDPC codes, i.e., 2K-QC-code, 4K-QC-code, and 2K-random-code. In the 4K-QC-code, each entry of a small $7 \times 71$ base matrix $\mathbf{H}_B$ is replaced by either a circulant shift of a $64 \times 64$ identity matrix or a $64 \times 64$ zero matrix. The block length of this code is 4544 bits and the code rate is set as 0.9. This irregular code has a column weight of 5 and a row weight of either 50 or 51. The 2K-QC-code is a QC-LDPC code with a uniform column weight of 4 and a row weight of either 40 or 41. The code rate of the 2K-QC-code is the same as that of the 4K-QC-code. The 2K-random-code is an irregular LDPC code with input and output block length (frame size) of 1998 and 1776 bits, respectively. Its code rate is 0.89 and its column weight is 4.

Fig. 6. CCR versus PE cycles under different read-level quantization.
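As a quick consistency check on the code parameters quoted above, the block length, design rate, and average row weight of the 4K-QC-code follow directly from the $7 \times 71$ base matrix and the $64 \times 64$ circulants. The rate bound holds with equality when the parity-check matrix has full row rank. Reading 1998 and 1776 as the codeword and information lengths of the 2K-random-code is an interpretation, chosen because it reproduces the stated rate.

$$\begin{aligned}
n &= 71 \times 64 = 4544, \qquad m = 7 \times 64 = 448, \\
R &\ge 1 - \frac{m}{n} = 1 - \frac{7}{71} \approx 0.901, \\
\bar{w}_{\text{row}} &= \frac{5 \times 71}{7} \approx 50.7 \quad (\text{consistent with row weights of 50 or 51}), \\
R_{\text{2K-random}} &= \frac{1776}{1998} = \frac{8}{9} \approx 0.89.
\end{aligned}$$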

Fig. 6 plots the CCR versus PE cycles under different optimization strategies, numbers of read levels, and block lengths. The CCR of the mutual information strategy [9] follows (10), and the CCR of the finite block length strategy follows (12). First, it is observed that the loss of CCR grows as the block length decreases. Second, quantization with more read levels contributes to a higher CCR. This is because quantization with more read levels provides more precise voltage information, especially at high PE cycles.
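The first observation matches the general behavior of finite-blocklength bounds. As a hedged illustration only, the sketch below evaluates the normal approximation of [21] for a binary symmetric channel; the flash memory channel in this paper is not a BSC and (12) is not reproduced here, so the crossover probability `p`, the target error probability `eps`, and the helper name `bsc_normal_approx_rate` are all assumptions made for the example.

```python
import math
from statistics import NormalDist

def bsc_normal_approx_rate(n, eps, p):
    """Normal approximation R ~ C - sqrt(V/n) * Qinv(eps) + log2(n) / (2n) for a BSC(p)."""
    h = -p * math.log2(p) - (1 - p) * math.log2(1 - p)      # binary entropy
    capacity = 1.0 - h
    dispersion = p * (1 - p) * math.log2((1 - p) / p) ** 2  # channel dispersion V
    q_inv = NormalDist().inv_cdf(1.0 - eps)                 # inverse Gaussian Q-function
    return capacity - math.sqrt(dispersion / n) * q_inv + math.log2(n) / (2 * n)

p, eps = 0.01, 1e-4                      # hypothetical channel and target error probability
for n in (1998, 4544):                   # block lengths that appear in the simulations
    print(n, round(bsc_normal_approx_rate(n, eps, p), 4))
```

The gap to capacity is dominated by the $\sqrt{V/n}$ term, so halving the block length widens the rate loss by roughly a factor of $\sqrt{2}$.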

Fig. 7 plots the frame-error-rate (FER) curves over different $N_{\text{PE}}$ under the proposed CIS algorithm, MMI-based quantization, and entropy-based quantization (with the optimized entropy parameter $\theta = 0.3$ [16]) with the 2K-QC-code and 4K-QC-code, respectively. The number of PE cycles ranges over [15000, 19000] and the DRT is set to zero (i.e., $T = 0$, which represents the early retention time). It is observed that the proposed CIS algorithm can endure the largest number of PE cycles among the three methods. For example, at FER = $10^{-4}$, the MMI-based quantization and entropy-based quantization with the 2K-QC-code can endure around 15100 and 15600 PE cycles, respectively. In contrast, the proposed CIS algorithm can extend the endurance limit of PE cycles to 15900.

Fig. 7. FER performance of LDPC 2K-QC-code and 4K-QC-code versus different $N_{\rm PE}$ under 6-level read quantization with $I_{\rm max}=25$.

Fig. 8. FER performance of LDPC 2K-QC-code versus different $N_{\rm PE}$ under different read-level quantization with $I_{\rm max} = 30$.

Fig. 8 compares the FER performance versus different $N_{\text{PE}}$ of the proposed CIS algorithm, MMI-based quantization, and entropy-based quantization with the 2K-QC-code. It is observed that the proposed CIS algorithm is superior to both MMI-based quantization and entropy-based quantization under both 6-level and 9-level quantization. In addition, read quantization with more levels performs better. For example, at FER = $10^{-4}$, the proposed scheme improves the endurance by 2100 PE cycles under the 9-level quantization compared with the 6-level quantization. This is because, with more read quantization levels, more accurate LLRs are fed into the DNN-aided decoder. Note that this figure does not show the FER of entropy-based 9-level quantization, since the entropy-based quantization cannot freely choose the read levels.

Fig. 9. FER performance of LDPC 2K-QC-code and 2K-random-code versus different DRT with $N_{\text{PE}} = 8000$ and $I_{\text{max}} = 25$.

Fig. 9 shows the FER curves versus different DRTs for the proposed CIS algorithm, MMI-based quantization, and entropy-based quantization with the 2K-QC-code and 2K-random-code. Note that all these quantization methods use perfect knowledge of the DRT and PE cycles. It is observed that the proposed algorithm is superior to the other algorithms with different LDPC codes under the effect of DRN. For example, at FER = $10^{-4}$, the proposed algorithm can extend the endurance limit of DRT up to 1000 hours and 2000 hours with the 2K-QC-code and 2K-random-code, respectively.

Fig. 10 plots the FER curves of the BP decoding, the RABP decoding in [26], the proposed DNN-aided scheme, and the CIS algorithm. In this figure, the RABP decoding utilizes the information generated by the first-round BP decoding to amend the LLRs and perform the second-round BP decoding. First, it is observed that the FER of the proposed DNN scheme approaches that of CIS. Second, the proposed DNN scheme can significantly improve the tolerance of flash memory against the DRN compared with the BP decoding and RABP decoding. For example, at FER = $10^{-4}$, the proposed DNN scheme can extend the endurance of flash memory up to nearly 30000, 200000, and 1000000 hours with $N_{\rm PE}$ fixed at 6000, 5000, and 4000, respectively. In addition, the proposed scheme reduces the read latency of flash memory compared with the RABP decoding. This is because the RABP decoding requires second-round decoding to amend the inaccurate first-round results caused by the DRN, whereas the proposed DNN scheme estimates the read-voltage thresholds every 1000 blocks. Consequently, the proposed scheme does not need second-round decoding, which reduces the read latency.



Fig. 10. FER performance of LDPC 2K-random-code under different strategies versus different DRT with  $N_{\text{PE}} = \{4000, 5000, 6000\}$  and  $I_{\text{max}} = 50$ .

#### VI. CONCLUSIONS

In this paper, we optimized the read-voltage thresholds for MLC flash memory under finite block length. First, we analyzed the flash memory channel under finite block length and formulated the threshold optimization problem. Based on the finite block length theory, we converted the problem of maximizing the CCR into that of minimizing the maximum decoding error probability. With perfect knowledge of PE cycles and DRT, we proposed the CIS algorithm to solve this optimization problem. Furthermore, to address the intractable LLRs under the effect of DRN in practice, we proposed the DNN-aided scheme to optimize the read-voltage thresholds without knowledge of the DRT, where nonuniform quantization is employed to generate the voltage location information as the input to the MLP. The simulation results demonstrated that the proposed algorithms improve the PE endurance compared with the existing baseline methods. In particular, the proposed DNN-aided scheme can reduce the read latency compared with the RABP decoding scheme.

#### REFERENCES

- [1] K. Kim, "Future memory technology: Challenges and opportunities," in *Proc. Int. Symp. VLSI Technol. Syst. Appl.*, San Jose, CA, USA, Apr. 2008, pp. 5–9.
- [2] Y. Cai, E. F. Haratsch, O. Mutlu, and K. Mai, "Error patterns in MLC NAND flash memory: measurement, characterization, and analysis," in *Proc. DATE*, Mar. 2012, pp. 521–526.
- [3] H. Lee, J. Shy, Y. Chen, and Y. Ueng, "LDPC coded modulation for TLC flash memory," in *Proc. IEEE ITW*, Kaohsiung, Taiwan, Nov. 2017, pp. 204–208.
- [4] Q. Li, A. Jiang, and E. F. Haratsch, "Noise modeling and capacity analysis for NAND flash memories," in *Proc. IEEE ISIT*, Honolulu, HI, USA, Jun. 2014, pp. 2262–2266.
- [5] Y. Cai, Y. Luo, E. F. Haratsch, K. Mai, and O. Mutlu, "Data retention in MLC NAND flash memory: characterization, optimization, and recovery," in *Proc. HPCA*, Burlingame, CA, USA, Feb. 2015, pp. 551–563.
- [6] S. G. Cho, D. Kim, J. Choi, and J. Ha, "Block-wise concatenated BCH codes for NAND flash memories," *IEEE Trans. Commun.*, vol. 62, no. 4, pp. 1164–1177, Apr. 2014.

- [7] B. Chen, X. Zhang, and Z. Wang, "Error correction for multi-level NAND flash memory using reed-solomon codes," in *Proc. IEEE SiPS*, Washington, DC, USA, Oct. 2008, pp. 94–99.
- [8] G. Dong, N. Xie, and T. Zhang, "On the use of soft-decision error-correction codes in NAND flash memory," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 58, no. 2, pp. 429–439, Feb. 2011.
- [9] J. Wang, K. Vakilinia, T. Y. Chen, T. Courtade, G. Dong, T. Zhang, H. Shankar, and R. Wesel, "Enhanced precision through multiple reads for LDPC decoding in flash memories," *IEEE J. Sel. Areas Commun.*, vol. 32, no. 5, pp. 880–891, May 2014.
- [10] P. Chen, K. Cai, and S. Zheng, "Rate-adaptive protograph LDPC codes for multi-level-cell NAND flash memory," *IEEE Commun. Lett.*, vol. 22, no. 6, pp. 1112–1115, Jun. 2018.
- [11] C. Wang, J. Li, L. Kong, F. Shu, and F. C. M. Lau, "Adaptive 2D scheduling based nonbinary majority-logic decoding for NAND flash memory," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, Aug. 2019, to be published.
- [12] R. Gallager, "Low-density parity-check codes," *IRE Trans. Inf. Theory*, vol. 8, no. 1, pp. 21–28, Jan. 1963.
- [13] H. Xiao and A. H. Banihashemi, "Graph-based message-passing schedules for decoding LDPC codes," *IEEE Trans. Commun.*, vol. 52, no. 12, pp. 2098–2105, Dec. 2004.
- [14] E. Sharon, S. Litsyn, and J. Goldberger, "Efficient serial message-passing schedules for LDPC decoding," *IEEE Trans. Inf. Theory*, vol. 53, no. 11, pp. 4076–4091, Nov. 2007.
- [15] K. Wei, J. Li, L. Kong, F. Shu, and F. C. M. Lau, "Page-based dynamic partitioning scheduling for LDPC decoding in MLC NAND flash memory," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 66, no. 12, pp. 2082–2086, Feb. 2019.
- [16] C. A. Aslam, Y. L. Guan, and K. Cai, "Read and write voltage signal optimization for multi-level-cell (MLC) NAND flash memory," *IEEE Trans. Commun.*, vol. 64, no. 4, pp. 1613–1623, Feb. 2016.
- [17] C. A. Aslam, Y. L. Guan, and K. Cai, "Non-binary LDPC code with multiple memory reads for multi-level-cell (MLC) flash," in *Proc. Int. Conf. APSIPA*, Berkeley, CA, USA, Dec. 2014, pp. 1–9.
- [18] Z. Mei, K. Cai, L. Shi, and X. He, "On channel quantization for spin-torque transfer magnetic random access memory," *IEEE Trans. Commun.*, pp. 7526–7539, Nov. 2019.
- [19] B. Peleato, R. Agarwal, J. M. Cioffi, M. Qin, and P. H. Siegel, "Adaptive read thresholds for NAND flash," *IEEE Trans. Commun.*, vol. 63, no. 9, pp. 3069–3081, Sep. 2015.
- [20] K. Wei, J. Li, L. Kong, F. Shu, and Y. Li, "Read-voltage optimization for finite code length in MLC NAND flash memory," in *Proc. IEEE ITW*, Guangzhou, China, Nov. 2018.
- [21] Y. Polyanskiy, H. V. Poor, and S. Verdu, "Channel coding rate in the finite blocklength regime," *IEEE Trans. Inf. Theory*, vol. 56, no. 5, pp. 2307–2359, May 2010.
- [22] B. M. Kurkoski and H. Yagi, "Quantization of binary-input discrete memoryless channels," *IEEE Trans. Inf. Theory*, vol. 60, no. 8, pp. 4544–4552, Aug. 2014.
- [23] F. J. C. Romero and B. M. Kurkoski, "LDPC decoding mappings that maximize mutual information," *IEEE J. Sel. Areas Commun.*, vol. 34, no. 9, pp. 2391–2401, Sep. 2016.
- [24] Y. Cai, G. Yalcin, O. Mutlu, E. F. Haratsch, A. Cristal, O. S. Unsal, and K. Mai, "Flash correct-and-refresh: retention-aware error management for increased flash memory lifetime," in *Proc. IEEE ICCD*, Montreal, QC, Canada, Sep. 2012, pp. 94–101.
- [25] D. H. Lee and W. Sung, "Decision directed estimation of threshold voltage distribution in NAND flash memory," *IEEE Trans. Signal Process.*, vol. 62, no. 4, pp. 919–927, Feb. 2014.
- [26] C. A. Aslam, Y. L. Guan, and K. Cai, "Retention-aware belief-propagation decoding for NAND flash memory," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 64, no. 6, pp. 725–729, Jun. 2017.
- [27] C. A. Aslam, Y. L. Guan, and K. Cai, "Decision-directed retention-failure recovery with channel update for MLC NAND flash memory," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 65, no. 1, pp. 353–365, Jan. 2018.
- [28] S. Dörner, S. Cammerer, J. Hoydis, and S. T. Brink, "Deep learning based communication over the air," *IEEE J. Sel. Topics Signal Process.*, vol. 12, no. 1, pp. 132–143, Feb. 2018.
- [29] F. Liang, C. Shen, and F. Wu, "An iterative BP-CNN architecture for channel decoding," *IEEE J. Sel. Topics Signal Process.*, vol. 12, no. 1, pp. 144–159, Feb. 2018.
- [30] K.-D. Suh et al., "A 3.3 V 32 Mb NAND flash memory with incremental step pulse programming scheme," *IEEE J. Solid-State Circuits*, vol. 30, no. 11, pp. 1149–1156, Nov. 1995.

- [31] G. Dong, Y. Pan, N. Xie, C. Varanasi, and T. Zhang, "Estimating information-theoretical NAND flash memory storage capacity and its implication to memory system design space exploration," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 20, no. 9, pp. 1705–1714, Sep. 2012.
- [32] G. Dong, S. Li, and T. Zhang, "Using data postcompensation and predistortion to tolerate cell-to-cell interference in MLC NAND flash memory," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 57, no. 10, pp. 2718–2728, Oct. 2010.
- [33] K. Takeuchi, T. Tanaka, and H. Nakamura, "A double-level-V<sub>th</sub> select gate array architecture for multilevel NAND flash memories," *IEEE J. Solid-State Circuits*, vol. 31, no. 4, pp. 602–609, Apr. 1996.
- [34] Y. Cai, S. Ghose, E. F. Haratsch, Y. Luo, and O. Mutlu, "Error characterization, mitigation, and recovery in flash-memory-based solid-state drives," *Proc. IEEE*, vol. 105, no. 9, pp. 1666–1704, Sep. 2017.
- [35] H. Sun, W. Zhao, M. Lv, G. Dong, N. Zheng, and T. Zhang, "Exploiting intracell bit-error characteristics to improve min-sum LDPC decoding for MLC NAND flash-based storage in mobile device," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 24, no. 8, pp. 2654–2664, Aug. 2016.
- [36] H. Ali, A. Doucet, and D. I. Amshah, "Gsr: A new genetic algorithm for improving source and channel estimates," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 54, no. 5, pp. 1088–1098, May 2007.