Coded worn block mechanism to reduce garbage collection in SSD
Introduction
Solid state drives (SSDs) have revolutionized storage systems and consequently attracted strong interest in many computing domains, such as mobile computing and embedded systems. However, the number of program/erase (P/E) cycles, which is determined by the physical structure of an SSD, is limited. In most cases, perfect wear leveling is difficult to achieve, and deviations in erase counts still exist for particular reasons [1], [2]. As a result, some blocks, referred to as worn blocks, reach the P/E limit early. Worn blocks are retired when they exceed the P/E limit, and SSD endurance has become a critical concern [3].
SSDs provide some spare blocks as over-provisioning (OP) space. Thanks to the OP space, the Flash Translation Layer (FTL) of an SSD can execute garbage collection (GC) with more flexibility [4]. The FTL uses OP space to improve performance because more OP space increases GC efficiency. As blocks are retired, the OP space shrinks, which degrades SSD performance [5].
In reality, worn blocks can still store data because they have extra P/E cycles [4], [6], [7]. However, data stored in worn blocks suffers from a higher retention error rate. Charge leakage in a worn block causes the voltage to drift to the left and may corrupt the stored data. The longer a worn block retains data, the more electrons leak from the floating gate and the higher the bit error rate becomes. Some research has focused on reducing retention errors and ensuring the reliability of data in worn blocks [8]. This series of measures is collectively referred to as flash correct-and-refresh (FCR). FCR reads data at fixed intervals, before the accumulated bit errors exceed the error correction capability of the ECC [9], and then rewrites it. The smart retirement flash translation layer (SR-FTL) uses a strategy similar to FCR: it preserves the size of the OP space by aggressively using worn blocks and updates the data before retention errors occur [4]. However, these methods accelerate the wear of worn blocks, prematurely consume their extra P/E cycles, and reduce the lifespan of the SSD. Specifically, they have the following limitations.
First, SSDs use “out-of-place update” [10], [11], [12]. A page has to be erased before being rewritten/programmed. Data is accessed at page granularity. A page can be written only after the entire block to which it belongs has been erased [13]. Therefore, frequent data updates in worn blocks greatly slow down the SSD, because extra update operations are needed to ensure data reliability.
Second, the data update operations in FCR-style methods aggravate the garbage collection problem. GC remaps valid pages in victim blocks and erases the victim blocks for reuse, resulting in time-consuming page movements [14]. The correct-and-refresh operations make the data distribution in flash more fragmented. The resulting unnecessary and inefficient GC operations increase P/E counts and degrade system performance.
Third, data scrubbing in worn blocks has a negative impact on wear leveling. Worn blocks have already reached the P/E limit and are running on their extra P/E cycles. The extra read and write operations that refresh methods perform to ensure data reliability in worn blocks make them wear out faster. Data scrubbing in FCR methods thus runs contrary to the idea of wear leveling [15], [16], [17] and reduces the lifetime of the SSD.
In this study, we propose a coded worn block FTL (CW-FTL) management scheme to address the above challenges. CW-FTL stores cold data in worn blocks, with added parity to address their higher error rate. CW-FTL reduces the GC count of the SSD and alleviates performance degradation by mitigating the reduction in OP space. In addition, to ensure data reliability, CW-FTL stores encoded cold data, which is not frequently updated, in worn blocks.
Reducing the amount of incorrect data that exceeds the ECC error correction capability is a very complicated problem. Data errors in a worn block are of several types [6], [18], [19]. First, an erase error occurs when an erase operation fails to set a cell to the erased state. Second, an error caused by program disturbance is called a program interference error, which shifts the voltage distribution to the right. Third, an error caused by charge leakage is called a retention error, which shifts the voltage distribution to the left. Lastly, a read error is caused by read disturbance, which shifts the voltage distribution to the right. As the number of P/E cycles increases, the error rate rises significantly [6]. Retention errors dominate all other error types in worn blocks [19]. In this study, we mainly focus on retention errors.
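The GC step described in the second limitation above (remap the valid pages of a victim block, then erase it) can be illustrated with a minimal sketch. The `Block` class, the greedy fewest-valid-pages victim choice, and the function names are assumptions made for illustration only; they are not taken from the FTLs discussed here.

```python
from dataclasses import dataclass

@dataclass
class Block:
    # Each entry is a logical page number, or None for an invalid/free page.
    pages: list
    erase_count: int = 0

def valid_count(block):
    """Number of pages in the block that still hold valid data."""
    return sum(p is not None for p in block.pages)

def collect(blocks, free_block):
    """Toy greedy GC: pick the block with the fewest valid pages as victim."""
    victim = min(blocks, key=valid_count)
    moved = [p for p in victim.pages if p is not None]
    # Page movement: valid pages are rewritten into the free block.
    free_block.pages[:len(moved)] = moved
    # Erasing the victim makes it reusable but costs one P/E cycle.
    victim.pages = [None] * len(victim.pages)
    victim.erase_count += 1
    return victim, len(moved)

blocks = [Block([1, None, 2, 3]), Block([None, 4, None, None])]
free = Block([None] * 4)
victim, moved_pages = collect(blocks, free)
```

In this toy model, the fewer valid pages a victim holds, the fewer page movements each erase costs, which is why fragmentation caused by refresh traffic makes GC less efficient.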
Specifically, we make the following contributions.
- We design a data management scheme, CW-FTL. It uses worn blocks to store cold data, thus reducing the consumption of OP space and alleviating the performance degradation of the SSD.
- We use an error correcting code for cold data in worn blocks to improve data reliability. A parity check code is used to encode cold data to demonstrate the effectiveness of CW-FTL. Other, more complex error correction codes can be used depending on implementation cost.
- We implement the proposed CW-FTL on simulation tools and conduct extensive tests to verify its benefits. Experimental results show that CW-FTL can reduce the GC count of the SSD while preserving its performance.
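To make the parity idea in the contributions above concrete, the following is a minimal sketch of a single XOR parity page computed over a stripe of cold data pages. The stripe layout and the function names are assumptions for illustration; the text only states that a parity check code is used, so this should not be read as the exact encoding in CW-FTL.

```python
from functools import reduce

def xor_pages(pages):
    """Byte-wise XOR of equal-length pages."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*pages))

def encode_stripe(data_pages):
    """Append one parity page to a stripe of data pages."""
    return list(data_pages) + [xor_pages(data_pages)]

def recover_page(stripe, lost_index):
    """Rebuild one unreadable page by XOR-ing all surviving pages."""
    survivors = [p for i, p in enumerate(stripe) if i != lost_index]
    return xor_pages(survivors)

data = [b"cold", b"data", b"page"]
stripe = encode_stripe(data)
rebuilt = recover_page(stripe, 1)  # pretend page 1 became uncorrectable
```

A single parity page can rebuild only one lost page per stripe, which is one reason stronger codes may be preferred when their implementation cost is acceptable.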
The rest of the paper is organized as follows. The background and motivation are presented in Section 2. Section 3 describes the design of CW-FTL. Performance evaluations of CW-FTL are presented in Section 4. Section 5 discusses the related work. Finally, we conclude the paper.
Section snippets
Typical policies of FCR
Many methods have used the FCR strategy. Four FCR schemes are proposed in [8]: remapping-based FCR, in-place reprogramming FCR, hybrid FCR, and adaptive-rate FCR.
The main idea of remapping-based FCR is to periodically read data in flash and, after error correction, write it back to another block. FCR uses this refresh operation to prevent the accumulation of retention errors.
In-place reprogramming FCR writes corrected data back to its original position, thus avoiding additional erase operations. The reason why
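The remapping-based refresh described above can be sketched with a toy model in which each page simply counts its accumulated retention errors. The `Page` class, the `ECC_CAPABILITY` value, and the per-period error increments are hypothetical and chosen only to illustrate the timing constraint; they do not come from [8].

```python
from dataclasses import dataclass

ECC_CAPABILITY = 4  # hypothetical: correctable bit errors per page

@dataclass
class Page:
    data: bytes
    bit_errors: int = 0  # retention errors accumulated since the last write

def age(block, errors_per_period):
    """Model one retention period: every page gains some bit errors."""
    for page in block:
        page.bit_errors += errors_per_period

def remap_refresh(src_block):
    """Read each page, correct it, and write a clean copy to another block."""
    dst_block = []
    for page in src_block:
        if page.bit_errors > ECC_CAPABILITY:
            raise ValueError("uncorrectable page: refresh came too late")
        dst_block.append(Page(page.data, bit_errors=0))
    return dst_block

block = [Page(b"a"), Page(b"b")]
age(block, 3)                  # errors still within ECC capability
fresh = remap_refresh(block)   # corrected copies land in a new block
```

The sketch also shows the cost side: every refresh is an extra write to another block, and the source block must later be erased.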
Basic idea of CW-FTL overview
The appearance and growing number of worn blocks reduce OP space and degrade SSD performance. To address this problem, existing methods actively place hot data in worn blocks based on the FCR strategy. However, these methods push worn blocks toward their physical P/E limit faster. CW-FTL aims to alleviate SSD performance degradation by storing cold data in worn blocks and using error correcting codes to ensure data reliability. CW-FTL can slow down OP space consumption and reduce
Evaluation methodology
We have implemented CW-FTL in a simulation environment based on SSDsim [20]. The SSDsim platform is configured as follows: 128 KB DRAM and two flash channels, each of which consists of two chips (8 GB, 2 dies/chip, 2 planes/die). In this experiment, the block size is set to 8 MB and a block contains 128 pages. In this simulation, the original OP space is set to about 30% of the total memory capacity. The P/E limit of a memory cell is usually set to be relatively small. Because of the scale
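As a rough check of the configuration above, the derived quantities can be computed directly. This sketch assumes that the stated 8 GB is the capacity of each chip and that sizes are binary (1 KB = 1024 bytes); the text does not confirm either point, and the variable names are illustrative.

```python
KB, MB, GB = 1024, 1024**2, 1024**3

channels = 2
chips_per_channel = 2
chip_capacity = 8 * GB   # assumption: 8 GB is the per-chip capacity
block_size = 8 * MB
pages_per_block = 128
op_ratio = 0.30          # "about 30%" of total capacity

page_size = block_size // pages_per_block
total_capacity = channels * chips_per_channel * chip_capacity
total_blocks = total_capacity // block_size
op_blocks = int(total_blocks * op_ratio)

print(f"page size:    {page_size // KB} KB")
print(f"total blocks: {total_blocks}")
print(f"OP blocks:    {op_blocks}")
```

Under these assumptions the page size is 64 KB and the device holds 4096 blocks, of which roughly 1228 form the initial OP space.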
Related work
New technology trends are making flash memory increasingly unreliable, but many studies aim to enhance its reliability. Many studies have focused on the bit error rate to increase the useful life and performance of SSDs. A flash block that stores multiple bits per cell in Multi-Level Cell flash memory and is becoming unreliable can be revived by storing one bit per cell to extend its lifetime [33]. While the revived-block capacity is halved, its lifetime is significantly extended without jeopardizing the stored
Conclusion
We propose CW-FTL, a worn block reuse scheme for used SSDs that alleviates SSD performance degradation. Our scheme manages striped worn blocks to deal with the increasing error rates of flash as P/E cycles increase. We design a data management mechanism that takes full advantage of worn blocks to alleviate SSD performance degradation. We have implemented the proposed method in simulation tools. The performance of CW-FTL is evaluated with different traces. From the mathematical analyses and
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
The authors are thankful to the anonymous reviewers for their valuable feedback and comments toward this study. This work is supported in part by the National Natural Science Foundation of China (Grant 61872135), and the Xiangjiang Artificial Intelligence Academy (Grant 202021A03).
References (45)
- et al. The harey tortoise: Managing heterogeneous write performance in SSDs
- et al. Wear unleveling: Improving NAND flash lifetime by balancing page endurance
- et al. Design tradeoffs for SSD reliability
- et al. An aggressive worn-out flash block management scheme to alleviate SSD performance degradation
- et al. Operating system support for dynamic over-provisioning of solid state drives
- et al. Error patterns in MLC NAND flash memory: Measurement, characterization, and analysis
- et al. Optimizing NAND flash-based SSDs via retention relaxation
- et al. Flash correct-and-refresh: Retention-aware error management for increased flash memory lifetime
- et al. On-chip error correcting techniques for new-generation flash memories. Proc. IEEE (2003)
- et al. Beating the I/O bottleneck: A case for log-structured file systems. SIGOPS Oper. Syst. Rev. (1989)