Reducing SSD access latency via NAND flash program and erase suspension☆
Introduction
NAND flash-based SSDs offer better random access performance than hard drives and have strong potential in the high-performance computing market. However, NAND flash has performance and cost problems that limit its application [2]. The problem addressed in this paper is the read vs. program/erase (P/E) contention. Because of the slow P/E speed of NAND flash, once a P/E operation is committed to the flash chip, pending or subsequent read requests suffer prolonged service latency caused by the waiting time. In particular, the basic functional unit of the flash chip, i.e., one flash plane, is exclusively occupied while servicing a read, program, or erase operation [3]. Since disk read requests result from upper-level cache misses, the compromised read latency of the disk degrades application performance. To reduce read latency, on-disk write buffers may avoid or postpone write commitments to the flash [4], [5], [6]. Executing garbage collection during the idle time of the drive [3], [7], or making it preemptible by foreground requests [8], [9], may also alleviate the contention between reads and P/E. Furthermore, read requests can be prioritized in the pending list to reduce the queuing time caused by P/E. However, none of these approaches preempts a committed P/E operation for read requests.
To address this read vs. P/E contention problem, we propose a P/E Suspension scheme for NAND flash that allows the execution of P/E operations to be suspended so as to service pending reads, after which the suspended P/E is resumed. The internal process of the program operation is carried out in a step-by-step fashion (Incremental Step Pulse Programming, or ISPP [10]), and thus the program can be suspended in the interval between two consecutive steps, or the on-going step can be canceled and re-executed upon resumption. The erase process requires the duration of the erase-voltage pulse to be satisfied, so erase can also be suspended and resumed as long as the required timing is ensured. Giving reads the highest priority, we further extend this scheme to let programs (write requests) preempt erase operations, reducing the service latency of writes.
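The step-wise structure of ISPP described above can be illustrated with a minimal timing sketch. The class below is our own illustration (not the authors' implementation, and the step count and latencies are assumed numbers): a program operation is modeled as a sequence of pulse/verify steps, and a pending read may be serviced at the suspension point between any two consecutive steps.

```python
# Hypothetical sketch of inter-step P/E suspension: a program operation is
# a series of ISPP steps; between two steps, a pending read can be
# serviced before the program resumes from the next step.

class ISPPProgram:
    def __init__(self, num_steps=8, step_latency_us=25):
        self.num_steps = num_steps            # total ISPP pulse/verify steps (assumed)
        self.step_latency_us = step_latency_us  # latency per step (assumed)
        self.completed_steps = 0
        self.suspended = False

    def run(self, pending_read_latencies_us):
        """Execute all steps; at each inter-step boundary, service at most
        one pending read, then resume. Returns total elapsed time (us)."""
        elapsed = 0
        while self.completed_steps < self.num_steps:
            if pending_read_latencies_us and not self.suspended:
                self.suspended = True                       # suspension point
                elapsed += pending_read_latencies_us.pop(0)  # service the read
                self.suspended = False                      # resume program
            elapsed += self.step_latency_us  # one program pulse + verify
            self.completed_steps += 1
        return elapsed
```

For example, a 4-step program at 25 us per step with one 50 us read serviced mid-way finishes in 150 us, with the read starting far earlier than if it had waited for the full program to complete.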
The implementation of P/E suspension for NAND flash involves minimal modifications to the flash interface: merely the “program suspend/resume” and “erase suspend/resume” commands need to be added to the command set of the flash interface [11]. Interpreting these new commands requires support from the corresponding control logic inside the flash chip. As shown in Section 2, the control logic of the P/E process is realized as a state machine [12], which keeps track of the execution of the P/E steps and the timing of each step. To support P/E suspension, the control logic must determine the appropriate time to suspend the P/E (the suspension point) and maintain or retrieve the state of the suspended P/E so as to resume it. The feasibility of the proposed schemes rests on the fundamental/typical circuitry of flash memories [12].
This paper makes the following contributions.
- We analyze the impact of the long P/E latency on read performance, showing that even with read-prioritization scheduling, read latency is still severely compromised.
- By exploiting the internal mechanism of the P/E algorithms in NAND flash memory, we propose a low-overhead P/E suspension scheme that suspends on-going P/E operations to service pending read requests. In particular, we propose two strategies for suspending the program operation: Inter Phase Suspension (IPS) and Intra Phase Cancelation (IPC). In addition, we grant writes the second-highest priority, allowing them to preempt erase operations.
- Based on simulation experiments under various workloads, we demonstrate that the proposed design can significantly reduce SSD read and write latency for both SLC and MLC NAND flash.
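The two program-suspension strategies named above differ in where the read's waiting time and the program's overhead go. The sketch below is our own illustration with assumed numbers (the step latency and function names are not from the paper): under IPS the read waits for the remainder of the on-going step, while under IPC the step is canceled so the read starts immediately, at the cost of re-executing the canceled step upon resumption.

```python
# Illustrative comparison of IPS vs. IPC for a read arriving partway
# through one ISPP step. STEP_US is an assumed per-step latency.

STEP_US = 25  # assumed latency of a single ISPP step, in microseconds

def ips_read_wait(time_into_step_us):
    """Inter Phase Suspension: suspend only at a step boundary, so the
    read waits for the rest of the on-going step to finish."""
    return STEP_US - time_into_step_us

def ipc_read_wait(time_into_step_us):
    """Intra Phase Cancelation: cancel the on-going step immediately,
    so the read incurs no wait from the current step."""
    return 0

def ipc_program_penalty(time_into_step_us):
    """Extra program time under IPC: the partially executed step is
    thrown away and re-executed after resumption."""
    return time_into_step_us
```

With a read arriving 10 us into a 25 us step, IPS delays the read by 15 us with no program overhead, whereas IPC serves the read at once but lengthens the program by the 10 us of discarded work.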
The rest of this paper is organized as follows: in Section 2, we give a brief overview of the background of NAND flash memory. In Section 3, we conduct simulations to show how read latency is increased by chip contention. We describe our P/E suspension scheme in detail in Section 4. In Section 5, the request scheduling policy, miscellaneous implementation issues, and the power-consumption overhead of our scheme are discussed. We evaluate our approach via simulation experiments in Section 6. Related work is surveyed in Section 7. Finally, we conclude the paper in Section 8.
Section snippets
Background and related work
In this section, we briefly overview the related background information, including the mechanism of NAND flash P/E, the organization of NAND cells on the chip, NAND chip interface, and the typical structure of SSD.
Motivation
In this section, we demonstrate how the read vs. P/E contention increases read latency under various workloads. We have modified the MS-add-on simulator [3], which is based on Disksim 4.0. Specifically, under workloads drawn from a variety of popular disk traces, we compare the read latency of two scheduling policies, FIFO and Read Priority Scheduling (RPS), to show the limitation of RPS. Furthermore, with RPS, we set the latency of the program and erase operations equal to that of read, and then to zero, to justify…
Design of P/E suspension scheme
In this section, the design and implementation of P/E suspension are presented in detail. To realize the P/E suspension function, we seek a low-cost solution in which the user of the NAND chip (the on-disk flash controller) needs only to exploit this new flexibility by supporting the suspension and resumption commands, while the actual implementation is done inside the chip.
Scheduling policy
We schedule the requests and suspension/resumption operations according to a priority-based policy. The highest priority is given to read requests, which are always scheduled ahead of writes and can preempt on-going program and erase operations. Write requests can preempt only erase operations, provided that there are no read requests pending for service. We allow nested suspension, i.e., a read request may preempt a program operation that has itself preempted an erase…
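The preemption rules of the priority-based policy described above can be sketched as a small decision function. This is our own illustration under the stated rules, not the paper's code; the names and signature are hypothetical.

```python
# Sketch of the preemption rules: reads (highest priority) may preempt
# program or erase; writes may preempt only erase, and only when no read
# is pending; nothing preempts a read.

def can_preempt(request_type, ongoing_op, reads_pending):
    """Return True if a newly arrived request of `request_type` may
    preempt the `ongoing_op` on the flash plane."""
    if request_type == "read":
        # Reads have the highest priority and preempt program or erase.
        return ongoing_op in ("program", "erase")
    if request_type == "write":
        # Writes hold second priority: preempt erase only, and only if
        # no read request is waiting for service.
        return ongoing_op == "erase" and not reads_pending
    return False
```

Nested suspension follows naturally: a write may preempt an erase, and a later read may then preempt the resulting program, leaving both suspended operations to be resumed in reverse order.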
Evaluation
In this section, the proposed P/E suspension design is simulated with the same configuration and parameters as in Section 3. Under the four traces used in Section 3, we evaluate the read/write latency gains and the overhead of P/E suspension. We demonstrate that the proposed design achieves near-optimal read performance, and that write performance is significantly improved as well.
Write and erase suspension
The idea of preempting low-priority operations for high-priority ones by breaking an operation down into small phases has been embodied in [22], [23], among others. Dimitrijevic et al. proposed Semi-preemptible IO [22], which divides HDD I/O requests into small disk commands to enable preemption by high-priority requests. Similar to NAND flash, Phase Change Memory (PCM) has much larger write latency than read latency. Qureshi et al. [23] proposed several techniques to preempt the on-going writes of PCM for…
Conclusion
One performance problem of NAND flash is that its program and erase latencies are much higher than its read latency. Because the current NAND flash interface does not allow an on-going P/E to be suspended and resumed, this leads to chip contention between reads and P/Es. To alleviate the impact of this contention on read performance, in this paper we propose a low-overhead P/E suspension scheme by exploiting the internal mechanism of the P/E algorithms in NAND flash. We further…
References (47)
- G. Wu, X. He, Reducing SSD read latency via NAND flash program and erase suspension, in: Proceedings of FAST’2012,...
- D. Narayanan, E. Thereska, A. Donnelly, S. Elnikety, A. Rowstron, Migrating server storage to ssds: analysis of...
- N. Agrawal, V. Prabhakaran, et al., Design tradeoffs for SSD performance, in: Proceedings of USENIX ATC, 2008, pp....
- H. Kim, S. Ahn, BPLRU: a buffer management scheme for improving random writes in flash storage, in: Proceedings of...
- et al., FAB: flash-aware buffer management policy for portable media players, IEEE Trans. Consumer Electron. (2006)
- et al., Performance trade-offs in using NVRAM write buffer for flash memory-based storage devices, IEEE Trans. Comput. (2009)
- Y. Kim, S. Oral, G. Shipman, J. Lee, D. Dillow, F. Wang, Harmonia: a globally coordinated garbage collector for arrays...
- J. Lee, Y. Kim, G.M. Shipman, S. Oral, F. Wang, J. Kim, A semi-preemptive garbage collector for solid state drives, in:...
- W. Bux, X.-Y. Hu, I. Iliadis, R. Haas, Scheduling in flash-based solid-state drives – performance modeling and...
- K. Arase, Semiconductor NAND type flash memory with incremental step pulse programming, US Patent 5,812,457 (September...
- Nonvolatile Memory Technologies with Emphasis on Flash
- Introduction to flash memory, Proc. IEEE
- DiffECC: improving SSD read performance using differentiated error correction coding schemes, MASCOTS
Cited by (5)
- EDC: An Elastic Data Cache to Optimizing the I/O Performance in Deduplicated SSDs, 2022, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
- An Efficient Data Migration Scheme to Optimize Garbage Collection in SSDs, 2021, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
- Latency/wearout in a flash-based storage system with replication on write, 2019, Conference of Open Innovation Association, FRUCT
- ICS: Interrupt-Based Channel Sneaking for Maximally Exploiting Die-Level Parallelism of NAND Flash-Based Storage Devices, 2018, IEEE Transactions on Very Large Scale Integration (VLSI) Systems
- Heuristic map tiles prefetch strategy based on road network analysis, 2018, Proceedings - 2017 10th International Symposium on Computational Intelligence and Design, ISCID 2017
Guanying Wu received the PhD degree in engineering from Virginia Commonwealth University in 2013, the MS degree in computer engineering from Tennessee Technological University, USA, in 2009 and BS in electrical engineering from Zhejiang University, China, in 2007. His research interests lie in the areas of computer architecture, solid state storage, and embedded systems.
Ping Huang received the PhD degree in computer architecture from HuaZhong University of Science and Technology, China, in 2013. He is currently a postdoctoral research fellow in Virginia Commonwealth University, USA. His research interests focus on computer architecture, non-volatile memory and storage systems.
Xubin He received the PhD degree in electrical engineering from University of Rhode Island, USA, in 2002 and both the BS and MS degrees in computer science from Huazhong University of Science and Technology, China, in 1995 and 1997, respectively. He is currently an associate professor in the Department of Electrical and Computer Engineering at Virginia Commonwealth University. His research interests include computer architecture, storage systems, virtualization, and high availability computing. He received the Ralph E. Powe Junior Faculty Enhancement Award in 2004 and the Sigma Xi Research Award (TTU Chapter) in 2005 and 2010. He is a senior member of the IEEE, a member of USENIX and the IEEE Computer Society.
☆ A preliminary version of this work was presented at the 10th USENIX Conference on File and Storage Technologies (FAST’2012) [1]. This research is supported by the U.S. National Science Foundation (NSF) under Grant Nos. CCF-1102605, CCF-1102624, and CNS-1218960. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the funding agency.