Reducing SSD access latency via NAND flash program and erase suspension

https://doi.org/10.1016/j.sysarc.2013.12.002

Abstract

In NAND flash memory, once a page program or block erase (P/E) command is issued to a NAND flash chip, subsequent read requests have to wait until the time-consuming P/E operation completes. Preliminary results show that the lengthy P/E operations increase the read latency by 2× on average. This increased read latency caused by the contention may significantly degrade the overall system performance. Inspired by the internal mechanism of NAND flash P/E algorithms, we propose in this paper a low-overhead P/E suspension scheme, which suspends the on-going P/E to service pending reads and resumes the suspended P/E afterwards. With reads given the highest priority, we further extend our approach by allowing writes to preempt erase operations in order to improve write latency. In our experiments, we simulate a realistic SSD model that adopts a multi-chip/channel architecture and evaluate both SLC and MLC NAND flash as storage media of diverse performance. Experimental results show that the proposed technique achieves near-optimal performance in servicing read requests. The write latency is significantly reduced as well. Specifically, the read latency is reduced on average by 46.5% compared to RPS (Read Priority Scheduling), and, when using write–suspend–erase, the write latency is reduced by 13.6% relative to FIFO.

Introduction

NAND flash-based SSDs offer better random access performance than hard drives and have potential in the high-performance computing market. However, NAND flash has performance and cost problems that limit its application [2]. The problem addressed in this paper is the read vs. program/erase (P/E) contention. Because P/E operations on NAND flash are slow, once a P/E is committed to the flash chip, pending or subsequent read requests suffer prolonged service latency caused by the waiting time. In particular, the basic functional unit of the flash chip, i.e., one flash plane, is exclusively used to service read, program, and erase operations [3]. As disk read requests result from upper-level cache misses, the compromised read latency of the disk degrades application performance. To reduce read latency, on-disk write buffers may avoid or postpone write commitments to the flash [4], [5], [6]. Executing garbage collection during the idle time of the drive [3], [7], or making it preemptible by foreground requests [8], [9], may also alleviate the contention between reads and P/E. Furthermore, read requests can be prioritized in the pending list to reduce the queuing time caused by P/E. However, none of these approaches preempts a committed P/E for read requests.

To address this read vs. P/E contention problem, we propose a P/E suspension scheme for NAND flash that allows the execution of P/E operations to be suspended so as to service pending reads, after which the suspended P/E is resumed. The internal process of the program operation is done in a “step-by-step” fashion (Incremental Step Pulse Programming, or ISPP [10]), and thus the program can be suspended in the interval between two consecutive steps, or the on-going step can be canceled and re-executed upon resumption. The erase process requires the duration of the erase-voltage pulse to be satisfied, and thus the erase can also be suspended and resumed as long as we ensure the required timing. With reads given the highest priority, we further extend this scheme to enable programs (write requests) to preempt erase operations in order to reduce the service latency of writes.
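The two ways of suspending a program described above can be compared with a small timing model. This is only a sketch under stated assumptions: the program is treated as a sequence of equal-length ISPP steps, suspend and resume commands cost no time, and the function name, parameters, and all timing values are hypothetical rather than taken from the paper. The labels "IPS" and "IPC" are used here for suspending at the next step boundary and for canceling the in-flight step, respectively.

```python
def suspend_program(num_steps, step_us, read_at_us, read_us, strategy):
    """Model one read arriving while a page program is in flight.

    Returns (read_wait_us, program_finish_us). All times are in microseconds,
    measured from the start of the program operation. Hypothetical model:
    equal-length steps, zero suspend/resume overhead.
    """
    total_us = num_steps * step_us
    if read_at_us >= total_us:
        return 0, total_us  # program already finished, no contention

    steps_done = read_at_us // step_us      # steps fully completed so far
    mid_step = read_at_us % step_us != 0    # is a step in flight right now?

    if strategy == "IPS":
        # Let the in-flight step finish, then suspend at the step boundary.
        if mid_step:
            steps_done += 1
        suspend_at = steps_done * step_us
    elif strategy == "IPC":
        # Cancel the in-flight step at once; its partial work is discarded
        # and the whole step is re-executed after the read.
        suspend_at = read_at_us
    else:
        raise ValueError(f"unknown strategy: {strategy}")

    read_wait = suspend_at - read_at_us
    resume_at = suspend_at + read_us
    finish = resume_at + (num_steps - steps_done) * step_us
    return read_wait, finish


ips = suspend_program(5, 40, 50, 25, "IPS")
ipc = suspend_program(5, 40, 50, 25, "IPC")
```

With these made-up numbers (five 40 µs steps, a 25 µs read arriving at 50 µs), suspending at the step boundary makes the read wait 30 µs and the program finish at 225 µs, while canceling the step serves the read immediately and the program finishes at 235 µs. Without suspension the read would wait the remaining 150 µs. The trade-off is between read wait time and repeated program work.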

The implementation of P/E suspension for NAND flash involves minimal modifications to the flash interface, i.e., merely the “program suspend/resume” and “erase suspend/resume” commands need to be added to the command set of the flash interface [11]. The interpretation of these new commands requires support from the corresponding control logic inside the flash chip. As shown in Section 2, the control logic of the P/E process is realized using a state machine [12], which keeps track of the execution of the P/E steps and the timing of each step. To support P/E suspension, the control logic is required to determine the appropriate time to suspend the P/E (the suspension point) and to maintain or retrieve the previous state of the suspended P/E so as to resume it. The implementation feasibility of the proposed schemes is based on the fundamental/typical circuitry of flash memories [12].
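The requirement that the control logic preserve the state of a suspended P/E can be illustrated with a toy state machine. This is a minimal sketch, not the chip's actual logic: the class, state names, and method names are hypothetical, and progress is tracked as a simple count of remaining steps rather than real pulse timing.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    BUSY = auto()        # a program or erase is executing
    SUSPENDED = auto()   # P/E paused; the plane is free to serve reads


class PEControl:
    """Hypothetical control logic that tracks one P/E operation."""

    def __init__(self):
        self.state = State.IDLE
        self.op = None          # "program" or "erase"
        self.steps_left = 0

    def start(self, op, steps):
        if self.state is not State.IDLE:
            raise RuntimeError("chip is not idle")
        self.op, self.steps_left, self.state = op, steps, State.BUSY

    def tick(self):
        """Complete one step of the running operation, if any."""
        if self.state is State.BUSY:
            self.steps_left -= 1
            if self.steps_left == 0:
                self.state, self.op = State.IDLE, None

    def suspend(self):
        if self.state is not State.BUSY:
            raise RuntimeError("nothing to suspend")
        # op and steps_left are kept so the operation can be resumed later.
        self.state = State.SUSPENDED

    def resume(self):
        if self.state is not State.SUSPENDED:
            raise RuntimeError("nothing to resume")
        self.state = State.BUSY

    def can_read(self):
        return self.state is not State.BUSY
```

The point of the sketch is that suspend and resume only flip the state, while the saved operation type and progress counter are what allow the P/E to continue from where it stopped.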

This paper makes the following contributions.

  • We analyze the impact of the long P/E latency on read performance, showing that even with read priority scheduling, the read latency is still severely compromised.

  • By exploiting the internal mechanism of the P/E algorithms in NAND flash memory, we propose a low-overhead P/E suspension scheme which suspends the on-going P/E operations to service the pending read requests. In particular, two strategies for suspending the program operation, Inter Phase Suspension (IPS) and Intra Phase Cancelation (IPC), are proposed. In addition, we give writes the second-highest priority, so that they may preempt erase operations.

  • Based on simulation experiments under various workloads, we demonstrate that the proposed design can significantly reduce the SSD read and write latency for both SLC and MLC NAND flash.

The rest of this paper is organized as follows. In Section 2, we give a brief overview of the background knowledge about NAND flash memory. In Section 3, we conduct simulations to show how the read latency is increased by chip contention. We describe our P/E suspension scheme in detail in Section 4. In Section 5, we further discuss the request scheduling policy, miscellaneous implementation issues, and the overhead of our scheme on power consumption. We evaluate our approach via simulation experiments in Section 6. The related work is surveyed in Section 7. Finally, we conclude the paper in Section 8.

Section snippets

Background and related work

In this section, we briefly overview the related background information, including the mechanism of NAND flash P/E, the organization of NAND cells on the chip, NAND chip interface, and the typical structure of SSD.

Motivation

In this section, we demonstrate how the read vs. P/E contention increases the read latency under various workloads. We modified the MS-add-on simulator [3], which is based on DiskSim 4.0. Specifically, under the workloads of a variety of popular disk traces, we compare the read latency of two scheduling policies, FIFO and Read Priority Scheduling (RPS), to show the limitation of RPS. Furthermore, with RPS, we set the latency of program and erase operation to be equal to that of read and zero to justify
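The limitation of RPS can be seen in a toy single-chip model. The following is a minimal sketch, not the modified simulator used in the paper: the function, the request format, and the service times (200 µs per program, 25 µs per read) are hypothetical, and operations already running on the chip are never interrupted.

```python
def mean_read_latency(requests, read_priority):
    """Serve requests on one chip without preemption.

    requests: list of (arrival_us, kind, service_us) tuples, where kind is
    "read" or "program". With read_priority=True, a pending read is always
    picked before pending programs (RPS); otherwise service is FIFO.
    Returns the mean read latency (completion time minus arrival time).
    """
    todo = sorted(requests)   # ordered by arrival time
    queue, latencies = [], []
    now, i = 0, 0
    while i < len(todo) or queue:
        # Admit everything that has arrived by now.
        while i < len(todo) and todo[i][0] <= now:
            queue.append(todo[i])
            i += 1
        if not queue:
            now = todo[i][0]  # chip idles until the next arrival
            continue
        pick = 0
        if read_priority:
            pick = next((k for k, r in enumerate(queue) if r[1] == "read"), 0)
        arrival, kind, service = queue.pop(pick)
        now += service        # the chosen operation runs to completion
        if kind == "read":
            latencies.append(now - arrival)
    return sum(latencies) / len(latencies)


trace = [(0, "program", 200), (10, "program", 200), (20, "read", 25)]
fifo = mean_read_latency(trace, read_priority=False)
rps = mean_read_latency(trace, read_priority=True)
```

On this made-up trace, FIFO gives a read latency of 405 µs and RPS gives 205 µs. RPS lets the read jump ahead of the queued program, but the read still waits for the program that was already committed to the chip, so its latency remains far above the 25 µs service time.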

Design of P/E suspension scheme

In this section, we present the design and implementation of P/E suspension in detail. To realize the P/E suspension function, we seek a low-cost solution in which the user of the NAND chip (the on-disk flash controller) only needs to exploit this new flexibility by supporting the commands of P/E suspension and resumption, while the actual implementation is done inside the chip.

Scheduling policy

We schedule the requests and suspension/resumption operations according to a priority-based policy. The highest priority is given to read requests, which are always scheduled ahead of writes and can preempt the on-going program and erase operations. The write requests can preempt only the erase operations, given that there are no read requests pending for service. We allow nested suspension operations, i.e., a read request may preempt a program operation, which has preempted an erase
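The priority rules and nested suspension described in this snippet can be sketched with a stack of suspended operations. This is an illustrative model only: the class and method names are hypothetical, and the queueing of requests that cannot preempt (including the check for pending reads before a write preempts an erase) is left to the caller.

```python
# Higher number means higher priority: reads beat programs, programs beat erases.
PRIORITY = {"read": 2, "program": 1, "erase": 0}


class Chip:
    """Hypothetical per-chip scheduler state with nested suspension."""

    def __init__(self):
        self.running = None
        self.suspended = []   # stack of suspended operations, innermost last

    def submit(self, op):
        """Return True if op starts now (possibly by preempting), else False."""
        if self.running is None:
            self.running = op
            return True
        if PRIORITY[op] > PRIORITY[self.running]:
            # Suspend the lower-priority operation and run the new one.
            self.suspended.append(self.running)
            self.running = op
            return True
        return False          # the request has to wait in the pending list

    def complete(self):
        """Finish the running op and resume the most recently suspended one."""
        finished = self.running
        self.running = self.suspended.pop() if self.suspended else None
        return finished


chip = Chip()
started = [chip.submit("erase"), chip.submit("program"), chip.submit("read")]
```

After these three submissions the read is running and the suspended stack holds the erase and then the program, so completing the read resumes the program, and completing the program resumes the erase. A second read submitted while a read is running is refused, since reads are never suspended in this model.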

Evaluation

In this section, the proposed P/E suspension design is simulated with the same configuration and parameters as in Section 3. Under the workloads of the four traces used in Section 3, we evaluate the read/write latency performance gain and the overhead of P/E suspension. We demonstrate that the proposed design achieves a near-optimal read performance gain and the write performance is significantly improved as well.

Write and erase suspension

The idea of preempting low-priority operations for high-priority ones by breaking an operation down into small phases has been embodied in [22], [23], etc. Dimitrijevic et al. proposed Semi-preemptible IO [22], which divides HDD I/O requests into small disk commands to enable preemption for high-priority requests. Similar to NAND flash, Phase Change Memory (PCM) has much larger write latency than read latency. Qureshi et al. proposed in [23] a few techniques to preempt the on-going writes of PCM for

Conclusion

One performance problem of NAND flash is that its program and erase latency is much higher than the read latency. This problem causes chip contention between reads and P/Es because, with the current NAND flash interface, the on-going P/E cannot be suspended and resumed. To alleviate the impact of the chip contention on the read performance, in this paper we propose a low-overhead P/E suspension scheme by exploiting the internal mechanism of the P/E algorithm in NAND flash. We further

Guanying Wu received the PhD degree in engineering from Virginia Commonwealth University in 2013, the MS degree in computer engineering from Tennessee Technological University, USA, in 2009 and BS in electrical engineering from Zhejiang University, China, in 2007. His research interests lie in the areas of computer architecture, solid state storage, and embedded systems.

References (47)

  • G. Wu, X. He, Reducing SSD read latency via NAND flash program and erase suspension, in: Proceedings of FAST’2012,...
  • D. Narayanan, E. Thereska, A. Donnelly, S. Elnikety, A. Rowstron, Migrating server storage to ssds: analysis of...
  • N. Agrawal, V. Prabhakaran, et al., Design tradeoffs for SSD performance, in: Proceedings of USENIX ATC, 2008, pp....
  • H. Kim, S. Ahn, BPLRU: a buffer management scheme for improving random writes in flash storage, in: Proceedings of...
  • H. Jo et al., FAB: flash-aware buffer management policy for portable media players, IEEE Trans. Consumer Electron. (2006)
  • S. Kang et al., Performance trade-offs in using NVRAM write buffer for flash memory-based storage devices, IEEE Trans. Comput. (2009)
  • Y. Kim, S. Oral, G. Shipman, J. Lee, D. Dillow, F. Wang, Harmonia: a globally coordinated garbage collector for arrays...
  • J. Lee, Y. Kim, G.M. Shipman, S. Oral, F. Wang, J. Kim, A semi-preemptive garbage collector for solid state drives, in:...
  • W. Bux, X.-Y. Hu, I. Iliadis, R. Haas, Scheduling in flash-based solid-state drives – performance modeling and...
  • K. Arase, Semiconductor NAND type flash memory with incremental step pulse programming, US Patent 5,812,457 (September...
  • ONFI Working Group, The open NAND flash interface, http://onfi.org/,...
  • J. Brewer et al., Nonvolatile Memory Technologies with Emphasis on Flash (2008)
  • R. Bez et al., Introduction to flash memory, Proc. IEEE (2003)
  • G. Wu et al., DiffECC: improving SSD read performance using differentiated error correction coding schemes, MASCOTS (2010)
  • Samsung, http://www.samsung.com/global/business/semiconductor/products/fusionmemory/Products-OneNAND.html,...
  • Toshiba, http://www.toshiba.com/taec/news/press-releases/2006/memy-06-337.jsp,...
  • S.-W. Lee, W.-K. Choi, D.-J. Park, FAST: An Efficient Flash Translation Layer for Flash Memory, in: Lecture Notes in...
  • J.U. Kang, H. Jo, J.S. Kim, J. Lee, A superblock-based flash translation layer for nand flash memory, in: Proceedings...
  • A. Gupta, Y. Kim, B. Urgaonkar, DFTL: a flash translation layer employing demand-based selective caching of page-level...
  • Storage Performance Council, SPC trace file format specification, http://traces.cs.umass.edu/index.php/Storage/Storage,...
  • S. Kavalanekar, B. Worthington, Q. Zhang, V. Sharda, Characterization of storage workload traces from production...
  • Z. Dimitrijevic, R. Rangaswami, E. Chang, Design and implementation of semi-preemptible IO, in: Proceedings of...
  • M.K. Qureshi, M.M. Franceschini, L.A. Lastras-Montano, Improving read performance of phase change memories via write...
    Ping Huang received the PhD degree in computer architecture from Huazhong University of Science and Technology, China, in 2013. He is currently a postdoctoral research fellow at Virginia Commonwealth University, USA. His research interests focus on computer architecture, non-volatile memory and storage systems.

    Xubin He received the PhD degree in electrical engineering from University of Rhode Island, USA, in 2002 and both the BS and MS degrees in computer science from Huazhong University of Science and Technology, China, in 1995 and 1997, respectively. He is currently an associate professor in the Department of Electrical and Computer Engineering at Virginia Commonwealth University. His research interests include computer architecture, storage systems, virtualization, and high availability computing. He received the Ralph E. Powe Junior Faculty Enhancement Award in 2004 and the Sigma Xi Research Award (TTU Chapter) in 2005 and 2010. He is a senior member of the IEEE, a member of USENIX and the IEEE Computer Society.

    A preliminary version of this work was presented at the 10th USENIX Conference on File and Storage Technologies (FAST’2012) [1]. This research is supported by the U.S. National Science Foundation (NSF) under Grant Nos. CCF-1102605, CCF-1102624, and CNS-1218960. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the funding agency.
