DOI: 10.1145/3203217.3203236
research-article

Modeling SSD RAID reliability under general settings

Published: 08 May 2018

Abstract

Solid-state drives (SSDs) endure only a limited number of program/erase (P/E) cycles and are susceptible to uncorrectable flash errors, so achieving high reliability in SSD storage systems is a critical issue. RAID offers a viable way to enhance system reliability by distributing redundancy across multiple SSDs. However, the flash error rate of an SSD increases with the number of P/E cycles, and this time-varying behavior complicates the reliability analysis of SSD RAID. Moreover, little formal analysis exists that quantifies the reliability dynamics of an SSD RAID array under general settings. To this end, we propose a new continuous-time Markov chain (CTMC) model that characterizes the reliability dynamics of SSD RAID over time under two general settings: (1) fault tolerance against an arbitrary number of device failures and (2) non-uniform workloads. We validate the correctness of our CTMC model through trace-driven simulations. Based on the model, we further analyze how different RAID parameters affect the reliability dynamics of an SSD RAID array.
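The abstract does not spell out the CTMC itself, so the following is only a minimal Python sketch of transient CTMC reliability analysis under our own simplifying assumptions, not the authors' model. It assumes n identical SSDs (that is, a uniform workload, so the paper's non-uniform setting is not captured), each failing at an exponential rate `lam`; an array that tolerates k concurrent failures; one rebuild completing at rate `mu`; and the rising error rate of worn flash approximated by a piecewise-constant `lam` over successive intervals. The probability of data loss by time t is computed by uniformization, a standard technique for transient CTMC solutions. The names `raid_generator`, `transient_step`, and `loss_probability`, and all rate values, are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def raid_generator(n: int, k: int, lam: float, mu: float) -> Matrix:
    """Generator matrix Q of a toy SSD RAID CTMC (hypothetical, not the paper's model).

    State i in 0..k counts failed SSDs awaiting rebuild; state k + 1 is data
    loss and is absorbing. Each of the n - i working SSDs fails at rate lam,
    and one rebuild completes at rate mu.
    """
    if not 0 <= k < n:
        raise ValueError("need 0 <= k < n")
    size = k + 2
    q = [[0.0] * size for _ in range(size)]
    for i in range(k + 1):
        q[i][i + 1] = (n - i) * lam  # one more SSD fails
        if i > 0:
            q[i][i - 1] = mu  # a rebuild finishes
        q[i][i] = -sum(q[i][j] for j in range(size) if j != i)
    return q  # the last row stays zero: data loss is absorbing


def transient_step(p: Sequence[float], q: Matrix, t: float,
                   eps: float = 1e-12) -> List[float]:
    """Return the distribution p * exp(Q t), computed by uniformization."""
    size = len(p)
    rate = max(-q[i][i] for i in range(size))
    if rate == 0.0 or t == 0.0:
        return list(p)
    # Uniformized DTMC: P = I + Q / rate.
    pm = [[(1.0 if i == j else 0.0) + q[i][j] / rate for j in range(size)]
          for i in range(size)]
    # Split t so that rate * dt <= 50 and exp(-rate * dt) cannot underflow.
    steps = max(1, math.ceil(rate * t / 50.0))
    x = rate * t / steps
    cur = list(p)
    for _ in range(steps):
        weight = math.exp(-x)  # Poisson(x) probability of 0 jumps
        term = list(cur)  # cur * P^0
        out = [weight * v for v in term]
        mass, jumps = weight, 0
        while mass < 1.0 - eps and jumps < 10_000:
            jumps += 1
            term = [sum(term[i] * pm[i][j] for i in range(size))
                    for j in range(size)]
            weight *= x / jumps  # Poisson(x) probability of `jumps` jumps
            mass += weight
            out = [o + weight * v for o, v in zip(out, term)]
        cur = out
    return cur


def loss_probability(n: int, k: int, lam_schedule: Sequence[Tuple[float, float]],
                     mu: float) -> List[Tuple[float, float]]:
    """Return (time, P(data loss by time)) after each (duration, lam) interval.

    A piecewise-constant lam stands in for the growth of the flash error
    rate with P/E cycles; this approximation is an assumption of the sketch.
    """
    p = [1.0] + [0.0] * (k + 1)  # start with all SSDs working
    t, history = 0.0, []
    for duration, lam in lam_schedule:
        p = transient_step(p, raid_generator(n, k, lam, mu), duration)
        t += duration
        history.append((t, p[-1]))
    return history


if __name__ == "__main__":
    wear = [(1.0, 0.01), (1.0, 0.02), (1.0, 0.04)]  # hypothetical rates per year
    for k in (1, 2):
        print(k, loss_probability(6, k, wear, mu=50.0))
```

Here each `(duration, lam)` pair models one wear phase, and the distribution at the end of one phase seeds the next, which is exact for a CTMC whose rates change only at interval boundaries. Comparing the `k=1` and `k=2` runs shows, in this toy setting, how tolerating one more device failure lowers the probability of data loss as the SSDs age.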

Cited By

  • (2019) Fault Tolerance in Distributed Database Management Systems - Improving reliability with RAID. 2019 Innovations in Power and Advanced Computing Technologies (i-PACT), 1-5. DOI: 10.1109/i-PACT44901.2019.8960197. Online publication date: Mar-2019

    Published In

    ACM Conferences
    CF '18: Proceedings of the 15th ACM International Conference on Computing Frontiers
    May 2018
    401 pages
    ISBN:9781450357616
    DOI:10.1145/3203217
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. CTMC
    2. SSD RAID
    3. reliability
    4. transient analysis

    Conference

    CF '18
    CF '18: Computing Frontiers Conference
    May 8-10, 2018
    Ischia, Italy

    Acceptance Rates

    Overall Acceptance Rate 273 of 785 submissions, 35%

    Article Metrics

    • Downloads (last 12 months): 7
    • Downloads (last 6 weeks): 1
    Reflects downloads up to 01 Mar 2025