research-article

Rebuild processing in RAID5 with emphasis on the supplementary parity augmentation method[37]

Published: 31 May 2012

Abstract

The rotated parity RAID5 disk array tolerates single disk failures: it continues operating by reconstructing data blocks of the failed disk on demand until the rebuild process completes the systematic reconstruction of the failed disk's contents on a spare disk. Supplementary Parity Augmentation (SPA) extends RAID5's P parity with an additional S parity, which covers half of the disks; this differs from the pyramid code, which has two parities that each cover half of the array's disks. The extra load, with respect to RAID5, of updating the S parity for one half of the disks is compensated by more efficient on-demand reconstruction and rebuild processing when a disk fails. Although SPA has the same disk space redundancy level as RAID6, unlike RAID6 it can only deal with roughly half of all possible double disk failure cases for eight disks. For rebuild processing SPA reads half of the disks required by RAID5, and this leads to a higher Mean Time to Data Loss than RAID5, since fewer Latent Sector Errors are encountered. We review performance and reliability modeling of RAID5 arrays to provide insights into SPA's performance and reliability that cannot be gained from numerical results alone. SPA is outperformed by the Intra-Disk Redundancy schemes combined with RAID5, which achieve RAID6's reliability and RAID5's performance.
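The read saving during rebuild can be illustrated with a minimal sketch. It assumes a single stripe of XOR-protected data blocks, with P computed over all data blocks and S computed over the first half only; the function names (`encode`, `rebuild_raid5`, `rebuild_spa`) and this block placement are hypothetical choices for illustration, and parity rotation and the exact SPA layout of [37] are not modeled.

```python
from functools import reduce
from typing import List, Tuple


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Bytewise XOR of equally sized, non-empty list of blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


def encode(data: List[bytes]) -> Tuple[bytes, bytes]:
    """Return (P, S): P covers all data blocks, S covers the first half (assumed layout)."""
    half = len(data) // 2
    return xor_blocks(data), xor_blocks(data[:half])


def rebuild_raid5(data: List[bytes], p: bytes, failed: int) -> Tuple[bytes, int]:
    """Rebuild one block from all surviving data blocks plus P; also return blocks read."""
    reads = [b for i, b in enumerate(data) if i != failed] + [p]
    return xor_blocks(reads), len(reads)


def rebuild_spa(data: List[bytes], p: bytes, s: bytes, failed: int) -> Tuple[bytes, int]:
    """Rebuild one block by reading only the half of the stripe that contains it."""
    half = len(data) // 2
    if failed < half:
        # The S group covers this block: XOR of its surviving group members and S.
        reads = [data[i] for i in range(half) if i != failed] + [s]
    else:
        # P xor S equals the parity of the second half, so the first half is not read.
        reads = [data[i] for i in range(half, len(data)) if i != failed] + [p, s]
    return xor_blocks(reads), len(reads)


if __name__ == "__main__":
    stripe = [bytes([i, (3 * i) % 256, 255 - i]) for i in range(1, 7)]
    p, s = encode(stripe)
    for failed in range(len(stripe)):
        _, raid5_reads = rebuild_raid5(stripe, p, failed)
        _, spa_reads = rebuild_spa(stripe, p, s, failed)
        print(failed, raid5_reads, spa_reads)
```

Under these assumptions a failure in the S-covered half needs about half as many block reads as the RAID5 rebuild, and a failure in the other half needs one read more than that, because both P and S must be fetched.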

References

[1]
M. Blaum, J. Brady, J. Bruck, and J. Menon. "EVENODD: An Efficient scheme for tolerating double disk failures in RAID architectures", IEEE Trans. Computers 44(2): 192--202 (February 1995).
[2]
A. Blum, A. Goyal, P. Heidelberger, S. S. Lavenberg, M. Nakayama, and P. Shahabuddin. "Modeling and analysis of system dependability using the system availability estimator", In Proc. 24th IEEE Ann'l Int'l Symp. on Fault-Tolerant Computing Systems (FTCS), Austin, TX, June 1994, 137--141.
[3]
P. M. Chen, E. K. Lee, G. A. Gibson, R. H. Katz, and D. A. Patterson. "RAID: High-performance, reliable secondary storage", ACM Computing Surveys 26(2): 145--185 (June 1994).
[4]
P. F. Corbett, R. English, A. Goel, T. Grcanac, S. Kleiman, J. Leong, and S. Sankar. "Row-diagonal parity for double disk failure correction", Proc. USENIX Conf. on File and Storage Technologies (FAST'04), San Francisco, CA, March-April 2004, 1--14.
[5]
A. Dholakia, E. Eleftheriou, X.-Y. Hu, I. Iliadis, J. Menon, and K. K. Rao. "A new intra-disk redundancy scheme for high-reliability RAID storage systems in the presence of unrecoverable errors", ACM Trans. on Storage (TOS) 4(1): article 1 (May 2008).
[6]
G. A. Gibson. Redundant Disk Arrays: Reliable, Parallel Secondary Storage, The MIT Press, 1992.
[7]
M. Holland, G. A. Gibson, and D. P. Siewiorek. "Architectures and algorithms for on-line failure recovery in redundant disk arrays", Distributed and Parallel Databases 2(3): 295--335 (July 1994).
[8]
R. Y. Hou, J. Menon, and Y. N. Patt. "Balancing I/O response time and disk rebuild time in a RAID5 disk array", In Proc. 26th Hawaii Int'l Conf. System Sciences (HICSS 26), Vol. I, Honolulu, HI, January 1993, 70--79.
[9]
C. Huang, M. Chen, and J. Li. "Pyramid codes: Flexible schemes to trade space for access efficiency", Proc. 6th Int'l Symp. on Network Computing and Applications (NCA 2007), Cambridge, MA, July 2007, 79--86.
[10]
I. Iliadis, R. Haas, X.-Y. Hu, and E. Eleftheriou. "Disk scrubbing versus intra-disk redundancy for high-reliability RAID storage systems", ACM Trans. on Storage (TOS) 7(2): article 5 (July 2011).
[11]
B. L. Jacob, S. W. Ng, and D. T. Wang. Memory Systems: Cache, DRAM, Disk, Elsevier, 2008.
[12]
H. H. Kari. "Latent Sector Faults and Reliability of Disk Arrays", PhD Thesis, University of Helsinki, Finland, 1977.
[13]
L. Kleinrock. Queueing Systems, Vol. I: Theory; Vol. II: Applications to Computer Systems, Wiley-Interscience, 1975/76.
[14]
E. Krevat, J. Tucek, and G. R. Ganger. "Disks are like snowflakes: No two are alike", In Proc. 13th Workshop on Hot Topics in Operating Systems (HotOS 2011), Napa Valley, CA, May 2011.
[15]
J. Y. B. Lee and J. C. S. Lui. "Automatic recovery from disk failure in continuous-media servers", IEEE Trans. Parallel Distributed Systems (TPDS) 13(5): 499--515 (May 2002).
[16]
S. W. Ng and R. L. Mattson. "Uniform parity group distribution in disk arrays with multiple disk failures", IEEE Trans. Computers 43(4): 501--506 (April 1994).
[17]
V. Nicola, M. Nakayama, P. Heidelberger, and A. Goyal. "Fast simulation of highly dependable systems with general failure and repair processes", IEEE Trans. Computers 42(12): 1440--1452 (December 1993).
[18]
J. Menon. "Performance of RAID5 disk arrays with read and write caching", Distributed and Parallel Databases 2(3): 261--293 (July 1994).
[19]
A. Merchant and P. S. Yu. "Analytic modeling of clustered RAID with mapping based on nearly random permutation", IEEE Trans. Computers 45(3): 367--373 (March 1996).
[20]
R. R. Muntz and J. C. S. Lui. "Performance analysis of disk arrays under failure", Proc. 16th Int'l Conf. on Very Large Data Bases (VLDB'90), Brisbane, Queensland, Australia, August 1990, 162--173.
[21]
Y. W. Ng and A. Avizienis. "A unified reliability model for fault-tolerant computers", IEEE Trans. Computers 29(11): 1002--1011 (November 1980).
[22]
K. K. Ramakrishnan, P. Biswas, and R. Karedla. "Analysis of file I/O traces in commercial computing environments", Proc. ACM SIGMETRICS/PERFORMANCE'92 Int'l Conf. on Measurement and Modeling of Computer Systems, Newport, Rhode Island, June 1992, 78--90.
[23]
K. K. Rao, J. L. Hafner, and R. A. Golding. "Reliability for networked storage nodes", IEEE Trans. Dependable Secure Computing (TDSC) 8(3): 404--418 (May-June 2011).
[24]
B. Schroeder and G. A. Gibson. "Understanding disk failure rates: What does an MTTF of 1,000,000 hours mean to you?", ACM Trans. on Storage (TOS) 3(3): article 8 (October 2007).
[25]
B. Schroeder, S. Damouras, and P. Gill. "Understanding latent sector errors and how to protect against them", ACM Trans. on Storage (TOS) 6(3): article 2 (2010).
[26]
H. Takagi. Queueing Analysis: A Foundation of Performance Evaluation Vacation and Priority Systems, Part 1, North-Holland, 1991.
[27]
A. Thomasian and J. Menon. "Performance analysis of RAID5 disk arrays with a vacationing server model for rebuild mode operation", Proc. 10th IEEE Int'l Conf. on Data Engineering (ICDE), Houston, TX, February 1994, 111--119.
[28]
A. Thomasian. "Rebuild options in RAID5 disk arrays", Proc. 7th IEEE Symp. Parallel and Distributed Systems, San Antonio, TX, October 1995, 511--518.
[29]
A. Thomasian and J. Menon. "RAID5 performance with distributed sparing", IEEE Trans. Parallel and Distributed Systems 8(6): 640--657 (June 1997).
[30]
A. Thomasian. "Clustered RAID arrays and their access costs", The Computer Journal 48(6): 702--713 (November 2005).
[31]
A. Thomasian, G. Fu, and C. Han. "Performance of two-disk failure-tolerant disk arrays", IEEE Trans. Computers 56(6): 799--814 (June 2007).
[32]
A. Thomasian, G. Fu, and S. W. Ng. "Analysis of rebuild processing in RAID5 disk arrays", The Computer Journal 50(2): 217--231 (March 2007).
[33]
A. Thomasian and M. Blaum. "Higher reliability redundant disk arrays: Organization, operation, and coding", ACM Trans. on Storage 5(3): article 7 (November 2009).
[34]
A. Thomasian. "Survey and analysis of disk scheduling methods", ACM SIGARCH Computer Architecture News 39(2): 8--25 (May 2011).
[35]
A. Thomasian and J. Xu. "X-code double parity array operation with two disk failures", Information Processing Letters (IPL) 111(12): 568--574 (June 2011).
[36]
A. Thomasian and J. Xu. "Data allocation in heterogeneous disk arrays", Proc. 6th Int'l Conf. on Networking, Architecture, and Storage (NAS'11), Dalian, China, July 2011, 82--91.
[37]
L. Tian, Q. Cao, H. Jiang, D. Feng, C. Xie, and Q. Xin. "Online availability upgrades for parity-based RAIDs through supplementary parity augmentations", ACM Trans. on Storage (TOS) 6(4): article 17 (May 2011).
[38]
K. S. Trivedi. Probability and Statistics with Reliability, Queuing, and Computer Science Applications, Wiley, 2001.
[39]
Z. Wang, A. G. Dimakis, and J. Bruck. "Rebuilding for array codes in distributed storage systems", Proc. IEEE GLOBECOM Workshop on Application of Communication Theory to Emerging Memory Technologies, Miami, FL, December 2010, 1995--1999.
[40]
B. Welch, M. Unangst, Z. Abbasi, G. A. Gibson, B. Mueller, J. Small, J. Zelenka, and B. Zhou. "Scalable performance of the Panasas parallel file system", In Proc. 6th USENIX Conf. on File and Storage Technologies (FAST'08), San Jose, CA, February 2008, 17--33.
[41]
L. Xiang, Y. Xu, J. C. S. Lui, and Q. Chang. "Optimal recovery of single disk failure in RDP code storage systems", Proc. ACM SIGMETRICS Int'l Conf. on Measurement and Modeling of Computer Systems, New York, NY, June 2010, 119--130.
[42]
L. Xu and J. Bruck. "X-code: MDS array codes with optimal encoding", IEEE Trans. Information Theory 45(1): 272--276 (January 1999).

Published In

ACM SIGARCH Computer Architecture News  Volume 40, Issue 2
May 2012
49 pages
ISSN:0163-5964
DOI:10.1145/2234336

Publisher

Association for Computing Machinery

New York, NY, United States
