Abstract
Replication and erasure coding are two alternative methods for disk arrays to deal with disk failures. This work concentrates on mirrored disk arrays, classified as RAID1, and hybrid disk arrays, which implement redundancy by storing XORed data blocks instead of replicas. We evaluate the reliability of disk arrays without and with repair using traditional reliability modeling techniques. A shortcut method based on asymptotic expansions is also used to compare the reliability of RAID(4+k) arrays with mirrored and hybrid disks. RAID1 with distributed redundancy attains more balanced disk loads and improved performance with respect to basic mirroring (BM) upon disk failure, but is less reliable than BM. Hybrid disk arrays incurring the same level of redundancy as RAID1 are more reliable than RAID1, but incur a higher cost for updates. The application of the asymptotic expansion method to hierarchical RAID shows that it is advantageous to associate higher redundancy with lower levels at the same overall redundancy overhead. It is also shown that sharing disk space between RAID1 and RAID5 in heterogeneous disk arrays (HDAs) may result in lowered reliability. In addition to the classical rebuild model, we present an extension with a limited number of spares. Recovery methods based on reconfiguration from higher to lower reliability RAID arrays are also presented.
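To make the idea of comparing reliabilities through asymptotic expansions more concrete, the following Python sketch may help. It is not the paper's own derivation. It assumes independent disk failures, no repair, and a common per-disk reliability r = 1 - eps with eps small, and all function names are hypothetical. Under these assumptions, basic mirroring with N disks has reliability (1 - eps^2)^(N/2), roughly 1 - (N/2) eps^2, while a single-parity group of N disks has roughly 1 - C(N,2) eps^2, so the coefficient of the lowest power of eps decides the comparison.

```python
from math import comb


def basic_mirroring_reliability(n_disks: int, r: float) -> float:
    """No-repair reliability of n_disks/2 mirrored pairs (assumed model).

    Data survive as long as no pair loses both of its disks.
    """
    pairs = n_disks // 2
    return (1.0 - (1.0 - r) ** 2) ** pairs


def single_parity_reliability(n_disks: int, r: float) -> float:
    """No-repair reliability of one single-parity group (assumed model).

    The group tolerates at most one failed disk among n_disks.
    """
    return r ** n_disks + n_disks * r ** (n_disks - 1) * (1.0 - r)


def leading_order(coefficient: float, power: int, eps: float) -> float:
    """Leading term of the expansion: R is about 1 - coefficient * eps**power."""
    return 1.0 - coefficient * eps ** power


if __name__ == "__main__":
    n, eps = 8, 0.01  # illustrative values, not taken from the paper
    r = 1.0 - eps
    print("BM exact      :", basic_mirroring_reliability(n, r))
    print("BM leading    :", leading_order(n // 2, 2, eps))
    print("Parity exact  :", single_parity_reliability(n, r))
    print("Parity leading:", leading_order(comb(n, 2), 2, eps))
```

With eight disks, the leading coefficients are 4 for mirrored pairs and 28 for the single-parity group, so the mirrored layout comes out more reliable in this simplified setting, at the price of a much higher redundancy overhead.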
Abbreviations
- BM: Basic mirroring
- CD: Chained declustering
- CTMC: Continuous time Markov chain
- GRD: Group rotate declustering
- HDA: Heterogeneous disk array
- HRAID: Hierarchical RAID
- ID: Interleaved declustering
- LSI: LSI Logic's RAID array
- MTTF: Mean time to failure
- MTTR: Mean time to repair
- MTTDL: Mean time to data loss
- PDS: Parity defining set (for Weaver codes)
- RAID: Redundant array of independent disks
- SADA: Self-adaptive disk array
- SSPiRAL: Survivable storage using parity in redundant array layouts
- XOR: eXclusive-OR
Acknowledgements
Dr. Jun Xu at NJIT and Dr. Yujie Tang at the Shenzhen Institute of Advanced Technology (www.siat.ac.cn) collaborated on research topics covered in this paper.
Cite this article
Thomasian, A. Mirrored and hybrid disk arrays and their reliability. Cluster Comput 22 (Suppl 1), 2485–2494 (2019). https://doi.org/10.1007/s10586-018-2127-x
DOI: https://doi.org/10.1007/s10586-018-2127-x