Abstract
Modern data storage systems employ advanced erasure codes to protect data from storage node failures because of their ability to provide high data reliability at high storage efficiency. In contrast to previous studies, we consider the practical case where the length of codewords in an erasure coded system is much smaller than the number of storage nodes in the system. In this case, there exists a large number of possible ways in which different codewords can be stored across the nodes of the system. In this paper, it is shown that a declustered placement of codewords can significantly improve system reliability compared to other placement schemes. A detailed reliability analysis is presented that accounts for the rebuild times involved, the amounts of partially rebuilt data when additional nodes fail during rebuild, and an intelligent rebuild process that attempts to rebuild the most critical codewords first.
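The difference between placement schemes can be made concrete with a small sketch. The code below is an illustration under stated assumptions, not the paper's analytical model: the parameter names `n` (number of nodes) and `m` (codeword length), the function names, and the choice of a clustered scheme (disjoint groups of `m` nodes) as the point of comparison are all assumptions introduced here. It counts how many surviving nodes share codewords with a failed node, and finds the placement sets that have lost the most symbols, which is one plausible reading of "most critical codewords".

```python
from itertools import combinations


def clustered_placement(n, m):
    """Partition n nodes into disjoint groups of m nodes.

    Each codeword is stored entirely within one group (assumed scheme).
    """
    if n % m != 0:
        raise ValueError("this sketch requires n to be a multiple of m")
    return [tuple(range(start, start + m)) for start in range(0, n, m)]


def declustered_placement(n, m):
    """Use every m-subset of the n nodes as a placement set.

    Codewords are spread evenly over all possible sets of m nodes.
    """
    return list(combinations(range(n), m))


def rebuild_helpers(placement, failed_node):
    """Return the surviving nodes that share a placement set with failed_node.

    These are the nodes that hold symbols of the affected codewords.
    """
    helpers = set()
    for nodes in placement:
        if failed_node in nodes:
            helpers.update(nodes)
    helpers.discard(failed_node)
    return helpers


def most_exposed_sets(placement, failed_nodes):
    """Return (worst, sets): the largest number of symbols lost by any
    placement set, and the placement sets that have lost that many."""
    failed = set(failed_nodes)
    losses = {nodes: len(failed.intersection(nodes)) for nodes in placement}
    worst = max(losses.values())
    return worst, [nodes for nodes, lost in losses.items() if lost == worst]


if __name__ == "__main__":
    n, m = 6, 3  # hypothetical small system: 6 nodes, codewords of length 3
    for name, build in (("clustered", clustered_placement),
                        ("declustered", declustered_placement)):
        placement = build(n, m)
        helpers = rebuild_helpers(placement, failed_node=0)
        worst, exposed = most_exposed_sets(placement, failed_nodes=[0, 1])
        print(name, "helpers:", sorted(helpers),
              "max symbols lost:", worst, "exposed sets:", len(exposed))
```

In this toy setting with 6 nodes and codewords of length 3, a failed node has 2 helper nodes under the clustered scheme and 5 under the declustered one. The sketch only shows that declustering spreads the affected codewords over more nodes; it does not model rebuild times or bandwidth.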
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Venkatesan, V., Iliadis, I. (2013). Effect of Codeword Placement on the Reliability of Erasure Coded Data Storage Systems. In: Joshi, K., Siegle, M., Stoelinga, M., D’Argenio, P.R. (eds) Quantitative Evaluation of Systems. QEST 2013. Lecture Notes in Computer Science, vol 8054. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40196-1_20
DOI: https://doi.org/10.1007/978-3-642-40196-1_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40195-4
Online ISBN: 978-3-642-40196-1
eBook Packages: Computer Science (R0)