ABSTRACT
Update performance in erasure-coded data centers is often bottlenecked by constrained cross-rack bandwidth. We propose CAU, a cross-rack-aware update mechanism that aims to mitigate cross-rack update traffic in erasure-coded data centers. CAU builds on three design elements: (i) selective parity updates, which select the appropriate parity update approach based on the update pattern and the data layout to reduce cross-rack update traffic; (ii) data grouping, which relocates and groups updated data chunks in the same rack to further reduce cross-rack update traffic; and (iii) interim replication, which stores a temporary replica for each newly updated data chunk. We evaluate CAU via trace-driven analysis, local cluster experiments, and Amazon EC2 experiments. We show that CAU improves on state-of-the-art approaches by mitigating cross-rack update traffic while maintaining high update performance in both local cluster and geo-distributed environments.
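The abstract does not spell out how selective parity updates choose an approach, so the following is only a minimal sketch of one plausible reading. It assumes that, for one data rack and one parity rack, the choice is between forwarding one delta per updated data chunk and forwarding one pre-aggregated delta per parity chunk, and that the option sending fewer chunks across racks wins. The names `UpdatePlan` and `choose_parity_update` and the labels "data-delta" and "parity-delta" are hypothetical and are not taken from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UpdatePlan:
    approach: str           # hypothetical label: "data-delta", "parity-delta", or "none"
    cross_rack_chunks: int  # chunks sent from the data rack to the parity rack


def choose_parity_update(updated_in_data_rack: int,
                         parities_in_parity_rack: int) -> UpdatePlan:
    """Pick the approach that sends fewer chunks across racks (assumed rule).

    data-delta:   forward one delta per updated data chunk; the parity rack
                  derives its parity deltas locally.
    parity-delta: combine the deltas inside the data rack and forward one
                  delta per parity chunk stored in the parity rack.
    Both are possible because parity chunks are linear combinations of
    data chunks, so deltas can be combined before or after the transfer.
    """
    if updated_in_data_rack < 0 or parities_in_parity_rack < 0:
        raise ValueError("chunk counts must be non-negative")
    if updated_in_data_rack == 0 or parities_in_parity_rack == 0:
        return UpdatePlan("none", 0)  # nothing needs to cross racks
    if updated_in_data_rack <= parities_in_parity_rack:
        return UpdatePlan("data-delta", updated_in_data_rack)
    return UpdatePlan("parity-delta", parities_in_parity_rack)


if __name__ == "__main__":
    # One updated chunk, two parity chunks in the destination rack.
    print(choose_parity_update(1, 2))
    # Three updated chunks, two parity chunks in the destination rack.
    print(choose_parity_update(3, 2))
```

Under this assumed rule the cross-rack cost per rack pair is the smaller of the two counts, which is why both the update pattern (how many chunks changed in a rack) and the data layout (how many parity chunks a rack holds) enter the decision.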
Index Terms
- Cross-Rack-Aware Updates in Erasure-Coded Data Centers