Disaggregated RAID Storage in Modern Datacenters

Published: 25 March 2023

Abstract

RAID (Redundant Array of Independent Disks) has been widely adopted for decades because it provides throughput and redundancy beyond what a single disk can offer. Today, fast datacenter networks make it practical to access remote block devices with acceptable overhead (i.e., disaggregated storage), for example for serverless applications. Combining RAID with remote storage can provide the same benefits while offering better fault tolerance and flexibility than monolithic RAID. The key challenge of disaggregated RAID is handling the extra network traffic that RAID generates, which can consume a vast amount of NIC bandwidth. We present dRAID, a disaggregated RAID system that achieves near-optimal read and write throughput. dRAID exploits peer-to-peer disaggregated data access to reduce bandwidth consumption in both normal and degraded states. It employs non-blocking multi-stage writes to maximize inter-node parallelism and applies pipelined I/O processing to maximize inter-device parallelism. We introduce bandwidth-aware reconstruction for better load balancing. We show that dRAID provides up to 3× bandwidth improvement. Results on a lightweight object store show that dRAID brings 1.5×-2.35× throughput improvement on various workloads.
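
To make the source of the extra traffic concrete, the following is a minimal sketch of generic single-parity (RAID-5 style) arithmetic, not dRAID's implementation. It assumes XOR parity over equally sized chunks; the helper names (`xor_blocks`, `small_write_parity`, `reconstruct`) and the 3+1 stripe are hypothetical and chosen only for illustration. A partial-stripe write needs the old data and old parity in addition to the new data, and a degraded read needs every surviving chunk of the stripe. When the devices are remote and the host performs this arithmetic, each of those chunks crosses the host NIC.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of one or more equally sized blocks."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("need at least one block, all of the same length")
    # zip(*blocks) yields one tuple of bytes per offset; XOR each tuple.
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def small_write_parity(old_data: bytes, new_data: bytes, old_parity: bytes) -> bytes:
    """Read-modify-write update: new parity after overwriting one data chunk.

    Cancels the old chunk out of the parity and folds the new chunk in,
    so the untouched chunks of the stripe never have to be read.
    """
    return xor_blocks(old_parity, old_data, new_data)


def reconstruct(*surviving_chunks: bytes) -> bytes:
    """Degraded read: rebuild the one missing chunk from all survivors
    (remaining data chunks plus the parity chunk)."""
    return xor_blocks(*surviving_chunks)


# Hypothetical 3+1 stripe: three data chunks and one parity chunk.
data = [b"\x01\x02", b"\x10\x20", b"\xa0\x0b"]
parity = xor_blocks(*data)

# Overwrite chunk 1 using only its old contents and the old parity.
new_chunk = b"\xff\x00"
new_parity = small_write_parity(data[1], new_chunk, parity)

# Lose chunk 1 (before the overwrite) and rebuild it from the rest.
rebuilt = reconstruct(data[0], data[2], parity)
```

In this sketch a single-chunk overwrite touches four chunks (read old data, read old parity, write new data, write new parity), and a degraded read of one chunk pulls in three. That amplification is the kind of RAID-generated traffic the abstract identifies as the key challenge.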

Published In

ASPLOS 2023: Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3
March 2023
820 pages
ISBN: 9781450399180
DOI: 10.1145/3582016
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Disaggregated Storage
  2. NVMe-oF
  3. RAID
  4. RDMA

Qualifiers

  • Research-article

Funding Sources

  • the National Key Research and Development Program of China award
  • the National Natural Science Foundation of China award
  • the National Natural Science Fund for the Excellent Young Scientists Fund Program (Overseas) award
  • the Beijing Outstanding Young Scientist Program award

Conference

ASPLOS '23

Acceptance Rates

Overall Acceptance Rate 535 of 2,713 submissions, 20%
