ABSTRACT
The widespread deployment of storage disaggregation in the cloud has facilitated flexible scaling and storage overprovisioning, allowing for high utilization of storage capacity and IOPS. Rather than accessing remote disks directly through remote storage protocols, a middle tier is placed between compute servers and storage servers to serve I/O requests from compute servers and to perform computations such as compression and decompression. However, because a cloud must concurrently serve millions of VMs that access disaggregated storage, the middle tier needs a massive number of servers to process the network traffic between compute and storage nodes. For example, a major cloud provider may deploy hundreds of thousands of high-end servers for this service, because the existing CPU-based middle tier is bottlenecked by compute-intensive compression and decompression of high-throughput storage traffic. To address this issue, we introduce SmartDS, a middle-tier-centric SmartNIC that serves storage I/O requests with low latency and high throughput while maintaining high flexibility and programmability. The key idea behind SmartDS is an application-aware message split (AAMS) mechanism: each message's header is processed on the host CPU to retain flexibility, while its payload is processed on SmartDS. Experimental results show that SmartDS delivers up to 4.3× higher throughput than a CPU-based middle tier and scales linearly across multiple network ports and multiple SmartNICs, significantly reducing cloud infrastructure costs for disaggregated block storage.
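To make the header/payload split concrete, the following is a minimal software sketch of the idea, not SmartDS's implementation or API. It assumes a hypothetical 16-byte header (opcode, volume id, block offset), models the SmartNIC as a toy `NicSim` class that keeps payloads in its own buffer table, and uses Python's `zlib` purely as a stand-in for a hardware compression engine. All names (`NicSim`, `host_decide`, `middle_tier`) are illustrative. The point is that the host-side function sees only the small header and a payload handle, while the bulk bytes are compressed or decompressed on the "NIC" side.

```python
import struct
import zlib
from dataclasses import dataclass

# Hypothetical 16-byte header: opcode (u8), 3 pad bytes, volume id (u32), block offset (u64).
HEADER_FMT = "<B3xIQ"
HEADER_LEN = struct.calcsize(HEADER_FMT)  # 16 bytes
OP_WRITE, OP_READ = 1, 2


@dataclass
class Header:
    opcode: int
    volume_id: int
    offset: int


class NicSim:
    """Toy stand-in for the SmartNIC: payloads stay in its buffer table."""

    def __init__(self) -> None:
        self._buffers: dict[int, bytes] = {}
        self._next_handle = 0

    def receive(self, msg: bytes) -> tuple[bytes, int]:
        """Split an incoming message; return header bytes and a payload handle."""
        if len(msg) < HEADER_LEN:
            raise ValueError("message shorter than header")
        handle = self._next_handle
        self._next_handle += 1
        self._buffers[handle] = msg[HEADER_LEN:]  # payload is kept on the "NIC"
        return msg[:HEADER_LEN], handle

    def run(self, handle: int, action: str) -> bytes:
        """Apply the host-chosen action to a buffered payload and release it.
        zlib is only a placeholder for a hardware compression engine."""
        payload = self._buffers.pop(handle)
        if action == "compress":
            return zlib.compress(payload)
        if action == "decompress":
            return zlib.decompress(payload)
        raise ValueError(f"unknown action: {action}")


def host_decide(raw_header: bytes) -> tuple[Header, str]:
    """Host CPU path: parse only the header and pick the payload operation."""
    hdr = Header(*struct.unpack(HEADER_FMT, raw_header))
    return hdr, "compress" if hdr.opcode == OP_WRITE else "decompress"


def middle_tier(nic: NicSim, msg: bytes) -> tuple[Header, bytes]:
    raw_header, handle = nic.receive(msg)  # NIC splits header from payload
    hdr, action = host_decide(raw_header)  # host sees 16 bytes, not the payload
    return hdr, nic.run(handle, action)    # bulk work happens on the NIC side


if __name__ == "__main__":
    nic = NicSim()
    block = b"A" * 4096
    hdr, stored = middle_tier(nic, struct.pack(HEADER_FMT, OP_WRITE, 7, 8192) + block)
    _, restored = middle_tier(nic, struct.pack(HEADER_FMT, OP_READ, 7, 8192) + stored)
    print(hdr, len(stored), restored == block)
```

Keeping the policy decision (which operation to apply, where to route) in host software is what preserves programmability in this sketch, while the per-byte compression work, which dominates CPU cost on high-throughput storage traffic, never passes through the host function.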
SmartDS: Middle-Tier-centric SmartNIC Enabling Application-aware Message Split for Disaggregated Block Storage