Abstract
Remote direct memory access (RDMA) has become one of the state-of-the-art high-performance network technologies in datacenters. The reliable transport of RDMA is designed for a lossless underlying network and cannot tolerate a high packet loss rate. However, besides switch buffer overflow, there is another kind of packet loss in RDMA networks, i.e., packet corruption, which has not been discussed in depth. Packet corruption incurs long application tail latency by causing timeout retransmissions. The challenges in handling packet corruption in RDMA networks are twofold: 1) packet corruption is inevitable even with remedial mechanisms, and 2) RDMA hardware is not programmable. This paper proposes designs that guarantee the expected tail latency of applications in the presence of packet corruption. The key idea is to control the occurrence probability of timeout events caused by packet corruption by transforming timeout retransmissions into out-of-order retransmissions. We build a probabilistic model to estimate the occurrence probabilities and real effects of the corruption patterns. We implement the two mechanisms with the help of programmable switches and the zero-byte message RDMA feature, build an ns-3 simulation, and deploy the optimization mechanisms on our testbed. The simulation and testbed experiments show that the optimizations can decrease the flow completion time by several orders of magnitude with less than 3% bandwidth cost at different packet corruption rates.
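To make the key idea concrete, below is a minimal sketch, not the paper's actual probabilistic model: it assumes a go-back-N RDMA transport and independent (Bernoulli) per-packet corruption. Under these assumptions, a corrupted packet that is followed by later packets of the same message can be repaired by a fast out-of-order (NACK-triggered) retransmission, whereas a corrupted final packet leaves the receiver silent and falls back to a timeout. All function names and parameter values are hypothetical and for illustration only.

```python
# Illustrative sketch only: i.i.d. Bernoulli corruption, go-back-N recovery.
# This is NOT the model or implementation from the paper.

def any_corruption_probability(num_packets: int, corruption_rate: float) -> float:
    """Probability that at least one packet of the message is corrupted."""
    return 1.0 - (1.0 - corruption_rate) ** num_packets


def timeout_probability(corruption_rate: float) -> float:
    """Under this sketch, only a corrupted *last* packet forces a timeout,
    because no later packet arrives to reveal the gap; earlier corruptions
    are recovered by out-of-order retransmission."""
    return corruption_rate


def expected_completion_time(num_packets: int, corruption_rate: float,
                             per_packet_us: float, rto_us: float) -> float:
    """Rough expected flow completion time (microseconds): base transmission
    time plus the timeout penalty weighted by its probability.
    Retransmission bandwidth overhead is ignored."""
    base = num_packets * per_packet_us
    return base + timeout_probability(corruption_rate) * rto_us


if __name__ == "__main__":
    # Example: a 1000-packet message, 1e-5 corruption rate, 1 us per packet,
    # and a 1 ms retransmission timeout (all values are illustrative).
    print(any_corruption_probability(1000, 1e-5))   # corruption is not rare
    print(expected_completion_time(1000, 1e-5, 1.0, 1000.0))
```

The sketch illustrates why converting timeout retransmissions into out-of-order retransmissions matters: the probability that *some* packet is corrupted grows with message size, but the tail latency penalty is dominated by the (much rarer, timeout-bound) corruption patterns, which the proposed mechanisms aim to suppress.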
Cite this article
Gao, YX., Tian, C., Chen, W. et al. Analyzing and Optimizing Packet Corruption in RDMA Network. J. Comput. Sci. Technol. 37, 743–762 (2022). https://doi.org/10.1007/s11390-022-2123-8