sRDMA: A General and Low-Overhead Scheduler for RDMA

Authors:

Dan Li

APNet '23: Proceedings of the 7th Asia-Pacific Workshop on Networking

Pages 21 - 27

https://doi.org/10.1145/3600061.3600082

Published: 05 September 2023

Abstract

Remote Direct Memory Access (RDMA) has been widely deployed in data centers to improve application performance. However, RDMA delivers messages in order, which cannot meet applications' emerging need to schedule messages within an RDMA connection and leaves RDMA underutilized. Some works schedule the data to be transferred inside specific applications before handing it to RDMA, or distribute messages across different connections. However, these approaches tightly couple scheduling logic with application logic and may incur high scheduling overhead.

In this paper, we propose sRDMA, a general and low-overhead scheduler that works in the user-space RDMA driver. sRDMA allows the application to express the expected transfer order to the RDMA hardware via work requests (WRs). Using the priority information in WRs, sRDMA slices and schedules WRs to achieve the desired order of message transfer and to reduce the blocking impact of large messages within the same RDMA connection. Our experiments show that sRDMA can improve the performance of applications such as TensorFlow, and that it has negligible overhead in terms of CPU usage and flow throughput.
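
To make the idea of slicing and priority scheduling concrete, the following is a minimal toy sketch in Python. It is not the sRDMA implementation: the class names, the slice size, the "lower value means higher priority" convention, and the FIFO tie-break are all assumptions chosen for illustration, and no real RDMA verbs are called. It only shows how cutting a large request into slices lets a later, higher-priority request overtake the remainder of an earlier one.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count

SLICE_BYTES = 4096  # hypothetical slice size; the abstract does not state one


@dataclass(order=True)
class Slice:
    priority: int  # assumed convention: lower value is transferred earlier
    seq: int       # submission order, used to break ties FIFO
    wr_id: int = field(compare=False)
    offset: int = field(compare=False)
    length: int = field(compare=False)


class ToyScheduler:
    """Illustrative priority queue of WR slices (not the paper's design)."""

    def __init__(self, slice_bytes=SLICE_BYTES):
        self.slice_bytes = slice_bytes
        self._heap = []
        self._seq = count()

    def post(self, wr_id, length, priority):
        # Cut the work request into slices so a large message can be preempted
        # at slice boundaries by a later, more urgent request.
        for offset in range(0, length, self.slice_bytes):
            chunk = min(self.slice_bytes, length - offset)
            item = Slice(priority, next(self._seq), wr_id, offset, chunk)
            heapq.heappush(self._heap, item)

    def next_slice(self):
        # Return the most urgent pending slice, or None when the queue is empty.
        return heapq.heappop(self._heap) if self._heap else None


if __name__ == "__main__":
    sched = ToyScheduler(slice_bytes=4)
    sched.post(wr_id=1, length=10, priority=5)  # large, low-priority message
    print(sched.next_slice())                   # its first slice goes out
    sched.post(wr_id=2, length=3, priority=0)   # small, urgent message arrives
    while (s := sched.next_slice()) is not None:
        print(s)                                # WR 2 runs before the rest of WR 1
```

In this sketch the urgent request waits for at most one slice of the large message rather than the whole message, which is the blocking reduction the abstract describes at a high level.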

Index Terms

sRDMA: A General and Low-Overhead Scheduler for RDMA
1. Networks
  1. Network algorithms
    1. Data path algorithms
      1. Packet scheduling
  2. Network types
    1. Data center networks

Published In

APNet '23: Proceedings of the 7th Asia-Pacific Workshop on Networking

June 2023

229 pages

ISBN: 9798400707827

DOI: 10.1145/3600061

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

APNET 2023

APNET 2023: 7th Asia-Pacific Workshop on Networking

June 29 - 30, 2023

Hong Kong, China

Acceptance Rates

Overall Acceptance Rate 50 of 118 submissions, 42%
