ABSTRACT
Modern cloud computing infrastructures are steadily pushing the performance of their network stacks. At the hardware level, some cloud providers have already upgraded parts of their networks to 10GbE. At the same time, there is a continuous effort within the cloud community to improve network performance inside the virtualization layers. The low-latency/high-throughput properties of these network interfaces not only open the cloud to HPC applications; they will also benefit traditional large-scale web applications and data processing frameworks. However, as commodity networks get faster, the burden on the end hosts increases. Inefficient memory copying in socket-based networking accounts for a significant fraction of the end-to-end latency and also creates serious CPU load on the host machine. Years ago, the supercomputing community developed RDMA network stacks such as InfiniBand that offer both low end-to-end latency and a low CPU footprint. While adopting RDMA in the commodity cloud environment is difficult (it is costly and requires special hardware), we argue in this paper that most of the benefits of RDMA can in fact be provided in software. To demonstrate our findings, we have implemented and evaluated a prototype of a software-based RDMA stack. Compared to a socket/TCP approach (with TCP receive copy offload), our results show a significant reduction in end-to-end latency for messages larger than a modest 64 kB, and a reduction in CPU load (without TCP receive copy offload) while saturating the 10 Gbit/s link.
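To make the contrast in the abstract concrete, the sketch below shows the data path that avoids the receive-side copy. Software RDMA stacks can expose the standard verbs API (libibverbs); we assume such an interface here. The socket path copies the payload from the user buffer into kernel socket buffers on send() and out of them again on recv(); the RDMA path registers the buffer once and posts a one-sided write that places data directly into the remote application's memory. Connection setup and the out-of-band exchange of the peer's address and rkey are assumed to have happened already; the function name and parameters are ours for illustration, not taken from the paper's prototype.

```c
/* Sketch: a zero-copy, one-sided RDMA write via libibverbs.
 * Assumes the queue pair `qp` is connected, `buf` was registered with
 * ibv_reg_mr() yielding `mr`, and the peer's buffer address `remote_addr`
 * and registration key `rkey` were exchanged out of band. */
#include <infiniband/verbs.h>
#include <stddef.h>
#include <stdint.h>

static int rdma_write_zero_copy(struct ibv_qp *qp, struct ibv_mr *mr,
                                void *buf, size_t len,
                                uint64_t remote_addr, uint32_t rkey)
{
    struct ibv_sge sge = {
        .addr   = (uintptr_t)buf,     /* local registered buffer */
        .length = (uint32_t)len,
        .lkey   = mr->lkey,           /* local key from ibv_reg_mr() */
    };
    struct ibv_send_wr wr = {
        .wr_id      = 1,
        .sg_list    = &sge,
        .num_sge    = 1,
        .opcode     = IBV_WR_RDMA_WRITE,  /* one-sided: no recv() copy on the peer */
        .send_flags = IBV_SEND_SIGNALED,  /* request a completion entry */
        .wr.rdma    = {
            .remote_addr = remote_addr,   /* target address in peer memory */
            .rkey        = rkey,          /* peer's registration key */
        },
    };
    struct ibv_send_wr *bad_wr = NULL;

    /* The stack (an RNIC, or a software implementation over TCP) moves
     * the bytes; neither end copies them through socket buffers. */
    return ibv_post_send(qp, &wr, &bad_wr);
}
```

A software implementation can provide these one-sided semantics over a commodity TCP transport: the receive side places incoming payloads directly into the pre-registered application buffer instead of copying them out of socket buffers, which is where the latency and CPU-load savings reported above come from.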