DOI: 10.1145/2103799.2103820
research-article

A case for RDMA in clouds: turning supercomputer networking into commodity

Published: 11 July 2011

Abstract

Modern cloud computing infrastructures are steadily pushing the performance of their network stacks. At the hardware level, some cloud providers have already upgraded parts of their network to 10GbE. At the same time, there is a continuous effort within the cloud community to improve network performance inside the virtualization layers. The low-latency/high-throughput properties of these network interfaces are not only opening the cloud to HPC applications; they will also be well received by traditional large-scale web applications and data processing frameworks. However, as commodity networks get faster, the burden on the end hosts increases. Inefficient memory copying in socket-based networking takes up a significant fraction of the end-to-end latency and also creates serious CPU load on the host machine. Years ago, the supercomputing community developed RDMA network stacks such as InfiniBand that offer both low end-to-end latency and a low CPU footprint. While adopting RDMA in the commodity cloud environment is difficult (it is costly and requires special hardware), we argue in this paper that most of the benefits of RDMA can in fact be provided in software. To demonstrate our findings, we have implemented and evaluated a prototype of a software-based RDMA stack. Compared with a socket/TCP approach with TCP receive copy offload, our results show a significant reduction in end-to-end latency for messages larger than a modest 64 kB. Compared with socket/TCP without receive copy offload, they show a reduction in CPU load, and hence better efficiency, while saturating the 10 Gbit/s link.
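The contrast the abstract draws between socket copies and RDMA rests on direct data placement: the receiver registers a buffer in advance, the sender names that buffer in its write, and the data is placed there directly rather than being staged in a socket buffer and copied out on a receive call. The following is a toy, in-process Python model of that idea only, with no networking involved. The names `MemoryRegion`, `ToyRdmaTarget`, `register`, `deregister`, `rdma_write`, and the integer `stag` (loosely modeled on iWARP's steering tag) are hypothetical; this is not the paper's prototype, the SoftiWARP interface, or the OFED verbs API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class MemoryRegion:
    """A buffer the target application has registered for remote writes (toy model)."""
    stag: int          # hypothetical steering-tag-like handle the sender must present
    buf: bytearray     # the application's own memory; remote writes land here


class ToyRdmaTarget:
    """In-process stand-in for an RDMA-capable receiver; no real networking."""

    def __init__(self) -> None:
        self._regions: dict[int, MemoryRegion] = {}
        self._next_stag = 1

    def register(self, size: int) -> MemoryRegion:
        """Allocate a zero-filled buffer and make it addressable by a new stag."""
        region = MemoryRegion(stag=self._next_stag, buf=bytearray(size))
        self._regions[region.stag] = region
        self._next_stag += 1
        return region

    def deregister(self, stag: int) -> None:
        """Revoke remote access; later writes with this stag are rejected."""
        del self._regions[stag]

    def rdma_write(self, stag: int, offset: int, payload: bytes) -> None:
        """Place payload at buf[offset:] of the named region.

        The target application issues no receive call and there is no
        intermediate staging buffer in this model: the bytes are written
        in place into the registered buffer.
        """
        region = self._regions.get(stag)
        if region is None:
            raise KeyError(f"unknown or deregistered stag {stag}")
        end = offset + len(payload)
        if offset < 0 or end > len(region.buf):
            raise ValueError("write falls outside the registered region")
        # Slice assignment through a memoryview writes into the existing
        # bytearray without reallocating it.
        memoryview(region.buf)[offset:end] = payload


if __name__ == "__main__":
    target = ToyRdmaTarget()
    mr = target.register(16)
    target.rdma_write(mr.stag, 4, b"data")
    print(bytes(mr.buf[4:8]))  # b'data'
```

The bounds and handle checks stand in for the protection a real RDMA stack enforces on registered memory. A real stack also handles what this model omits, such as memory pinning, access rights, and the wire protocol (for iWARP, RDMAP and DDP layered over MPA/TCP).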

Information

Published In

ACM Other conferences
APSys '11: Proceedings of the Second Asia-Pacific Workshop on Systems
July 2011
97 pages
ISBN:9781450311793
DOI:10.1145/2103799
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

  • USENIX Assoc

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. RDMA
  2. commodity clouds
  3. ethernet networking

Qualifiers

  • Research-article

Conference

APSys '11
Sponsor:
  • USENIX Assoc
APSys '11: Asia Pacific Workshop on Systems
July 11 - 12, 2011
Shanghai, China

Acceptance Rates

Overall Acceptance Rate 169 of 430 submissions, 39%

Bibliometrics & Citations

Article Metrics

  • Downloads (last 12 months): 19
  • Downloads (last 6 weeks): 1
Reflects downloads up to 27 Feb 2025

Citations

Cited By

  • (2024) PeRF. Proceedings of the 2024 USENIX Conference on Usenix Annual Technical Conference, pp. 209-225. DOI: 10.5555/3691992.3692005. Online publication date: 10-Jul-2024.
  • (2024) Multi-Host Sharing of a Single-Function NVMe Device in a PCIe Cluster. Proceedings of the SC '24 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis, pp. 1638-1645. DOI: 10.1109/SCW63240.2024.00204. Online publication date: 17-Nov-2024.
  • (2024) Empowering Cloud Computing With Network Acceleration: A Survey. IEEE Communications Surveys & Tutorials, 26:4, pp. 2729-2768. DOI: 10.1109/COMST.2024.3377531. Online publication date: Dec-2025.
  • (2023) DevIOus: Device-Driven Side-Channel Attacks on the IOMMU. 2023 IEEE Symposium on Security and Privacy (SP), pp. 2288-2305. DOI: 10.1109/SP46215.2023.10179283. Online publication date: May-2023.
  • (2023) A Load-balancing method for high performance cluster computing system. 2023 IEEE 3rd International Conference on Electronic Technology, Communication and Information (ICETCI), pp. 1775-1779. DOI: 10.1109/ICETCI57876.2023.10176628. Online publication date: 26-May-2023.
  • (2022) A reconfigurable rack-scale interconnect architecture based on PCIe fabric. 2022 IEEE 2nd International Conference on Data Science and Computer Application (ICDSCA), pp. 306-310. DOI: 10.1109/ICDSCA56264.2022.9988411. Online publication date: 28-Oct-2022.
  • (2021) SmartIO. ACM Transactions on Computer Systems, 38:1-2, pp. 1-78. DOI: 10.1145/3462545. Online publication date: 8-Jul-2021.
  • (2021) PCIe P2P Communication for the High Performance Heterogeneous Computing System. 2021 IEEE International Conference on Artificial Intelligence and Computer Applications (ICAICA), pp. 284-288. DOI: 10.1109/ICAICA52286.2021.9498081. Online publication date: 28-Jun-2021.
  • (2019) Flexible device compositions and dynamic resource sharing in PCIe interconnected clusters using Device Lending. Cluster Computing. DOI: 10.1007/s10586-019-02988-0. Online publication date: 21-Sep-2019.
  • (2018) FlashNet. ACM Transactions on Storage, 14:4, pp. 1-29. DOI: 10.1145/3239562. Online publication date: 4-Dec-2018.
