On limitations of network acceleration

Published: 9 December 2013

ABSTRACT

The performance of large-scale data-intensive applications running on thousands of machines depends considerably on the performance of the network. To deliver better application performance on rapidly evolving high-bandwidth, low-latency interconnects, researchers have proposed the use of network accelerator devices. However, despite the initial enthusiasm, translating a network accelerator's capabilities into high application performance remains challenging.

In this paper, we describe our experience with network acceleration using Remote Direct Memory Access (RDMA) capable network controllers (RNICs) and discuss the issues we uncovered. RNICs offload packet processing entirely onto the network controller and provide direct userspace access to the networking hardware. Our analysis shows that multiple (un)related factors significantly influence the performance gains seen by the end application. We identify factors that span the whole stack, ranging from low-level architectural issues (cache and DMA interaction, hardware prefetching) to high-level application parameters (buffer size, access pattern). We discuss the implications of our findings for application performance and for the future integration of network-acceleration technology into systems.


Published in

CoNEXT '13: Proceedings of the Ninth ACM Conference on Emerging Networking Experiments and Technologies
December 2013, 454 pages
ISBN: 9781450321013
DOI: 10.1145/2535372
Copyright © 2013 ACM

Publisher

Association for Computing Machinery, New York, NY, United States

Qualifiers

• research-article

Acceptance Rates

CoNEXT '13 paper acceptance rate: 44 of 226 submissions (19%). Overall acceptance rate: 198 of 789 submissions (25%).
