IOctopus: Outsmarting Nonuniform DMA

Authors: Boris Pismenny, Gerd Zellweger, Dan Tsafrir

ASPLOS '20: Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems

Pages 101 - 115

https://doi.org/10.1145/3373376.3378509

Published: 13 March 2020

Abstract

In a multi-CPU server, memory modules are local to the CPU to which they are connected, forming a nonuniform memory access (NUMA) architecture. Because non-local accesses are slower than local accesses, the NUMA architecture might degrade application performance. Similar slowdowns occur when an I/O device issues nonuniform DMA (NUDMA) operations, as the device is connected to memory via a single CPU. NUDMA effects therefore degrade application performance similarly to NUMA effects.

We observe that the similarity is not inherent but rather a product of disregarding the intrinsic differences between I/O and CPU memory accesses. Whereas NUMA effects are inevitable, we show that NUDMA effects can and should be eliminated. We present IOctopus, a device architecture that makes NUDMA impossible by unifying multiple physical PCIe functions (one per CPU) in a manner that makes them appear as one, both to the system software and externally to the server. IOctopus requires only a modest change to the device driver and firmware. We implement it on existing hardware and demonstrate that it improves throughput and latency by as much as 2.7x and 1.28x, respectively, while ridding developers of the need to combat (what appeared to be) an unavoidable type of overhead.
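The core idea can be illustrated with a toy model. The sketch below is not the authors' driver or firmware; the class and function names (`PhysicalFunction`, `UnifiedDevice`, `single_pf_is_local`) are hypothetical and exist only to show why one physical function per CPU, presented as a single logical device, leaves no remote DMA path, whereas a device attached through a single CPU is remote for every buffer on the other CPU.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhysicalFunction:
    """One PCIe physical function, attached to exactly one CPU socket.

    Hypothetical model for illustration; not an API from the paper.
    """
    pf_id: int
    socket: int


class UnifiedDevice:
    """Presents several per-socket PFs as a single logical device."""

    def __init__(self, pfs):
        # Index the PFs by the socket they are attached to.
        self._by_socket = {pf.socket: pf for pf in pfs}
        if len(self._by_socket) != len(pfs):
            raise ValueError("expected exactly one PF per socket")

    def pf_for_buffer(self, buffer_socket):
        """Pick the PF that is local to the memory holding the buffer."""
        return self._by_socket[buffer_socket]

    def dma_is_local(self, buffer_socket):
        """True when the chosen PF sits on the buffer's own socket."""
        return self.pf_for_buffer(buffer_socket).socket == buffer_socket


def single_pf_is_local(pf, buffer_socket):
    """Conventional case: the device reaches memory through one socket only."""
    return pf.socket == buffer_socket


if __name__ == "__main__":
    device = UnifiedDevice([PhysicalFunction(0, socket=0),
                            PhysicalFunction(1, socket=1)])
    for socket in (0, 1):
        print(socket, device.dma_is_local(socket))
```

In this model every buffer is served by the PF on its own socket, so the question of which CPU the device is "near" does not arise for software above the logical device.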



Index Terms

IOctopus: Outsmarting Nonuniform DMA
1. Hardware
  1. Communication hardware, interfaces and storage
2. Software and its engineering
  1. Software organization and properties
    1. Contextual software domains
      1. Operating systems
        Communications management
        Input / output


Published In

ASPLOS '20: Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems

March 2020

1412 pages

ISBN:9781450371025

DOI:10.1145/3373376

General Chair: James Larus (EPFL)

Program Chairs: Luis Ceze (University of Washington), Karin Strauss (Microsoft)

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

In-Cooperation

SIGBED: ACM Special Interest Group on Embedded Systems

Publisher

Association for Computing Machinery

New York, NY, United States

Badges

Best Paper

Qualifiers

Research-article

Funding Sources

Israel Science Foundation

Conference

ASPLOS '20

ASPLOS '20: Architectural Support for Programming Languages and Operating Systems

March 16 - 20, 2020

Lausanne, Switzerland

Acceptance Rates

Overall Acceptance Rate 535 of 2,713 submissions, 20%
