DOI: 10.1145/3445814.3446732
Article

Autonomous NIC offloads

Published: 17 April 2021

Abstract

CPUs routinely offload network-related processing tasks, such as packet segmentation and checksum computation, to NICs. NIC offloads are advantageous because they free valuable CPU cycles. But their applicability is typically limited to layer≤4 protocols (TCP and lower), and they are inapplicable to layer-5 protocols (L5Ps) that are built on top of TCP. This limitation is caused by a misfeature we call “offload dependence,” which dictates that L5P offloading additionally requires offloading the underlying layer≤4 protocols and related functionality: TCP, IP, firewall, etc. Offload dependence hinders innovation, because it implies hard-wiring the complicated, ever-changing implementation of the lower-level protocols.
We propose “autonomous NIC offloads,” which eliminate offload dependence. Autonomous offloads provide a lightweight software-device architecture that accelerates L5Ps without having to migrate the entire layer≤4 TCP/IP stack into the NIC. A main challenge that autonomous offloads address is coping with out-of-sequence packets. We implement autonomous offloads for two L5Ps: (i) NVMe-over-TCP zero-copy and CRC computation, and (ii) HTTPS authentication, encryption, and decryption. Our autonomous offloads increase throughput by up to 3.3x, and they deliver CPU consumption and latency that are as low as 0.4x and 0.7x, respectively. Their implementation is already upstreamed in the Linux kernel, and they will be supported in the next generation of Mellanox NICs.
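To make the CRC part of the NVMe-over-TCP offload concrete, the following is a minimal sketch of the work the CPU would otherwise do per payload, assuming the CRC in question is CRC32C (the Castagnoli polynomial that NVMe/TCP uses for its digests). The `payload_is_valid` helper and its `nic_verified` flag are hypothetical names introduced here for illustration; they are not taken from the paper or from any driver API. They only illustrate the general idea of software falling back to its own computation when the NIC did not handle a payload, for example after out-of-sequence packets.

```python
# Software CRC32C (Castagnoli), table-driven, reflected form.
CRC32C_POLY_REFLECTED = 0x82F63B78


def _build_table():
    """Precompute the 256-entry lookup table for byte-at-a-time CRC32C."""
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = (crc >> 1) ^ CRC32C_POLY_REFLECTED if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _build_table()


def crc32c(data: bytes) -> int:
    """Return the CRC32C of data (initial value and final XOR are 0xFFFFFFFF)."""
    crc = 0xFFFFFFFF
    for b in data:
        crc = _TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def payload_is_valid(payload: bytes, expected_crc: int, nic_verified: bool) -> bool:
    """Hypothetical receive-side check (an assumption, not the paper's code).

    If the NIC reports that it already verified the digest, the CPU skips
    the computation; otherwise software computes the CRC itself.
    """
    if nic_verified:
        return True
    return crc32c(payload) == expected_crc


if __name__ == "__main__":
    data = b"123456789"
    digest = crc32c(data)
    print(hex(digest))
    print(payload_is_valid(data, digest, nic_verified=False))
```

The software path touches every payload byte, which is the per-byte CPU cost that a NIC-side CRC computation is meant to remove.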

    Published In

    ASPLOS '21: Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems
    April 2021
    1090 pages
    ISBN:9781450383172
    DOI:10.1145/3445814

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Badges

    • Distinguished Paper

    Author Tags

    1. NIC
    2. hardware/software co-design
    3. operating system

    Qualifiers

    • Article

    Conference

    ASPLOS '21

    Acceptance Rates

    Overall Acceptance Rate 535 of 2,713 submissions, 20%

    Article Metrics

    • Downloads (Last 12 months): 236
    • Downloads (Last 6 weeks): 32
    Reflects downloads up to 05 Mar 2025

    Cited By

    • (2024) OSMOSIS. Proceedings of the 2024 USENIX Conference on Usenix Annual Technical Conference, 247-263. DOI: 10.5555/3691992.3692007. Online publication date: 10-Jul-2024
    • (2024) CyberStar. Proceedings of the 2024 USENIX Conference on Usenix Annual Technical Conference, 227-246. DOI: 10.5555/3691992.3692006. Online publication date: 10-Jul-2024
    • (2024) Network-Offloaded Bandwidth-Optimal Broadcast and Allgather for Distributed AI. SC24: International Conference for High Performance Computing, Networking, Storage and Analysis, 1-17. DOI: 10.1109/SC41406.2024.00109. Online publication date: 17-Nov-2024
    • (2024) HADES: Hardware-Assisted Distributed Transactions in the Age of Fast Networks and SmartNICs. 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA), 785-800. DOI: 10.1109/ISCA59077.2024.00062. Online publication date: 29-Jun-2024
    • (2024) SmartDIMM: In-Memory Acceleration of Upper Layer Protocols. 2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 312-329. DOI: 10.1109/HPCA57654.2024.00032. Online publication date: 2-Mar-2024
    • (2023) On the (dis)Advantages of Programmable NICs for Network Security Services. 2023 IFIP Networking Conference (IFIP Networking), 1-9. DOI: 10.23919/IFIPNetworking57963.2023.10186433. Online publication date: 12-Jun-2023
    • (2023) CPU-free Computing: A Vision with a Blueprint. Proceedings of the 19th Workshop on Hot Topics in Operating Systems, 1-14. DOI: 10.1145/3593856.3595906. Online publication date: 22-Jun-2023
    • (2023) Accelerating PUF-based Authentication Protocols Using Programmable Switch. NOMS 2023-2023 IEEE/IFIP Network Operations and Management Symposium, 1-10. DOI: 10.1109/NOMS56928.2023.10154275. Online publication date: 8-May-2023
    • (2022) Autonomous Mutual Authentication Protocol in the Edge Networks. Sensors 22:19 (7632). DOI: 10.3390/s22197632. Online publication date: 8-Oct-2022
    • (2022) Implementing ChaCha based crypto primitives on programmable SmartNICs. Proceedings of the ACM SIGCOMM Workshop on Formal Foundations and Security of Programmable Network Infrastructures, 15-23. DOI: 10.1145/3528082.3544833. Online publication date: 22-Aug-2022
