ABSTRACT
Transport Layer Security (TLS) has become a key building block for private network communication on the modern Internet. While recent advances in CPUs have substantially improved data encryption performance, the TLS key exchange remains the bottleneck for short-lived transactions. Dedicated hardware crypto accelerators promise good performance, but they often require invasive modification of the application because of their inherently asynchronous processing architecture.
In this paper, we explore the potential for offloading the TLS handshake to network interface cards (NICs) equipped with a hardware crypto accelerator. We envision a split TLS processing architecture for TCP that handles TCP connection setup and the TLS handshake on the NIC while carrying out the remaining operations in the CPU-based host stack. We present the rationale for our design and discuss a set of challenges toward our goal. Our proof-of-concept implementation on an existing SmartNIC shows a promising result: a 5.9x throughput improvement over a single CPU core.