ABSTRACT
eBPF (extended Berkeley Packet Filter) significantly enhances observability, performance, and security within the Linux kernel and plays a pivotal role in many real-world applications. Implemented as a register-based kernel virtual machine, eBPF features a customized Instruction Set Architecture (ISA) with stringent kernel safety requirements, e.g., a limit on the number of instructions. This constraint means that eBPF programs need substantial optimization to meet performance objectives. Although several compilers support eBPF, existing tools often overlook key optimization opportunities, resulting in suboptimal performance. In response, this paper introduces Merlin, an optimization framework leveraging customized LLVM passes and bytecode rewriting for Intermediate Representation (IR) transformation and bytecode refinement. Merlin employs two primary optimization strategies: instruction merging and strength reduction. These optimizations are applied before eBPF verification. We evaluate Merlin on 19 XDP programs (drawn from the Linux kernel, Meta, hXDP, and Cilium) and three eBPF-based systems (Sysdig, Tetragon, and Tracee, each comprising several hundred eBPF programs). The results show that all optimized programs pass kernel verification. Merlin can reduce the number of instructions by 73% and runtime overhead by 60% compared with the original programs. Compared to the state-of-the-art technique K2, Merlin also improves throughput by 0.59% and reduces latency by 5.31%, while being 106 times faster and scaling to larger and more complex programs without additional manual effort.
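The two strategies named in the abstract, instruction merging and strength reduction, can be illustrated with a minimal sketch over a toy instruction list. This is not Merlin's implementation, which works on LLVM IR and eBPF bytecode. The `Insn` record, the opcode names (`mul_imm`, `lsh_imm`, `store_imm`), and the merging conditions below are hypothetical and chosen only to show the general shape of each transformation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insn:
    """Toy instruction (hypothetical; not the real eBPF encoding)."""
    op: str        # "mul_imm", "lsh_imm", or "store_imm"
    dst: int       # destination register, or base register for stores
    imm: int       # immediate operand
    off: int = 0   # memory offset (stores only)
    size: int = 0  # store width in bytes (stores only)


def strength_reduce(prog):
    """Replace multiplication by a power of two with a cheaper left shift."""
    out = []
    for insn in prog:
        is_pow2 = insn.imm >= 2 and (insn.imm & (insn.imm - 1)) == 0
        if insn.op == "mul_imm" and is_pow2:
            shift = insn.imm.bit_length() - 1  # log2 of the immediate
            out.append(Insn("lsh_imm", insn.dst, shift))
        else:
            out.append(insn)
    return out


def merge_stores(prog):
    """Merge two adjacent constant stores to consecutive offsets into one
    wider store (little-endian layout assumed, natural alignment kept)."""
    out = []
    i = 0
    while i < len(prog):
        a = prog[i]
        b = prog[i + 1] if i + 1 < len(prog) else None
        mergeable = (
            b is not None
            and a.op == "store_imm" and b.op == "store_imm"
            and a.dst == b.dst                # same base register
            and a.size == b.size and a.size in (1, 2)
            and b.off == a.off + a.size       # second store follows the first
            and a.off % (2 * a.size) == 0     # merged store stays aligned
        )
        if mergeable:
            mask = (1 << (8 * a.size)) - 1
            imm = (a.imm & mask) | ((b.imm & mask) << (8 * a.size))
            out.append(Insn("store_imm", a.dst, imm, a.off, 2 * a.size))
            i += 2
        else:
            out.append(a)
            i += 1
    return out


def optimize(prog):
    """Run both toy passes; fewer instructions count against the limit."""
    return merge_stores(strength_reduce(prog))


example = [
    Insn("mul_imm", 1, 8),               # r1 *= 8
    Insn("store_imm", 10, 0x11, -8, 1),  # *(u8 *)(r10 - 8) = 0x11
    Insn("store_imm", 10, 0x22, -7, 1),  # *(u8 *)(r10 - 7) = 0x22
]
optimized = optimize(example)
```

In this sketch the multiplication becomes a shift by 3 and the two byte stores collapse into a single 16-bit store, so the three-instruction input shrinks to two instructions. A real optimizer would also have to prove that no intervening instruction reads or writes the affected registers and memory.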
Index Terms
- Merlin: Multi-tier Optimization of eBPF Code for Performance and Compactness