Towards high-performance packet processing on commodity multi-cores: current issues and future directions

Chinese title (translated): Packet processing on commodity multi-cores: research status and future directions

  • Research Paper
  • Special Focus on Future Internet Architecture and Protocol
Science China Information Sciences

Abstract

The demand for programmability has grown increasingly urgent as novel network services appear, such as e-commerce, social software, and online video. Commodity multi-core CPUs have been widely applied to network packet processing to obtain high programmability and reduce time-to-market. However, there is a large gap between the packet processing performance of commodity multi-cores and that of traditional packet processing hardware, e.g., NPs (network processors). Recently, optimizing the packet processing performance of commodity multi-cores has become a hot topic in industry and academia. In this paper, based on a detailed analysis of the packet processing procedure, we first identify two dominant overheads, namely virtual-to-physical address translation and packet buffer management. Second, we give a comprehensive survey of current optimization methods. Third, based on the survey, we propose a heterogeneous architecture of commodity multi-core plus FPGA as a promising way to improve packet processing performance. Fourth, we introduce a novel Self-Described Buffer (SDB) management technology to eliminate the overheads of allocating and deallocating packet buffers, which are offloaded to the FPGA. Then, an evaluation testbed named PIOT (Packet I/O Testbed) is designed and implemented to evaluate packet forwarding performance. The I/O capacity of different commodity multi-core CPUs and the performance of the optimization methods are assessed and compared using PIOT. Finally, future work on packet processing optimization on multi-core CPUs is discussed.

Abstract (translated from the Chinese version)

Highlights

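With the continual emergence of new network services and protocols, such as e-commerce, instant messaging, online video, and microblogging, network devices need not only high-speed service processing capability but also a high degree of programmability and adaptability. In recent years, as microelectronics and multi-core, multi-threading technologies have developed and matured, the aggregate multi-threaded processing capability of commodity multi-core processors has kept rising, the gap between commodity multi-core processors and dedicated network processing chips is gradually narrowing, and network processing platforms based on commodity multi-core processors are becoming a research hotspot in academia and industry. However, packet processing on commodity multi-core processors suffers from high packet I/O overhead and low packet memory-access performance. To address these problems, we analyze existing commodity multi-core packet processing in detail and summarize existing optimization techniques. We also design PIOP (Packet IO Performance), a test tool for evaluating the packet processing capability of commodity multi-core platforms, and use it to evaluate and analyze the platforms' packet read/write capability, as well as the packet I/O, memory-access, and computing capability of the optimization mechanisms. Through analysis and testing, we hope to summarize the problems facing current commodity multi-core packet processing and to guide the design of future commodity multi-core packet processing.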

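  1. We analyze the page-table-based virtual-to-physical address translation mechanism and the slab-based skb management mechanism used in multi-core packet forwarding. We point out that the design principles of traditional CPUs and operating systems, which aim at optimized resource usage and support for many kinds of applications, are an important cause of degraded packet processing performance.

  2. We survey current packet processing optimization mechanisms on multi-core platforms. We point out that a multi-core CPU plus an external coprocessor is the main means of further improving packet processing performance, and that the lack of effective evaluation methods and tools is also a problem that limits further research.

  3. To evaluate commodity multi-core packet processing optimizations, we design the evaluation tool PIOP on top of a programmable PCIe NIC. The tool can test the raw read/write performance of a packet processing platform and effectively supports the evaluation of various packet processing optimizations. We give preliminary PIOP-based evaluation results for DPDK and NPDK.

  4. We describe our design of a programmable packet processing module on a multi-core CPU plus FPGA, based on the ideas of SDB and NPDK, including the hardware and software implementation of SDB and the test results.

  5. We discuss future directions for packet processing optimization on multi-core platforms, including programmability and the PCIe and DDR interfaces.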

Author information

Corresponding author

Correspondence to Lu Tang.

About this article

Cite this article

Tang, L., Yan, J., Sun, Z. et al. Towards high-performance packet processing on commodity multi-cores: current issues and future directions. Sci. China Inf. Sci. 58, 1–16 (2015). https://doi.org/10.1007/s11432-015-5484-6

  • DOI: https://doi.org/10.1007/s11432-015-5484-6
