Towards high-performance packet processing on commodity multi-cores: current issues and future directions

Tang, Lu; Yan, JinLi; Sun, ZhiGang; Li, Tao; Zhang, MinXuan

doi:10.1007/s11432-015-5484-6

Chinese title (translated): Packet processing on commodity multi-cores: research status and future directions

Research Paper
Special Focus on Future Internet Architecture and Protocol
Published: 18 November 2015

Volume 58, pages 1–16, (2015)
Science China Information Sciences

Lu Tang¹, JinLi Yan¹, ZhiGang Sun¹, Tao Li¹ & MinXuan Zhang¹

Abstract

Demand for programmability has grown as new network services such as e-commerce, social software, and online video have appeared. Commodity multi-core CPUs are widely used for network packet processing because they offer high programmability and shorten time-to-market. However, a large gap remains between the packet processing performance of commodity multi-cores and that of traditional packet processing hardware such as NPs (network processors). Optimizing packet processing on commodity multi-cores has therefore become an active topic in industry and academia. In this paper, we first analyze the packet processing procedure in detail and identify two dominant overheads: virtual-to-physical address translation and packet buffer management. Second, we survey current optimization methods. Third, based on the survey, we propose a heterogeneous architecture that combines a commodity multi-core CPU with an FPGA as a promising way to improve packet processing performance. Fourth, we introduce a novel Self-Described Buffer (SDB) management technique that eliminates the overhead of packet buffer allocation and deallocation by offloading them to the FPGA. Fifth, we design and implement an evaluation testbed, PIOT (Packet I/O Testbed), to evaluate packet forwarding performance, and use it to assess and compare the I/O capacity of different commodity multi-core CPUs and the performance of the optimization methods. Finally, we discuss future work on packet processing optimization for multi-core CPUs.

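Abstract (translated from the Chinese)

Highlights

With the continual emergence of new network services and protocols, such as e-commerce, instant messaging, online video, and microblogging, network devices need not only high-speed service processing capability but also a high degree of programmability and adaptability. In recent years, as microelectronics and multi-core, multi-threading technologies have developed and matured, the aggregate multi-threaded processing capability of general-purpose multi-core processors has kept increasing, and the gap between general-purpose multi-core processors and dedicated network processing chips is gradually narrowing. Network processing platforms based on general-purpose multi-core processors are becoming a research focus in academia and industry. However, packet processing on general-purpose multi-core processors suffers from high packet I/O overhead and low packet memory-access performance. To address these problems, we analyze existing packet processing on general-purpose multi-cores in detail and summarize existing optimization techniques. We also design PIOP (Packet IO Performance), a tool for measuring the packet processing capability of general-purpose multi-core platforms, and use it to evaluate and analyze the platforms' packet read/write capability as well as the packet I/O, memory-access, and computing capabilities of the optimization mechanisms. Through analysis and measurement, we hope to summarize the problems facing packet processing on general-purpose multi-cores and to guide the design of future systems.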

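(1) By dissecting the page-table-based virtual-to-physical address translation mechanism and the slab-based skb management mechanism used in multi-core packet forwarding, we point out that the design principles of traditional CPUs and operating systems, which aim at optimized resource usage and support for many kinds of applications, are a major cause of reduced packet processing performance.
(2) We survey current packet processing optimization mechanisms on multi-core platforms and point out that combining a multi-core CPU with an external coprocessor is the main means of further improving packet processing performance, and that the lack of effective evaluation methods and tools also limits further research.
(3) For evaluating packet processing optimizations on general-purpose multi-cores, we design the evaluation tool PIOP on top of a programmable PCIe NIC. The tool can measure the raw read/write performance of a packet processing platform and effectively supports the evaluation of various packet processing optimization techniques. Preliminary PIOP-based evaluation results for DPDK and NPDK are given.
(4) We present our design of a programmable packet processing module on a multi-core CPU + FPGA, built on the ideas of SDB and NPDK, including the software and hardware implementation of SDB and test results.
(5) We discuss future directions for packet processing optimization on multi-core platforms, including programmability and the PCIe and DDR interfaces.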
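
The abstract names packet buffer management as one of the two dominant overheads. As a rough illustration of how per-packet allocation and deallocation can be avoided, the following is a minimal sketch of a preallocated buffer pool. It is not the paper's SDB mechanism, whose details are not given on this page, and it is not the Linux slab or skb code; the names `BufferPool`, `acquire`, `release`, and `available` are hypothetical and chosen only for this example.

```python
class BufferPool:
    """Fixed-size pool of preallocated packet buffers (hypothetical sketch)."""

    def __init__(self, count, size):
        # Allocate every buffer once, up front, instead of once per packet.
        self._buffers = [bytearray(size) for _ in range(count)]
        # The free list holds the indices of buffers that are available.
        self._free = list(range(count))

    def acquire(self):
        """Return (index, buffer), or None if the pool is exhausted."""
        if not self._free:
            return None
        idx = self._free.pop()
        return idx, self._buffers[idx]

    def release(self, idx):
        """Put a buffer back on the free list; no memory is freed."""
        self._free.append(idx)

    def available(self):
        """Number of buffers currently on the free list."""
        return len(self._free)


pool = BufferPool(count=4, size=2048)
idx, buf = pool.acquire()
buf[:5] = b"hello"  # stand-in for a received packet
pool.release(idx)   # the same buffer is reused by a later acquire()
```

In this sketch, acquiring and releasing a buffer are list operations on indices, so the cost of memory allocation is paid only once at start-up. Under the assumption that all packets fit in a fixed buffer size, the trade-off is wasted space for small packets in exchange for predictable per-packet cost.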

Author information

Authors and Affiliations

School of Computer, National University of Defense Technology, Changsha, 410073, China
Lu Tang, JinLi Yan, ZhiGang Sun, Tao Li & MinXuan Zhang

Corresponding author

Correspondence to Lu Tang.

About this article

Cite this article

Tang, L., Yan, J., Sun, Z. et al. Towards high-performance packet processing on commodity multi-cores: current issues and future directions. Sci. China Inf. Sci. 58, 1–16 (2015). https://doi.org/10.1007/s11432-015-5484-6

Received: 24 September 2015
Accepted: 02 November 2015
Published: 18 November 2015
Issue Date: December 2015
DOI: https://doi.org/10.1007/s11432-015-5484-6

Keywords

120103
