Toward low CPU usage and efficient DPDK communication in a cluster

Wu, Mingjie; Chen, Qingkui; Wang, Jingjuan

doi:10.1007/s11227-021-03942-x

Published: 21 June 2021

Volume 78, pages 1852–1884, (2022)

The Journal of Supercomputing

Mingjie Wu¹,
Qingkui Chen¹ &
Jingjuan Wang¹

Abstract

In recent years, DPDK (Data Plane Development Kit, a set of data plane development tools provided by Intel that focuses on high-performance packet processing in network applications), one of the high-performance packet I/O frameworks, has been widely used to improve the efficiency of data transmission in clusters. However, the busy polling used in DPDK not only wastes many CPU cycles and consumes power; the resulting high CPU usage also has a large impact on the performance of other applications on the host. Technologies such as DVFS (dynamic voltage and frequency scaling, which dynamically adjusts the operating frequency and voltage of a chip according to the computing demands of the applications running on it in order to save energy) and LPI (low power idle, which saves power by turning off the power of certain supporting circuits when the CPU core is idle) can reduce power consumption by adjusting CPU voltage and frequency, but they can also degrade the performance of other applications. Putting the polling thread to sleep is a promising way to reduce CPU usage and power consumption. However, it is challenging because the appropriate thread sleep duration is hard to determine accurately. In this paper, we propose a model that finds the optimal thread sleep duration to address this challenge. Using the model, we balance thread CPU usage against transmission efficiency to obtain the optimal sleep duration, which we call the transmission performance threshold. Experiments show that the proposed models can significantly reduce thread CPU usage: in general, CPU utilization is reduced by about 80% while communication performance is only slightly reduced.
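The trade-off described in the abstract can be illustrated with a toy simulation. The sketch below is not the authors' model and does not use DPDK; it assumes a single receive thread that pays a fixed CPU cost per poll and sleeps for a fixed duration whenever a poll finds no packet. The function name `simulate_polling` and the parameters `sleep_us` and `poll_cost_us` are hypothetical and chosen only for illustration.

```python
def simulate_polling(arrivals_us, sleep_us, poll_cost_us=1.0):
    """Toy model of a polling receive thread (an illustration, not the paper's model).

    arrivals_us  : sorted packet arrival times in microseconds
    sleep_us     : time the thread sleeps after an empty poll (0 = busy polling)
    poll_cost_us : assumed CPU time consumed by one poll

    Returns (cpu_fraction, mean_latency_us), where cpu_fraction is the share of
    elapsed time spent polling and latency is the wait from arrival to pickup.
    """
    if not arrivals_us:
        return 0.0, 0.0

    now = 0.0          # simulated wall-clock time
    busy = 0.0         # accumulated CPU time spent polling
    latencies = []
    i = 0
    n = len(arrivals_us)

    while i < n:
        # One poll always costs CPU time, whether or not packets are waiting.
        now += poll_cost_us
        busy += poll_cost_us

        # Pick up every packet that has arrived by the time the poll completes.
        got_packet = False
        while i < n and arrivals_us[i] <= now:
            latencies.append(now - arrivals_us[i])
            i += 1
            got_packet = True

        # After an empty poll the thread yields the CPU for sleep_us.
        if not got_packet and sleep_us > 0:
            now += sleep_us

    return busy / now, sum(latencies) / len(latencies)
```

Calling `simulate_polling(arrivals, 0.0)` models busy polling, which keeps the simulated core fully occupied, while a positive `sleep_us` lowers the CPU fraction at the cost of extra pickup latency. Sweeping `sleep_us` over a range of values shows the kind of balance between CPU usage and transmission efficiency that the paper's threshold is meant to capture, under the simplifying assumptions stated above.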

Notes

Bypass parallel communication mechanism: a DPDK-based technology for parallel communication over multiple network cards in a cluster.

Acknowledgements

The authors gratefully acknowledge the support of the Shanghai Key Technology Project (19DZ1208903), National Natural Science Foundation of China (Grant Nos. 61572325 and 60970012), Ministry of Education Doctoral Fund of Ph.D. Supervisor of China (Grant No. 20113120110008), Shanghai Key Science and Technology Project in Information Technology Field (Grant Nos. 14511107902 and 16DZ1203603), Shanghai Leading Academic Discipline Project (No. XTKX2012), Shanghai Engineering Research Center Project (Nos. GCZX14014 and C14001) and in part by a Cooperation Project with the Intel Asia Pacific Research and Development Center.

Author information

Authors and Affiliations

University of Shanghai for Science and Technology, Shanghai, China
Mingjie Wu, Qingkui Chen & Jingjuan Wang

Corresponding author

Correspondence to Qingkui Chen.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Wu, M., Chen, Q. & Wang, J. Toward low CPU usage and efficient DPDK communication in a cluster. J Supercomput 78, 1852–1884 (2022). https://doi.org/10.1007/s11227-021-03942-x

Accepted: 08 June 2021
Published: 21 June 2021
Issue Date: February 2022
DOI: https://doi.org/10.1007/s11227-021-03942-x
