Abstract
In recent years, DPDK (Data Plane Development Kit, a set of data plane libraries from Intel for high-performance packet processing in network applications), one of the high-performance packet I/O frameworks, has been widely used to improve the efficiency of data transmission in clusters. However, the busy polling used in DPDK wastes many CPU cycles and consumes power, and the resulting high CPU usage degrades the performance of other applications on the host. Technologies such as DVFS (dynamic voltage and frequency scaling, which dynamically adjusts a chip's operating frequency and voltage according to the computing demand of the applications running on it in order to save energy) and LPI (low power idle, which saves power by turning off the power of certain supporting circuits when a CPU core is idle) can reduce power consumption by adjusting CPU voltage and frequency, but they can also degrade the performance of other applications. Thread sleep is a promising way to reduce CPU usage and power consumption. However, it is challenging because the appropriate thread sleep duration is difficult to determine accurately. In this paper, we propose a model that finds the optimal thread sleep duration to address this challenge. The model balances thread CPU usage against transmission efficiency to obtain the optimal sleep duration, which we call the transmission performance threshold. Experiments show that the proposed model can significantly reduce thread CPU usage: in general, CPU utilization is reduced by about 80% while communication performance is only slightly reduced.
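The trade-off described in the abstract can be illustrated with a toy simulation. This is a minimal sketch and not the model proposed in the paper: the function `simulate`, its parameters, and all numeric values are hypothetical, and packets are assumed to arrive at a fixed interval. It compares busy polling (no sleep) with a loop that sleeps after each empty poll, reporting the fraction of time the thread is busy and the largest backlog it observes.

```python
def simulate(sleep_us, arrival_interval_us=10.0, poll_cost_us=1.0,
             burst=32, duration_us=100_000.0):
    """Toy model of a polling thread that sleeps after each empty poll.

    All parameters are illustrative assumptions, not values from the paper.
    Returns (cpu_usage, max_backlog).
    """
    t = 0.0          # simulated clock in microseconds
    busy = 0.0       # time spent polling, i.e. CPU-busy time
    handled = 0      # packets taken from the receive queue so far
    max_backlog = 0  # largest number of packets found waiting at a poll
    while t < duration_us:
        arrived = int(t // arrival_interval_us)  # packets arrived by time t
        backlog = arrived - handled
        max_backlog = max(max_backlog, backlog)
        handled += min(backlog, burst)           # take at most one burst
        t += poll_cost_us
        busy += poll_cost_us
        if backlog == 0:
            t += sleep_us                        # sleep only after an empty poll
    return busy / t, max_backlog


if __name__ == "__main__":
    for sleep_us in (0.0, 10.0, 50.0):
        cpu, backlog = simulate(sleep_us)
        print(f"sleep={sleep_us:5.1f} us  cpu_usage={cpu:.3f}  max_backlog={backlog}")
```

In this sketch a longer sleep lowers the busy fraction but lets more packets accumulate between polls, which is the tension that the optimal sleep duration is meant to resolve.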
Notes
Bypass parallel communication mechanism, a parallel communication technology of multiple network cards in a cluster based on DPDK.
Acknowledgements
The authors gratefully acknowledge the support of the Shanghai Key Technology Project (19DZ1208903), National Natural Science Foundation of China (Grant Nos. 61572325 and 60970012), Ministry of Education Doctoral Fund of Ph.D. Supervisor of China (Grant No. 20113120110008), Shanghai Key Science and Technology Project in Information Technology Field (Grant Nos. 14511107902 and 16DZ1203603), Shanghai Leading Academic Discipline Project (No. XTKX2012), Shanghai Engineering Research Center Project (Nos. GCZX14014 and C14001) and in part by a Cooperation Project with the Intel Asia Pacific Research and Development Center.
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
About this article
Cite this article
Wu, M., Chen, Q. & Wang, J. Toward low CPU usage and efficient DPDK communication in a cluster. J Supercomput 78, 1852–1884 (2022). https://doi.org/10.1007/s11227-021-03942-x
DOI: https://doi.org/10.1007/s11227-021-03942-x