Toward low CPU usage and efficient DPDK communication in a cluster

The Journal of Supercomputing

Abstract

In recent years, DPDK (Data Plane Development Kit, a toolkit provided by Intel for high-performance packet processing in network applications), one of the high-performance packet I/O frameworks, has been widely used to improve the efficiency of data transmission in clusters. However, the busy polling used in DPDK wastes a large number of CPU cycles and consumes power, and the resulting high CPU usage degrades the performance of other applications on the host. Technologies such as DVFS (dynamic voltage and frequency scaling, which dynamically adjusts the operating frequency and voltage of a chip according to the computing demand of the applications running on it in order to save energy) and LPI (low power idle, which saves power by switching off certain supporting circuits while a CPU core is idle) can reduce power consumption, but they can also degrade the performance of other applications. Thread sleep is a promising way to reduce CPU usage and power consumption. It is challenging, however, because an appropriate sleep duration is hard to determine accurately. In this paper, we propose a model that finds the optimal thread sleep duration. The model balances thread CPU usage against transmission efficiency to obtain the optimal sleep duration, which we call the transmission performance threshold. Experiments show that the proposed model significantly reduces thread CPU usage: in general, CPU utilization is reduced by about 80% while communication performance is only slightly reduced.
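The trade-off described in the abstract can be illustrated with a toy duty-cycle calculation. The sketch below is not the model proposed in the paper, which is not reproduced on this page. It assumes a polling thread that alternates between sleeping and draining its receive ring, a constant packet arrival rate, a fixed per-wakeup overhead, and a fixed per-packet cost. All function and parameter names are hypothetical.

```python
def cycle_time_us(sleep_us, poll_overhead_us, per_packet_us, rate_pkt_per_us):
    """Steady-state length of one sleep + drain cycle, in microseconds.

    Assumption: packets arrive at a constant rate, and each wakeup costs a
    fixed overhead plus a fixed cost per queued packet.
    """
    if poll_overhead_us <= 0:
        raise ValueError("poll overhead must be positive")
    load = per_packet_us * rate_pkt_per_us  # share of time spent on packets
    if not 0 <= load < 1:
        raise ValueError("offered load must be in [0, 1)")
    # T = sleep + overhead + load * T, solved for T
    return (sleep_us + poll_overhead_us) / (1.0 - load)


def cpu_usage(sleep_us, poll_overhead_us, per_packet_us, rate_pkt_per_us):
    """Fraction of each cycle the thread spends on the CPU (not sleeping)."""
    t = cycle_time_us(sleep_us, poll_overhead_us, per_packet_us, rate_pkt_per_us)
    return 1.0 - sleep_us / t


def max_sleep_us(ring_slots, poll_overhead_us, per_packet_us, rate_pkt_per_us):
    """Longest sleep for which the packets arriving in one cycle still fit
    in a ring of ring_slots entries (a crude no-drop bound)."""
    load = per_packet_us * rate_pkt_per_us
    if not 0 <= load < 1:
        raise ValueError("offered load must be in [0, 1)")
    return ring_slots * (1.0 - load) / rate_pkt_per_us - poll_overhead_us
```

With no sleep the thread is always on the CPU, which corresponds to busy polling. Longer sleeps push CPU usage down toward the offered load, at the cost of added queueing delay and, beyond the bound returned by `max_sleep_us`, ring overflow. A real sleep-duration model would also have to account for timer granularity and wake-up latency, which this sketch ignores.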

Notes

  1. Bypass parallel communication mechanism (BPCM): a DPDK-based parallel communication technology that uses multiple network cards in a cluster.

Acknowledgements

The authors gratefully acknowledge the support of the Shanghai Key Technology Project (19DZ1208903), National Natural Science Foundation of China (Grant Nos. 61572325 and 60970012), Ministry of Education Doctoral Fund of Ph.D. Supervisor of China (Grant No. 20113120110008), Shanghai Key Science and Technology Project in Information Technology Field (Grant Nos. 14511107902 and 16DZ1203603), Shanghai Leading Academic Discipline Project (No. XTKX2012), Shanghai Engineering Research Center Project (Nos. GCZX14014 and C14001) and in part by a Cooperation Project with the Intel Asia Pacific Research and Development Center.

Author information

Corresponding author

Correspondence to Qingkui Chen.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Wu, M., Chen, Q. & Wang, J. Toward low CPU usage and efficient DPDK communication in a cluster. J Supercomput 78, 1852–1884 (2022). https://doi.org/10.1007/s11227-021-03942-x

  • DOI: https://doi.org/10.1007/s11227-021-03942-x
