Adjusting ECN marking threshold in multi-queue DCNs with deep learning

The Journal of Supercomputing

Abstract

Explicit Congestion Notification (ECN) was designed for single queues, but today's data center networks (DCNs) need multiple queues on each switch port. In a multi-queue scenario, if the ECN marking threshold is exceeded, all packets on the same port can receive the ECN mark. We propose mapping-ECN as a systematic answer to this wrong-marking problem. First, we differentiate mice and elephant flows with a learning algorithm, and we prioritize mice flows while keeping the deadlines of other flows in mind so that they are not sacrificed. Second, a marked packet is given the privilege of using a faster path than other packets so that network status is reported early, which gives a complete picture of the instantaneous requests from all senders. In the worst case, when the buffer has no capacity left for packets that exceed its threshold, mapping-ECN uses Cut Payload (CP): when a queue reaches the threshold, CP drops the payload of a packet rather than its metadata. Consequently, only a single bit carrying the packet's information is transmitted, and the sender retransmits that packet immediately instead of waiting for a timeout as TCP does. The retransmission can arrive within a millisecond, which supports an extremely low-latency network. Last but not least, mapping-ECN explores different neural network techniques to avoid mis-marking in the output port buffer: packets already marked in the queue buffer are not considered again for marking decisions in the output port buffer. Compared with MQ-ECN, mapping-ECN improves the overall Flow-Completion Time (FCT) for short flows by around 7%, the 99th percentile by around 52%, and the FCT for short flows by around 8%. Also compared with MQ-ECN, mapping-ECN improves the FCT for large flows, cache flows, and mice (web search) flows by 4%, 15%, and 6%, respectively. The improvement is also evident in comparison with DemePro and Priority-ECN.
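
The wrong-marking problem and the Cut Payload fallback described in the abstract can be illustrated with a toy model. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the `Packet` and `Port` classes, the `port_threshold` and `queue_capacity` parameters, and the packet-count units are all hypothetical, and the learning-based flow classification and faster-path forwarding of mapping-ECN are left out.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Packet:
    flow: str                 # hypothetical label, e.g. "mouse" or "elephant"
    payload: bytes
    ecn_marked: bool = False
    trimmed: bool = False     # True once Cut Payload has removed the payload


@dataclass
class Port:
    num_queues: int
    port_threshold: int       # assumed per-port ECN threshold, in packets
    queue_capacity: int       # assumed per-queue capacity, in packets
    queues: List[List[Packet]] = field(init=False)

    def __post_init__(self) -> None:
        self.queues = [[] for _ in range(self.num_queues)]

    def occupancy(self) -> int:
        # Total number of packets buffered across all queues of the port.
        return sum(len(q) for q in self.queues)

    def enqueue(self, pkt: Packet, qid: int) -> Packet:
        queue = self.queues[qid]
        if len(queue) >= self.queue_capacity:
            # Cut Payload: keep the metadata, drop the payload, so the
            # loss can be signalled without waiting for a timeout.
            pkt.payload = b""
            pkt.trimmed = True
        elif self.occupancy() >= self.port_threshold:
            # Naive per-port marking: the decision ignores which queue
            # actually built up the backlog.
            pkt.ecn_marked = True
        queue.append(pkt)
        return pkt


port = Port(num_queues=2, port_threshold=4, queue_capacity=5)

# An elephant flow fills queue 0 up to the per-port threshold.
for _ in range(4):
    port.enqueue(Packet("elephant", b"x" * 1000), qid=0)

# A mouse packet arrives at the empty queue 1 and is still marked.
mouse = port.enqueue(Packet("mouse", b"y" * 100), qid=1)

# Queue 0 keeps growing; once it is full, the next packet is trimmed.
port.enqueue(Packet("elephant", b"x" * 1000), qid=0)
overflow = port.enqueue(Packet("elephant", b"x" * 1000), qid=0)

print(mouse.ecn_marked, overflow.trimmed, len(overflow.payload))
```

In this sketch the mouse packet is marked even though its own queue is empty, which is the kind of wrong marking a multi-queue-aware scheme is meant to avoid. Appending the trimmed packet to the same full queue is a simplification chosen to keep the example short.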

References

  1. Ramakrishnan K, Floyd S (1998) A proposal to add explicit congestion notification (ECN) to IP. Tech Rep, pp 751–755. https://doi.org/10.17487/RFC2481

  2. Alizadeh M, Greenberg A, Maltz DA, Padhye J, Patel P, Prabhakar B, Sengupta S, Sridharan M (2010) Datacenter TCP (DCTCP). Annual Conference of the ACM Special Interest Group on Data Communication (SIGCOMM) 40(4):63–74

  3. Bai W, Chen L, Chen K, Wu H (2016). Enabling ECN in multi-service multi-queue datacenters. In: NSDI'16: Proceedings of the 13th Usenix Conference on Networked Systems Design and Implementation (NSDI), pp 537–549

  4. Akbar M, Gao X, Zhu S, Jahanbakhsh N, Zheng J, Chen G (2020) MiFi: bounded update to optimize network performance in software-defined data centers. IEEE/ACM Transactions Netw (ToN), pp 1–14. https://doi.org/10.1109/TNET.2022.3192167

  5. Handley M, Raiciu C, Agache A, Voinescu A, Moore AW, Antichi G, Wójcik M (2017) Re-architecting datacenter networks and stacks for low latency and high performance. In: SIGCOMM '17: Proceedings of the Conference of the ACM Special Interest Group on Data Communication (SIGCOMM), pp 29–42. https://doi.org/10.1145/3098822.3098825

  6. Luo J, Jin J, Shan F (2015) Standardization of low-latency TCP with explicit congestion notification: a survey. IEEE Internet Comput 21(1):48–55. https://doi.org/10.1109/MIC.2017.11

  7. Baker F, Fairhurst G (2015) IETF recommendations regarding active queue management. Internet Eng Task Force (IETF), Technical report

  8. Kuhn N, Natarajan P, Khademi N, Ros D (2016) Characterization guidelines for active queue management (AQM). Internet Eng Task Force (IETF), Technical report

  9. Bagnulo M, Briscoe B (2017) ECN++: adding explicit congestion notification (ECN) to TCP control packets. Draft-bagnulo-tcpm-generalized-ecn-04 (2017, work in progress). Internet Eng Task Force (IETF)

  10. Kuehlewind M, Scheffenegger R, Briscoe B (2015) Problem statement and requirements for increased accuracy in explicit congestion notification (ECN) feedback Internet Engineering Task Force (IETF). RFC 7560

  11. Gao C, Lee VCS (2016) DEME: Decouple packet marking from enqueuing for multiple services in datacenter networks. In: International Conference on Network Protocols (ICNP), pp 1–2. IEEE.

  12. Gao C, Lee VCS, Li K (2017) DemePro: decouple packet marking from enqueuing for multiple services with proactive congestion control. IEEE Trans Cloud Comput (TCM), pp 1–1. https://doi.org/10.1109/TCC.2017.2688318.

  13. Floyd S, Jacobson V (1993) Random early detection gateways for congestion avoidance. IEEE/ACM Trans Netw (TON), 1(4):397–413

  14. Majidi A, Jahanbakhsh N, Gao X, Zheng J, Chen G (2020) ECN+: A marking-aware optimization for ECN threshold via per-port in data center networks. J Netw Comput Appl (JNCA) 152:102504–102517. https://doi.org/10.1016/j.jnca.2019.102504

  15. Majidi A, Jahanbakhsh N, Gao X, Zheng J, Chen G (2020) DC-ECN: a machine-learning based dynamic threshold control scheme for ECN marking in DCN. Comput Commun 150(C):334–345. https://doi.org/10.1016/j.comcom.2019.10.028

  16. Majidi A, Gao X, Jahanbakhsh S, Jamali S, Zheng J, Chen G (2019) Deep-RL: deep reinforcement learning for marking-aware via per-port in data centers. In: 2019 IEEE 25th International Conference on Parallel and Distributed Systems (ICPADS), pp 392–395. https://doi.org/10.1109/ICPADS47876.2019.00061.

  17. Majidi A, Gao X, Jahanbakhsh N, Zheng J, Chen G (2020) Priority policy in multi-queue datacenter networks via per-port ECN marking. In: 2020 14th International Conference on Ubiquitous Information Management and Communication (IMCOM), pp 1–8, IEEE. https://doi.org/10.1109/IMCOM48794.2020.9001721.

  18. Shan D, Ren F (2017) Improving ECN marking scheme with micro-burst traffic in data center networks. In: International Conference on Computer Communications (INFOCOM). IEEE, 2017, pp 1–9. https://doi.org/10.1109/INFOCOM.2017.8057181.

  19. Alizadeh M, Kabbani A, Atikoglu B, Prabhakar B (2011) Stability analysis of QCN: the averaging principle. ACM SIGMETRICS Perform Eval Rev 39(1):49–60. https://doi.org/10.1145/2007116.2007123

  20. Shan D, Ren F, Cheng P, Shu R, Guo C (2018) Micro-burst in data centers: observations, analysis, and mitigations. In: 2018 IEEE 26th International Conference on Network Protocols (ICNP), 2018, pp 88–98. https://doi.org/10.1109/ICNP.2018.00019.

  21. Wu H, Ju J, Lu G, Guo C, Xiong Y, Zhang Y (2012) Tuning ECN for data center networks. In: CoNEXT '12: Proceedings of the 8th International Conference on Emerging Networking Experiments and Technologies, pp 25–36. https://doi.org/10.1145/2413176.2413181.

  22. Chen L, Chen K, Bai W, Alizadeh M (2016) Scheduling mix-flows in commodity datacenters with karuna. In: Proceedings of the 2016 ACM SIGCOMM Conference, pp 174–187. ACM. https://doi.org/10.1145/2934872.2934888.

  23. Lu Y, Chen G, Luo L, Tan K, Xiong Y, Wang X, Chen E (2017) One more queue is enough: minimizing flow completion time with explicit priority notification. In: IEEE INFOCOM 2017—IEEE Conference on Computer Communications, 2017, pp 1–9. https://doi.org/10.1109/INFOCOM.2017.8056946.

  24. Cheng P, Ren F, Shu R, Lin C (2014) Catch the whole lot in an action: rapid precise packet loss notification in data center. In: NSDI'14: Proceedings of the 11th USENIX Conference on Networked Systems Design and Implementation (NSDI), pp 17–28

  25. Mnih V, Kavukcuoglu K, Silver D, Graves A, Antonoglou I, Wierstra D, Riedmiller M (2013) Playing atari with deep reinforcement learning. arXiv preprint. https://doi.org/10.48550/arXiv.1312.5602.

  26. Silver D, Huang A, Maddison CJ, Guez A, Sifre L, Van Den Driessche G, Schrittwieser J, Antonoglou I, Panneershelvam V, Lanctot M et al (2016) Mastering the game of Go with deep neural networks and tree search. Nature 529:484–489. https://doi.org/10.1038/nature16961

  27. Poupart P, Chen Z, Jaini P, Fung F, Susanto H, Geng Y, Chen L, Chen K, Jin H (2016) Online flow size prediction for improved network routing. In: 2016 IEEE 24th International Conference on Network Protocols (ICNP), 2016, pp 1–6. https://doi.org/10.1109/ICNP.2016.7785324.

  28. Williams CK, Rasmussen CE (2006) Gaussian processes for machine learning. The MIT Press, 2(3):4. https://doi.org/10.7551/mitpress/3206.001.0001.

  29. Sutton RS, Barto AG (1998) Introduction to reinforcement learning. MIT Press, Cambridge, A Bradford Book, p 322

  30. Omar F (2016) Online Bayesian learning in probabilistic graphical models using moment matching with applications. The University of Waterloo's publication. https://doi.org/10.13140/RG.2.2.22951.04003

  31. Sutton RS, McAllester DA, Singh SP, Mansour Y (1999) Policy gradient methods for reinforcement learning with function approximation. Adv Neural Inf Process Syst

  32. Silver D, Lever G, Heess N, Degris T, Wierstra D, Riedmiller M (2014) Deterministic policy gradient algorithms. In: ICML'14: Proceedings of the 31st International Conference on International Conference on Machine Learning, vol. 32, pp 387–395

  33. Katta NP, Rexford J, Walker D (2013) Incremental consistent updates. In: HotSDN '13: Proceedings of the second ACM SIGCOMM Workshop on Hot Topics in Software Defined Networking (SIGCOMM), pp 49–54. https://doi.org/10.1145/2491185.2491191.

  34. Mnih V, Kavukcuoglu K, Silver D, Graves A, Antonoglou I, Wierstra D, Riedmiller M (2013) Playing atari with deep reinforcement learning. arXiv preprint: arXiv:1312.5602.

  35. Chen L, Lingys J, Chen K, Liu F (2018) AuTO: scaling deep reinforcement learning for datacenter-scale automatic traffic optimization. In: SIGCOMM '18: Proceedings of the 2018 Conference of the ACM Special Interest Group on Data Communication (SIGCOMM), pp 191–205

  36. Lillicrap TP, Hunt JJ, Pritzel A, Heess N, Erez T, Tassa Y, Silver D, Wierstra D (2015) Continuous control with deep reinforcement learning. arXiv preprint: arXiv:1509.02971.

  37. Pan Y, Tian C, Zheng J, Zhang G, Susanto H, Bai B, Chen G (2018) Support ECN in multi-queue datacenter networks via per-port marking with selective blindness. In: International Conference on Distributed Computing Systems (ICDCS). IEEE, pp 33–42. https://doi.org/10.1109/ICDCS.2018.00014.

  38. Alizadeh M, Yang S, Sharif M, Katti S, McKeown N, Prabhakar B, Shenker S (2013) pFabric: Minimal near-optimal datacenter transport. ACM SIGCOMM Comput Commun Rev 43(4):435–446

  39. Van Kessel G, Nunez-Queija R, Borst S (2005) Differentiated bandwidth sharing with disparate flow sizes. In: Proceedings IEEE 24th Annual Joint Conference of the IEEE Computer and Communications Societies, 2005, vol 4, pp 2425–2435. https://doi.org/10.1109/INFCOM.2005.1498528.

  40. Hu C, Liu B, Zhao H, Chen K, Yu YC, Cheng HW (2014) Discount counting for fast flow statistics on flow size and flow volume. IEEE/ACM Trans Netw 22(3):970–981. https://doi.org/10.1109/TNET.2013.2270439

  41. Rai IA, Biersack EW, Urvoy-Keller G (2005) Size-based scheduling to improve the performance of short TCP flows. IEEE Network 19(1):12–17. https://doi.org/10.1109/MNET.2005.1383435.

  42. Bai W, Chen L, Chen K, Han D, Tian C, Wang H (2017) PIAS: practical information-agnostic flow scheduling for commodity data centers. IEEE/ACM Trans Netw 25(4):1954–1967. https://doi.org/10.1109/TNET.2017.2669216.

  43. Kumar A, Xu J (2006) Sketch guided sampling: using on-line estimates of flow size for adaptive data collection. In: Proceedings IEEE INFOCOM 2006. 25th IEEE International Conference on Computer Communications, 2006, pp 1–11. https://doi.org/10.1109/INFOCOM.2006.326.

  44. Lall A, Ogihara M, Xu J (2009) An efficient algorithm for measuring medium-to-large-sized flows in network traffic. IEEE INFOCOM 2009:2711–2715. https://doi.org/10.1109/INFCOM.2009.5062217

  45. Hu C, Liu B, Wang S, Tian J, Cheng Y, Chen Y (2012) ANLS: adaptive non-linear sampling method for accurate flow size measurement. IEEE Trans Commun (ToC) 60(3):789–798. https://doi.org/10.1109/TCOMM.2011.112311.100622.

  46. Zandi Y, Majidi A, Ma L (2019) DENA: an intelligent dynamic flow scheduling for rate adjustment in green DCNs. In: IEEE Conference on Local Computer Networks (LCN), pp 234–237. https://doi.org/10.1109/LCN44214.2019.8990731

  47. Lee C, Park C, Jang K, Moon S, Han D (2017) Dx: latency-based congestion control for datacenters. IEEE/ACM Trans Netw (TON) 25(1):335–348. https://doi.org/10.1109/TNET.2016.2587286

Acknowledgements

This work is partially funded by the Akhmet Yassawi University - Gazi University Scholarship program.

Author information

Corresponding author

Correspondence to Akbar Majidi.

Ethics declarations

Conflict of interest

The authors declare that they have no conflicts of interest to report regarding the present study.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Amanov, A., Majidi, A., Jahnabakhsh, N. et al. Adjusting ECN marking threshold in multi-queue DCNs with deep learning. J Supercomput 79, 5443–5468 (2023). https://doi.org/10.1007/s11227-022-04893-7

  • DOI: https://doi.org/10.1007/s11227-022-04893-7
