RILNET: A Reinforcement Learning Based Load Balancing Approach for Datacenter Networks

  • Conference paper
Machine Learning for Networking (MLN 2018)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11407)

Abstract

Modern datacenter networks face various challenges, e.g., highly dynamic workloads, congestion, and topology asymmetry. ECMP, a traditional load balancing mechanism widely used in today’s datacenters, can balance load poorly and lead to congestion. A variety of load balancing schemes have been proposed to address the problems of ECMP. However, these traditional schemes usually make load balancing decisions based only on network knowledge from a snapshot or a short period in the past. In this paper, we propose a Reinforcement Learning (RL) based approach, called RILNET (ReInforcement Learning NETworking), for load balancing in datacenter networks. RILNET employs RL to learn a network and control it based on the learned experience. To achieve a finer granularity of control, RILNET routes flowlets rather than flows. Moreover, RILNET makes routing decisions for aggregation flows (an aggregation flow is the set of all flows from the same source edge switch to the same destination edge switch) instead of for single flows. To evaluate RILNET, we conduct a flow-level simulation and a packet-level simulation, and both sets of results show that RILNET balances traffic load much more effectively than ECMP and another load balancing solution, DRILL. RILNET also outperforms DRILL in data loss and maximal link delay: its maximal link data loss and maximal link delay are 44.4% and 25.4% smaller than those of DRILL, respectively.
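
The definition of an aggregation flow given above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Flow` record, its field names, the switch labels, and the rates are all hypothetical and chosen only to show how flows might be grouped by their (source edge switch, destination edge switch) pair.

```python
from collections import defaultdict
from typing import NamedTuple


class Flow(NamedTuple):
    # Hypothetical record; field names are illustrative, not from the paper.
    flow_id: str
    src_edge: str     # edge switch the source host attaches to
    dst_edge: str     # edge switch the destination host attaches to
    rate_mbps: float  # assumed per-flow sending rate


def aggregate_flows(flows):
    """Group flows by (source edge switch, destination edge switch)."""
    groups = defaultdict(list)
    for f in flows:
        groups[(f.src_edge, f.dst_edge)].append(f)
    return dict(groups)


# Made-up example data: f1 and f2 share both edge switches, f3 does not.
flows = [
    Flow("f1", "e1", "e3", 40.0),
    Flow("f2", "e1", "e3", 10.0),
    Flow("f3", "e2", "e3", 25.0),
]

agg = aggregate_flows(flows)

# Total demand of each aggregation flow, one value per edge-switch pair.
demand = {pair: sum(f.rate_mbps for f in fs) for pair, fs in agg.items()}
```

Under this grouping, a routing decision is taken once per edge-switch pair rather than once per flow, so the number of decisions is bounded by the number of edge-switch pairs and does not grow with the number of individual flows.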

Notes

  1. Note that RILNET can be used for multiple purposes, including load balancing, reducing data loss, and reducing flow completion time. In this paper, we focus on load balancing and leave the other purposes to future work.

Author information

Corresponding author

Correspondence to Qinliang Lin.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Lin, Q., Gong, Z., Wang, Q., Li, J. (2019). RILNET: A Reinforcement Learning Based Load Balancing Approach for Datacenter Networks. In: Renault, É., Mühlethaler, P., Boumerdassi, S. (eds) Machine Learning for Networking. MLN 2018. Lecture Notes in Computer Science, vol 11407. Springer, Cham. https://doi.org/10.1007/978-3-030-19945-6_4

  • DOI: https://doi.org/10.1007/978-3-030-19945-6_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-19944-9

  • Online ISBN: 978-3-030-19945-6

  • eBook Packages: Computer Science, Computer Science (R0)
