
Evaluating the Use of Policy Gradient Optimization Approach for Automatic Cloud Resource Provisioning

  • Conference paper
  • In: Parallel Processing and Applied Mathematics (PPAM 2019)

Abstract

Reinforcement learning is a very active field of research with many practical applications, and its success in many cases is driven by combining it with deep learning. In this paper we present the results of our attempt to use modern advancements in this area for the automated management of resources used to host distributed software. We describe the use of three policy training algorithms from the policy gradient optimization family to create a policy that controls the behavior of an autonomous management agent. The agent interacts with a simulated cloud computing environment which processes a stream of computing jobs. We discuss and compare the performance of the resulting policies and the feasibility of using them in real-world scenarios.
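
A minimal, self-contained sketch of the kind of training loop the abstract describes is given below: a vanilla policy gradient (REINFORCE) agent that chooses between adding a VM, removing one, or doing nothing, based on the current provisioning level and load. The toy environment, reward shape, linear softmax policy, and all hyper-parameters are illustrative assumptions made for this sketch only and are not taken from the paper.

```python
# Sketch only (not the authors' code): vanilla REINFORCE on a toy stand-in
# for a simulated cloud environment processing a stream of jobs.
# Environment dynamics, reward, and hyper-parameters are assumptions.
import numpy as np

class ToyCloudEnv:
    """Toy stand-in: keep enough VMs to serve a fluctuating job load."""
    ACTIONS = ("remove_vm", "no_op", "add_vm")

    def __init__(self, seed=0):
        self.rng = np.random.default_rng(seed)

    def reset(self):
        self.vms, self.load, self.t = 5, 5, 0
        return self._obs()

    def _obs(self):
        # Observation: normalised number of VMs and current load.
        return np.array([self.vms / 10.0, self.load / 10.0])

    def step(self, action):
        self.vms = int(np.clip(self.vms + (action - 1), 1, 10))
        self.load = int(np.clip(self.load + self.rng.integers(-1, 2), 1, 10))
        # Penalise both under-provisioning and over-provisioning.
        reward = -abs(self.vms - self.load)
        self.t += 1
        return self._obs(), reward, self.t >= 50

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def run_episode(env, theta):
    """Roll out one episode with a linear softmax policy theta (3x2)."""
    obs, done, traj = env.reset(), False, []
    while not done:
        probs = softmax(theta @ obs)
        action = np.random.choice(3, p=probs)
        next_obs, reward, done = env.step(action)
        traj.append((obs, action, reward, probs))
        obs = next_obs
    return traj

def train(episodes=2000, lr=0.05, gamma=0.99):
    env, theta = ToyCloudEnv(), np.zeros((3, 2))
    for _ in range(episodes):
        traj = run_episode(env, theta)
        # Discounted return from each step onwards.
        G, returns = 0.0, []
        for _, _, r, _ in reversed(traj):
            G = r + gamma * G
            returns.append(G)
        returns.reverse()
        baseline = np.mean(returns)
        for (obs, action, _, probs), G in zip(traj, returns):
            # Gradient of log pi(a|s) for a linear softmax policy.
            grad_log = -np.outer(probs, obs)
            grad_log[action] += obs
            theta += lr * (G - baseline) * grad_log
    return theta

if __name__ == "__main__":
    print("trained policy parameters:\n", train())
```

Replacing the toy environment with a wrapper around a full cloud simulator, and the linear policy with a neural network trained by a more advanced member of the policy gradient family, would be closer to the setups evaluated in the paper.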



Acknowledgements

The paper was partially financed by the AGH University of Science and Technology Statutory Fund. Computational experiments were carried out on the PL-Grid infrastructure.

Author information

Correspondence to Włodzimierz Funika.



Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Funika, W., Koperek, P. (2020). Evaluating the Use of Policy Gradient Optimization Approach for Automatic Cloud Resource Provisioning. In: Wyrzykowski, R., Deelman, E., Dongarra, J., Karczewski, K. (eds) Parallel Processing and Applied Mathematics. PPAM 2019. Lecture Notes in Computer Science, vol 12043. Springer, Cham. https://doi.org/10.1007/978-3-030-43229-4_40


  • DOI: https://doi.org/10.1007/978-3-030-43229-4_40

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-43228-7

  • Online ISBN: 978-3-030-43229-4

  • eBook Packages: Computer Science, Computer Science (R0)
