Reactive Workflow Scheduling in Fluctuant Infrastructure-as-a-Service Clouds Using Deep Reinforcement Learning

  • Conference paper
Collaborative Computing: Networking, Applications and Worksharing (CollaborateCom 2020)

Abstract

As a promising and evolving computing paradigm, cloud computing benefits computation-intensive scientific applications, which are usually orchestrated as workflows, by providing unlimited, elastic, and heterogeneous resources in a pay-as-you-go way. Given a workflow template, identifying a set of appropriate cloud services that fulfill users’ functional requirements under given constraints is widely recognized to be a challenge. However, because the supporting cloud infrastructures can be highly prone to performance variations and fluctuations, challenges such as guaranteeing user-perceived performance and reducing the cost of cloud-supported scientific workflows need to be properly tackled. Traditional approaches tend to ignore such fluctuations when scheduling workflow tasks and can thus lead to frequent Service-Level-Agreement (SLA) violations. In contrast, we take such fluctuations into consideration, formulate the workflow scheduling problem as a continuous decision-making process, and propose a reactive, deep-reinforcement-learning-based method, named DeepWS, to solve it. Extensive case studies based on real-world workflow templates show that our approach significantly outperforms traditional ones in terms of SLA-violation rate and total cost.
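
To make the idea of scheduling as a sequential decision process concrete, the following is a minimal sketch and not the DeepWS implementation. It assumes a toy four-task workflow, two hypothetical VM types with made-up prices and speeds, a uniform 60-100% performance fluctuation, and a unit SLA penalty; none of these values come from the paper. The hand-written `reactive_policy` is only a stand-in for the place where a learned deep-reinforcement-learning policy would choose the action from the observed state.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class VMType:
    name: str
    price_per_sec: float  # hypothetical price, not taken from the paper
    nominal_speed: float  # hypothetical work units per second


# Hypothetical workflow: task -> (work units, parent tasks).
WORKFLOW = {
    "a": (10.0, []),
    "b": (20.0, ["a"]),
    "c": (15.0, ["a"]),
    "d": (5.0, ["b", "c"]),
}
VM_TYPES = [VMType("small", 0.01, 1.0), VMType("large", 0.04, 3.0)]
SLA_PENALTY = 1.0  # assumed penalty for missing the deadline


class SchedulingEnv:
    """Toy decision process: one ready task is placed on a VM per step."""

    def __init__(self, workflow, vm_types, deadline, seed=0):
        self.workflow, self.vm_types, self.deadline = workflow, vm_types, deadline
        self.rng = random.Random(seed)
        self.finish = {}  # task -> finish time
        self.cost = 0.0
        self.speeds = self._sample_speeds()  # currently observed VM speeds

    def _sample_speeds(self):
        # Assumed fluctuation model: each VM runs at 60-100% of nominal speed.
        return [vm.nominal_speed * self.rng.uniform(0.6, 1.0) for vm in self.vm_types]

    def ready_tasks(self):
        # A task is ready once all of its parents have been scheduled.
        return [t for t, (_, parents) in self.workflow.items()
                if t not in self.finish and all(p in self.finish for p in parents)]

    def estimate(self, task, vm_index):
        # Finish time and monetary cost under the currently observed speeds.
        work, parents = self.workflow[task]
        start = max((self.finish[p] for p in parents), default=0.0)
        duration = work / self.speeds[vm_index]
        return start + duration, duration * self.vm_types[vm_index].price_per_sec

    def step(self, task, vm_index):
        finish, cost = self.estimate(task, vm_index)
        self.finish[task] = finish
        self.cost += cost
        self.speeds = self._sample_speeds()  # performance drifts before the next decision
        done = len(self.finish) == len(self.workflow)
        reward = -cost
        if done and max(self.finish.values()) > self.deadline:
            reward -= SLA_PENALTY
        return reward, done


def reactive_policy(env, task):
    # Placeholder for a learned policy: cheapest VM that still meets the
    # deadline for this task, otherwise the VM that finishes earliest.
    options = [(i, *env.estimate(task, i)) for i in range(len(env.vm_types))]
    on_time = [o for o in options if o[1] <= env.deadline]
    if on_time:
        return min(on_time, key=lambda o: o[2])[0]
    return min(options, key=lambda o: o[1])[0]


def run_episode(deadline, seed=0):
    env = SchedulingEnv(WORKFLOW, VM_TYPES, deadline, seed)
    done = False
    while not done:
        task = env.ready_tasks()[0]
        _, done = env.step(task, reactive_policy(env, task))
    makespan = max(env.finish.values())
    return makespan, env.cost, makespan > deadline
```

Calling `run_episode(deadline=40.0)` returns the makespan, the total cost, and whether the deadline was violated for one simulated run. Because the policy reads `env.speeds` at every step, its choice can change as the observed performance changes, which is the reactive behaviour the abstract contrasts with schedulers that ignore fluctuations.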

Acknowledgement

This work is supported in part by the Graduate Scientific Research and Innovation Foundation of Chongqing, China (Grant No. CYB20062 and CYS20066), and the Fundamental Research Funds for the Central Universities (China) under Project 2019CDXYJSJ0022.

Author information

Correspondence to Wanbo Zheng or Yunni Xia.

Copyright information

© 2021 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering

About this paper

Cite this paper

Peng, Q. et al. (2021). Reactive Workflow Scheduling in Fluctuant Infrastructure-as-a-Service Clouds Using Deep Reinforcement Learning. In: Gao, H., Wang, X., Iqbal, M., Yin, Y., Yin, J., Gu, N. (eds) Collaborative Computing: Networking, Applications and Worksharing. CollaborateCom 2020. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 350. Springer, Cham. https://doi.org/10.1007/978-3-030-67540-0_17

  • DOI: https://doi.org/10.1007/978-3-030-67540-0_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-67539-4

  • Online ISBN: 978-3-030-67540-0

  • eBook Packages: Computer Science (R0)
