Abstract
As a promising and evolving computing paradigm, cloud computing benefits computation-intensive scientific applications, which are usually orchestrated as workflows, by providing unlimited, elastic, and heterogeneous resources in a pay-as-you-go way. Given a workflow template, identifying a set of appropriate cloud services that fulfill users' functional requirements under pre-given constraints is widely recognized to be a challenge. Moreover, because the supporting cloud infrastructures can be highly prone to performance variations and fluctuations, challenges such as guaranteeing user-perceived performance and reducing the cost of cloud-supported scientific workflows need to be properly tackled. Traditional approaches tend to ignore such fluctuations when scheduling workflow tasks and thus can lead to frequent violations of Service-Level Agreements (SLAs). In contrast, we take such fluctuations into consideration, formulate the workflow scheduling problem as a continuous decision-making process, and propose a reactive, deep-reinforcement-learning-based method, named DeepWS, to solve it. Extensive case studies based on real-world workflow templates show that our approach significantly outperforms traditional ones in terms of SLA-violation rate and total cost.
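The idea of treating scheduling as a sequential decision process under fluctuating performance can be illustrated with a toy sketch. This is not DeepWS: tabular Q-learning stands in for the deep network, the workflow is a simple three-task chain, and every constant below (VM types, prices, workloads, deadline, penalty, fluctuation range) is a hypothetical value chosen only for illustration.

```python
import random

# Hypothetical VM types: (nominal speed in work units per second, price per second).
VM_TYPES = [(1.0, 1.0), (2.0, 3.0)]
WORKLOADS = [4.0, 6.0, 4.0]   # illustrative three-task chain workflow
DEADLINE = 12.0               # assumed SLA: whole workflow must finish by this time
PENALTY = 50.0                # assumed one-off charge for missing the deadline
BUCKETS = 6                   # discretisation of elapsed time for the Q-table


def step(task, elapsed, action, factor):
    """Run one task on the chosen VM type under a performance factor in (0, 1]."""
    speed, price = VM_TYPES[action]
    duration = WORKLOADS[task] / (speed * factor)
    elapsed += duration
    reward = -price * duration            # rental cost counts as negative reward
    done = task + 1 == len(WORKLOADS)
    if done and elapsed > DEADLINE:
        reward -= PENALTY                 # SLA violation
    return elapsed, reward, done


def bucket(elapsed):
    """Map elapsed time to a coarse state index in 0..BUCKETS."""
    return min(int(elapsed / DEADLINE * BUCKETS), BUCKETS)


def greedy(values):
    """Index of the highest-valued action."""
    return max(range(len(values)), key=values.__getitem__)


def train(episodes=5000, alpha=0.1, gamma=1.0, eps=0.1, seed=0):
    """Tabular Q-learning over states (task index, elapsed-time bucket)."""
    rng = random.Random(seed)
    q = {}
    for _ in range(episodes):
        task, elapsed, done = 0, 0.0, False
        while not done:
            values = q.setdefault((task, bucket(elapsed)), [0.0] * len(VM_TYPES))
            # Epsilon-greedy choice, made reactively when the task becomes ready.
            a = rng.randrange(len(VM_TYPES)) if rng.random() < eps else greedy(values)
            factor = rng.uniform(0.6, 1.0)   # performance fluctuation, unknown to the agent
            elapsed, reward, done = step(task, elapsed, a, factor)
            task += 1
            target = reward
            if not done:
                nxt = q.setdefault((task, bucket(elapsed)), [0.0] * len(VM_TYPES))
                target += gamma * max(nxt)
            values[a] += alpha * (target - values[a])
    return q


def act(q, task, elapsed):
    """Pick a VM type for the ready task given the time already consumed."""
    return greedy(q.get((task, bucket(elapsed)), [0.0] * len(VM_TYPES)))
```

Calling `q = train()` and then `act(q, task, elapsed)` each time a task becomes ready mirrors the reactive setting: the decision depends on how much time earlier fluctuations have already consumed, rather than on a schedule fixed in advance.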
Acknowledgement
This work is supported in part by the Graduate Scientific Research and Innovation Foundation of Chongqing, China (Grant Nos. CYB20062 and CYS20066), and the Fundamental Research Funds for the Central Universities (China) under Project 2019CDXYJSJ0022.
Copyright information
© 2021 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
About this paper
Cite this paper
Peng, Q. et al. (2021). Reactive Workflow Scheduling in Fluctuant Infrastructure-as-a-Service Clouds Using Deep Reinforcement Learning. In: Gao, H., Wang, X., Iqbal, M., Yin, Y., Yin, J., Gu, N. (eds) Collaborative Computing: Networking, Applications and Worksharing. CollaborateCom 2020. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 350. Springer, Cham. https://doi.org/10.1007/978-3-030-67540-0_17
DOI: https://doi.org/10.1007/978-3-030-67540-0_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-67539-4
Online ISBN: 978-3-030-67540-0
eBook Packages: Computer Science (R0)