Abstract
Autoscalers exploit cloud-computing elasticity to cope with the dynamic computational demands of scientific workflows. Autoscalers constantly acquire or terminate virtual machines (VMs) on the fly to execute workflows while minimizing makespan and economic cost. One key problem of workflow autoscaling under budget constraints (i.e., with a maximum limit on cost) is determining the right proportion between (a) expensive but reliable VMs, called on-demand instances, and (b) cheaper VMs that are subject to failure, called spot instances. Spot instances can offer a high degree of parallelism at low cost, but they must be used wisely because they can fail unexpectedly and thereby increase makespan. Given the unpredictability of failures and the inherent performance variability of clouds, designing a policy that assigns the budget to each kind of instance is not a trivial task. For this reason, we formalize the problem as a Markov decision process, which allows us to learn near-optimal policies from the experience of other baseline policies. Experiments over four well-known scientific workflows demonstrate that the learned policies outperform the baseline policies in terms of the aggregated relative percentage difference of makespan and execution cost. These promising results encourage further study of new strategies for finding optimal budget policies for executing workflows in the cloud.
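The abstract does not spell out how a policy is derived from the Markov decision process. As a minimal sketch, assuming a model-based reading (estimate transition probabilities and mean rewards by counting the transitions observed while running baseline policies, then solve the estimated model with value iteration), the idea can be illustrated as follows. The states, the actions (taken here as the fraction of budget assigned to spot instances), the rewards, the toy log `EXAMPLE_LOG` and the helpers `estimate_mdp` and `value_iteration` are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict

# Hypothetical log gathered while running baseline budget policies.
# Each tuple is (state, action, reward, next_state):
#   state  - assumed label of execution progress (illustrative only)
#   action - assumed fraction of the budget assigned to spot instances
#   reward - assumed negated cost/makespan penalty for that step
EXAMPLE_LOG = [
    ("start", 0.0, -10.0, "mid"),
    ("start", 0.5, -6.0, "mid"),
    ("start", 0.5, -6.0, "failed"),  # a spot instance was terminated
    ("failed", 0.0, -12.0, "done"),
    ("mid", 0.0, -5.0, "done"),
    ("mid", 0.5, -3.0, "done"),
]


def estimate_mdp(transitions):
    """Estimate P(s'|s,a) and mean reward R(s,a) by counting logged transitions."""
    counts = defaultdict(lambda: defaultdict(int))
    reward_sum = defaultdict(float)
    for s, a, r, s_next in transitions:
        counts[(s, a)][s_next] += 1
        reward_sum[(s, a)] += r
    P, R = {}, {}
    for (s, a), nexts in counts.items():
        n = sum(nexts.values())
        P[(s, a)] = {s2: c / n for s2, c in nexts.items()}
        R[(s, a)] = reward_sum[(s, a)] / n
    return P, R


def value_iteration(P, R, gamma=0.95, tol=1e-9):
    """Solve the estimated MDP; states never acted in are treated as terminal."""
    actions = defaultdict(list)
    for s, a in P:
        actions[s].append(a)
    states = set(actions) | {s2 for dist in P.values() for s2 in dist}
    V = {s: 0.0 for s in states}

    def q(s, a):
        # Expected return of taking action a in state s under the current V.
        return R[(s, a)] + gamma * sum(p * V[s2] for s2, p in P[(s, a)].items())

    while True:
        delta = 0.0
        for s in states:
            if not actions[s]:
                continue  # terminal state: its value stays 0
            best = max(q(s, a) for a in actions[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    # Greedy policy with respect to the converged value function.
    policy = {s: max(actions[s], key=lambda a: q(s, a)) for s in states if actions[s]}
    return V, policy


if __name__ == "__main__":
    P, R = estimate_mdp(EXAMPLE_LOG)
    # gamma=1.0 is safe here only because the example log is acyclic.
    V, policy = value_iteration(P, R, gamma=1.0)
    print(policy)
```

On this toy log, the solved policy assigns no spot budget in `start` (action 0.0 yields an expected return of -13 versus -13.5 for 0.5, because one logged spot failure leads to an expensive recovery) and a 0.5 spot share in `mid`. A discount factor below 1 guarantees convergence when the logged states form cycles.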







Notes
Note that knowing the exact task running times and the future progression of spot prices would permit the design of an optimal policy able to minimize makespan and cost. However, under real cloud execution conditions such assumptions are not realistic, and therefore they are not considered in this work.
Southern California Earthquake Center: http://www.scec.org
NASA: http://www.nasa.gov/
Acknowledgements
This research is supported by the ANPCyT Projects No. PICT-2012-2731 and PICT-2014-1430, and by the UNCuyo Project No. SeCTyP-M041. The first author acknowledges her PhD fellowship granted by the CONICET.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Garí, Y., Monge, D.A., Mateos, C. et al. Learning budget assignment policies for autoscaling scientific workflows in the cloud. Cluster Comput 23, 87–105 (2020). https://doi.org/10.1007/s10586-018-02902-0
DOI: https://doi.org/10.1007/s10586-018-02902-0