Learning budget assignment policies for autoscaling scientific workflows in the cloud

Garí, Yisel; Monge, David A.; Mateos, Cristian; García Garino, Carlos

doi:10.1007/s10586-018-02902-0

Published: 09 February 2019

Volume 23, pages 87–105 (2020)

Cluster Computing

Yisel Garí¹,
David A. Monge ORCID: orcid.org/0000-0001-6444-4610²,
Cristian Mateos³ &
…
Carlos García Garino²

Abstract

Autoscalers exploit cloud-computing elasticity to cope with the dynamic computational demands of scientific workflows. They continuously acquire and terminate virtual machines (VMs) on the fly to execute workflows while minimizing makespan and economic cost. A key problem of workflow autoscaling under budget constraints (i.e., with a maximum limit on cost) is determining the right proportion between (a) expensive but reliable VMs, called on-demand instances, and (b) cheaper but failure-prone VMs, called spot instances. Spot instances can provide massive parallelism at low cost, but they must be used wisely because they can fail unexpectedly, which increases makespan. Given the unpredictability of failures and the inherent performance variability of clouds, designing a policy for assigning the budget to each kind of instance is not a trivial task. For this reason, we formalize the problem as a Markov decision process, which allows us to learn near-optimal policies from the experience of other baseline policies. Experiments over four well-known scientific workflows demonstrate that the learned policies outperform the baseline policies in terms of the aggregated relative percentage difference of makespan and execution cost. These promising results encourage the future study of new strategies aimed at finding optimal budget policies for the execution of workflows in the cloud.
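To make the idea of learning a policy from the experience of baseline policies more concrete, the following is a minimal sketch and not the authors' formulation. It assumes a hypothetical log of (state, action, reward, next state) tuples collected while baseline policies ran, estimates a transition and reward model from that log, and solves the estimated Markov decision process with standard value iteration. The state names ("running", "done"), the action encoding (fraction of the budget assigned to spot instances), the reward values, and the function names `estimate_model` and `value_iteration` are all illustrative assumptions; the abstract does not specify any of them.

```python
from collections import defaultdict


def estimate_model(log):
    """Estimate P(s' | s, a) and the mean reward R(s, a) from logged (s, a, r, s') tuples."""
    counts = defaultdict(lambda: defaultdict(int))
    reward_sum = defaultdict(float)
    for state, action, reward, next_state in log:
        counts[(state, action)][next_state] += 1
        reward_sum[(state, action)] += reward
    probs, rewards = {}, {}
    for key, nexts in counts.items():
        total = sum(nexts.values())
        probs[key] = {s2: n / total for s2, n in nexts.items()}
        rewards[key] = reward_sum[key] / total
    return probs, rewards


def value_iteration(probs, rewards, gamma=0.9, tol=1e-9):
    """Solve the estimated MDP; return state values and a greedy policy."""
    actions = defaultdict(list)
    for state, action in probs:
        actions[state].append(action)
    states = set(actions) | {s2 for nexts in probs.values() for s2 in nexts}
    values = {s: 0.0 for s in states}

    def q(state, action):
        expected_next = sum(p * values[s2] for s2, p in probs[(state, action)].items())
        return rewards[(state, action)] + gamma * expected_next

    while True:
        delta = 0.0
        for state in actions:  # states with no logged action keep value 0
            best = max(q(state, a) for a in actions[state])
            delta = max(delta, abs(best - values[state]))
            values[state] = best
        if delta < tol:
            break
    policy = {s: max(actions[s], key=lambda a: q(s, a)) for s in actions}
    return values, policy


# Hypothetical experience: the action is the fraction of budget given to spot
# instances, and the reward is a made-up negative penalty for makespan and cost.
log = [
    ("running", 0.0, -10.0, "done"),     # on-demand only: reliable but costly
    ("running", 0.0, -10.0, "done"),
    ("running", 0.8, -4.0, "done"),      # mostly spot: cheap when nothing fails
    ("running", 0.8, -6.0, "running"),   # spot failure: work must be redone
]
probs, rewards = estimate_model(log)
values, policy = value_iteration(probs, rewards)
print(policy, values)
```

In this toy log the spot-heavy action sometimes fails and sends the workflow back to the "running" state, so its value depends on both the immediate reward and the chance of repeating work. A realistic setting would need far richer states (for example, remaining budget and workflow progress) and much more logged experience than this sketch assumes.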

Notes

Note that knowing the exact task running-times and the future spot prices progression would permit the design of an optimal policy able to minimize makespan and cost. However, under real cloud execution conditions such assumptions are not realistic and therefore they are not considered in this work.
Southern California Earthquake Center: http://www.scec.org
IPAC: http://www.ipac.caltech.edu/
NASA: http://www.nasa.gov/
LIGO: http://www.ligo.caltech.edu/
NCBI: https://www.ncbi.nlm.nih.gov/
http://confluence.pegasus.isi.edu/display/pegasus/WorkflowGenerator

Acknowledgements

This research is supported by ANPCyT Projects No. PICT-2012-2731 and PICT-2014-1430, and by UNCuyo Project No. SeCTyP-M041. The first author acknowledges her PhD fellowship granted by CONICET.

Author information

Authors and Affiliations

ITIC-CONICET, Universidad Nacional de Cuyo (UNCuyo), Mendoza, Argentina
Yisel Garí
ITIC, Universidad Nacional de Cuyo (UNCuyo), Mendoza, Argentina
David A. Monge & Carlos García Garino
ISISTAN-CONICET, UNICEN, Tandil, Buenos Aires, Argentina
Cristian Mateos

Corresponding author

Correspondence to David A. Monge.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Garí, Y., Monge, D.A., Mateos, C. et al. Learning budget assignment policies for autoscaling scientific workflows in the cloud. Cluster Comput 23, 87–105 (2020). https://doi.org/10.1007/s10586-018-02902-0

Received: 22 March 2018
Revised: 13 July 2018
Accepted: 24 December 2018
Published: 09 February 2019
Issue Date: March 2020
DOI: https://doi.org/10.1007/s10586-018-02902-0
