Learning budget assignment policies for autoscaling scientific workflows in the cloud

Cluster Computing

Abstract

Autoscalers exploit cloud-computing elasticity to cope with the dynamic computational demands of scientific workflows. They continually acquire or terminate virtual machines (VMs) on the fly to execute workflows while minimizing makespan and economic cost. One key problem of workflow autoscaling under budget constraints (i.e., with a maximum limit on cost) is determining the right proportion between (a) expensive but reliable VMs, called on-demand instances, and (b) cheaper but failure-prone VMs, called spot instances. Spot instances can potentially provide a high degree of parallelism at low cost, but they must be used wisely because they can fail unexpectedly, which lengthens makespan. Given the unpredictability of failures and the inherent performance variability of clouds, designing a policy for assigning the budget to each kind of instance is not a trivial task. For this reason, we formalize the described problem as a Markov decision process, which allows us to learn near-optimal policies from the experience of other baseline policies. Experiments over four well-known scientific workflows demonstrate that the learned policies outperform the baseline policies in terms of the aggregated relative percentage difference of makespan and execution cost. These promising results encourage the future study of new strategies aimed at finding optimal budget policies for the execution of workflows in the cloud.
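To make the idea of learning a budget policy from the experience of baseline policies more concrete, the following is a minimal sketch under stated assumptions, not the authors' formulation. The abstract does not give the state, action, or reward definitions, so the two states ("early", "late"), the three spot budget ratios, the reward values, and the function name learn_policy are all hypothetical. The learning step uses plain batch Q-iteration over logged transitions, chosen here only for illustration.

```python
from collections import defaultdict

# Hypothetical action set: fraction of the budget assigned to spot instances.
SPOT_RATIOS = (0.0, 0.5, 1.0)


def learn_policy(experience, gamma=0.9, sweeps=100):
    """Batch Q-iteration over logged (state, action, reward, next_state) tuples.

    next_state is None when the workflow finished after that step.
    """
    # Group logged outcomes by the (state, action) pair that produced them.
    outcomes = defaultdict(list)
    for state, action, reward, next_state in experience:
        outcomes[(state, action)].append((reward, next_state))

    q = defaultdict(float)
    for _ in range(sweeps):
        new_q = defaultdict(float)
        for (state, action), samples in outcomes.items():
            total = 0.0
            for reward, next_state in samples:
                future = 0.0
                if next_state is not None:
                    # Only actions actually observed in next_state are considered.
                    seen = [q[(next_state, a)] for a in SPOT_RATIOS
                            if (next_state, a) in outcomes]
                    future = max(seen, default=0.0)
                total += reward + gamma * future
            # Average over samples to account for stochastic outcomes (e.g. spot failures).
            new_q[(state, action)] = total / len(samples)
        q = new_q

    # Greedy policy: the best observed action for each state.
    policy = {}
    for state, action in outcomes:
        best = policy.get(state)
        if best is None or q[(state, action)] > q[(state, best)]:
            policy[state] = action
    return policy, dict(q)


# Made-up experience, as if logged while running baseline policies.
# Rewards are negative penalties standing in for combined makespan and cost.
experience = [
    ("early", 0.0, -4.0, "late"),  # on-demand only: reliable but expensive
    ("early", 1.0, -1.0, "late"),  # spot only: cheap when no failure occurs
    ("early", 1.0, -3.0, "late"),  # spot only: a failure wasted some work
    ("late", 0.0, -2.0, None),
    ("late", 1.0, -1.0, None),
    ("late", 1.0, -5.0, None),     # a late failure is costly for makespan
]

policy, q_values = learn_policy(experience)
print(policy)
```

With this toy data the learned policy favors spot instances early in the workflow and on-demand instances near the end, which only reflects the invented rewards and says nothing about the paper's actual results.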

Notes

  1. Note that knowing the exact task running times and the future progression of spot prices would permit the design of an optimal policy that minimizes makespan and cost. However, such assumptions are unrealistic under real cloud execution conditions, so they are not considered in this work.

  2. Southern California Earthquake Center: http://www.scec.org

  3. IPAC: http://www.ipac.caltech.edu/

  4. NASA: http://www.nasa.gov/

  5. LIGO: http://www.ligo.caltech.edu/

  6. NCBI: https://www.ncbi.nlm.nih.gov/

  7. http://confluence.pegasus.isi.edu/display/pegasus/WorkflowGenerator

Acknowledgements

This research is supported by the ANPCyT Projects No. PICT-2012-2731 and PICT-2014-1430, and by the UNCuyo Project No. SeCTyP-M041. The first author acknowledges her PhD fellowship granted by the CONICET.

Author information

Corresponding author

Correspondence to David A. Monge.

About this article

Cite this article

Garí, Y., Monge, D.A., Mateos, C. et al. Learning budget assignment policies for autoscaling scientific workflows in the cloud. Cluster Comput 23, 87–105 (2020). https://doi.org/10.1007/s10586-018-02902-0

  • DOI: https://doi.org/10.1007/s10586-018-02902-0
