Abstract
Cloud bursting is an application deployment model in which additional computing resources are provisioned from public clouds when local resources are insufficient, e.g. during peak demand periods. We propose and experimentally evaluate a cloud-bursting solution for scientific workflows. The solution is portable because it uses Kubernetes to deploy the workflow management system and computing clusters in multiple clouds. We also introduce transparent data access by employing a virtual distributed file system across the clouds, which lets jobs use a POSIX file system interface while hiding data transfers between clouds. To balance the load and minimize the communication volume between clouds, we leverage graph partitioning, while ensuring that the algorithm distributes the load equally at each parallel execution stage of a workflow. The solution is evaluated experimentally using the HyperFlow workflow management system integrated with the Onedata data management platform, deployed in our on-premises cloud at Cyfronet AGH and in Google Cloud.
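The partitioning goal stated in the abstract (equal load at each parallel stage, little data crossing between clouds) can be illustrated with a small sketch. This is not the algorithm used in the paper: it swaps in a simple greedy, level-by-level heuristic in place of a real graph partitioner, and it assumes unit-cost tasks and unit-weight data dependencies. The function names (`stage_levels`, `partition_by_stage`, `cut_edges`) and the cloud labels are hypothetical.

```python
from collections import deque
from math import ceil


def stage_levels(tasks, deps):
    """Return (level, parents): level[t] is the longest path length from any
    source task to t, so tasks sharing a level form one parallel stage."""
    parents = {t: [] for t in tasks}
    children = {t: [] for t in tasks}
    for src, dst in deps:
        parents[dst].append(src)
        children[src].append(dst)
    indegree = {t: len(parents[t]) for t in tasks}
    level = {t: 0 for t in tasks}
    ready = deque(t for t in tasks if indegree[t] == 0)
    while ready:  # Kahn-style topological traversal
        t = ready.popleft()
        for c in children[t]:
            level[c] = max(level[c], level[t] + 1)
            indegree[c] -= 1
            if indegree[c] == 0:
                ready.append(c)
    return level, parents


def partition_by_stage(tasks, deps, clouds):
    """Greedy stand-in for graph partitioning (an assumption of this sketch):
    within each stage, cap every cloud at an equal share of the tasks and
    place each task on the cloud that already holds most of its parents."""
    level, parents = stage_levels(tasks, deps)
    stages = {}
    for t in tasks:
        stages.setdefault(level[t], []).append(t)

    placement = {}
    for lvl in sorted(stages):
        stage = stages[lvl]
        cap = ceil(len(stage) / len(clouds))  # equal share per cloud
        load = {c: 0 for c in clouds}
        for t in stage:
            affinity = {c: 0 for c in clouds}
            for p in parents[t]:
                affinity[placement[p]] += 1
            candidates = [c for c in clouds if load[c] < cap]
            # Prefer parent co-location; break ties by lower current load.
            best = max(candidates, key=lambda c: (affinity[c], -load[c]))
            placement[t] = best
            load[best] += 1
    return placement


def cut_edges(deps, placement):
    """Count dependencies whose endpoints run in different clouds."""
    return sum(1 for s, d in deps if placement[s] != placement[d])
```

For a fork-join workflow with one `split` task, four parallel tasks, and one `merge` task, calling `partition_by_stage(tasks, deps, ["local", "public"])` places two of the parallel tasks in each cloud. A production setup would more likely hand the workflow graph to a multi-constraint partitioner, with one balance constraint per stage, rather than rely on a greedy pass like this one.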
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Baliś, B., Orzechowski, M., Dutka, Ł., Słota, R.G., Kitowski, J. (2021). Scientific Workflow Management on Hybrid Clouds with Cloud Bursting and Transparent Data Access. In: Paszynski, M., Kranzlmüller, D., Krzhizhanovskaya, V.V., Dongarra, J.J., Sloot, P.M.A. (eds) Computational Science – ICCS 2021. ICCS 2021. Lecture Notes in Computer Science, vol 12742. Springer, Cham. https://doi.org/10.1007/978-3-030-77961-0_21
DOI: https://doi.org/10.1007/978-3-030-77961-0_21
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-77960-3
Online ISBN: 978-3-030-77961-0
eBook Packages: Computer Science (R0)