Abstract
Scientific workflows have become a standard way for scientists to represent the set of tasks needed to solve a given scientific problem. These workflows usually consist of numerous CPU- and I/O-intensive jobs that are executed by workflow management systems (WfMS) on clouds, grids, supercomputers, and similar infrastructure. Previous work showed that using k-way partitioning to distribute a workflow’s tasks across multiple machines in the cloud reduces overall data communication and therefore lowers the cost of bandwidth usage. This work presents a framework that automates the partitioning and execution of any scientific workflow written for the Pegasus WfMS, so that a scientist can run it in the cloud with ease. The framework provisions cloud instances using CloudML, installs and configures all the software needed for execution, partitions and runs the submitted workflow, and reports the estimated makespan and cost.
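The effect of partitioning on data communication can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy workflow, the edge weights (data transferred in MB), and the exhaustive search, which stands in for a real k-way graph partitioner and is only viable for very small graphs. It is not the framework's implementation.

```python
from itertools import product

# Hypothetical toy workflow: (producer, consumer) -> data transferred in MB.
EDGES = {
    ("a", "b"): 100,
    ("a", "c"): 100,
    ("b", "d"): 10,
    ("c", "d"): 10,
    ("d", "e"): 50,
    ("d", "f"): 50,
}
TASKS = sorted({task for edge in EDGES for task in edge})


def cut_cost(assignment, edges=EDGES):
    """Total MB that crosses machine boundaries for a task -> machine mapping."""
    return sum(mb for (u, v), mb in edges.items() if assignment[u] != assignment[v])


def best_partition(tasks, k, max_per_part, edges=EDGES):
    """Exhaustively search size-bounded k-way partitions for the smallest cut."""
    best, best_cost = None, None
    for labels in product(range(k), repeat=len(tasks)):
        # Enforce the balance constraint: no machine gets too many tasks.
        if any(labels.count(part) > max_per_part for part in range(k)):
            continue
        assignment = dict(zip(tasks, labels))
        cost = cut_cost(assignment, edges)
        if best_cost is None or cost < best_cost:
            best, best_cost = assignment, cost
    return best, best_cost


# Baseline: spread tasks over two machines without looking at data flow.
round_robin = {task: i % 2 for i, task in enumerate(TASKS)}
partitioned, partitioned_cost = best_partition(TASKS, k=2, max_per_part=3)

print("round-robin cut (MB):", cut_cost(round_robin))
print("partitioned cut (MB):", partitioned_cost)
```

In this toy case the partitioned assignment keeps the tasks that exchange the most data on the same machine, so less data has to travel between machines than under the round-robin baseline.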
Acknowledgements
This research is supported by the Estonian Science Foundation Grants PUT360 and IUT20-55. The authors would also like to thank the anonymous reviewers for their suggestions to improve the paper.
Cite this article
Viil, J., Srirama, S.N. Framework for automated partitioning and execution of scientific workflows in the cloud. J Supercomput 74, 2656–2683 (2018). https://doi.org/10.1007/s11227-018-2296-7
DOI: https://doi.org/10.1007/s11227-018-2296-7