Abstract
Efficient data-aware methods in job scheduling, distributed storage management, and data management platforms are necessary for the successful execution of data-intensive applications. However, research on methods for data-intensive scientific applications in large-scale distributed cloud and cluster computing environments is insufficient, and data-aware methods are becoming more complex. In this paper, we propose a Data-Locality Aware Workflow Scheduling (D-LAWS) technique and a locality-aware resource management method for data-intensive scientific workflows in HPC cloud environments. D-LAWS applies data locality and data transfer time, based on network bandwidth, to scientific workflow task scheduling and balances resource utilization and task parallelism at the node level. Our method consolidates VMs and considers task parallelism by data flow when planning the task executions of a data-intensive scientific workflow. We additionally consider more complex workflow models and data locality pertaining to the placement and transfer of data prior to task executions. We implement and validate the methods based on fairness in cloud environments. Experimental results show that the proposed methods can improve the performance and data locality of data-intensive workflows in cloud environments.
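The idea of weighing data locality against transfer time can be illustrated with a minimal sketch. This is not the D-LAWS algorithm itself, whose details are not given in the abstract. It only assumes that a scheduler estimates, for each candidate node, the time to fetch the input files that are not already local (size divided by bandwidth) and picks the node with the smallest estimate. The names `Node`, `transfer_time`, and `pick_node`, as well as the file names, sizes, and bandwidths, are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    bandwidth_mb_s: float          # assumed inbound network bandwidth (MB/s)
    free_slots: int                # assumed number of free task slots
    local_files: set = field(default_factory=set)  # files already stored on the node


def transfer_time(task_inputs, node):
    """Estimated seconds to fetch the inputs that are not already on the node."""
    missing_mb = sum(size for name, size in task_inputs.items()
                     if name not in node.local_files)
    return missing_mb / node.bandwidth_mb_s


def pick_node(task_inputs, nodes):
    """Pick the node with a free slot and the smallest estimated transfer time."""
    candidates = [n for n in nodes if n.free_slots > 0]
    if not candidates:
        return None  # no capacity; the caller would have to wait or scale out
    return min(candidates, key=lambda n: transfer_time(task_inputs, n))


# Hypothetical cluster state and task inputs (file name -> size in MB).
nodes = [
    Node("node-a", bandwidth_mb_s=100.0, free_slots=1, local_files={"img1.fits"}),
    Node("node-b", bandwidth_mb_s=200.0, free_slots=2, local_files=set()),
]
inputs = {"img1.fits": 800.0, "img2.fits": 100.0}
chosen = pick_node(inputs, nodes)
```

In this example the slower node that already holds the large input is preferred over the faster node that would have to fetch everything, which is the kind of trade-off a locality-aware scheduler is meant to capture.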
Acknowledgments
This research was supported by the Next-Generation Information Computing Development Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT and Future Planning (2015M3C4A7065646).
Cite this article
Choi, J., Adufu, T. & Kim, Y. Data-Locality Aware Scientific Workflow Scheduling Methods in HPC Cloud Environments. Int J Parallel Prog 45, 1128–1141 (2017). https://doi.org/10.1007/s10766-016-0463-0
DOI: https://doi.org/10.1007/s10766-016-0463-0