Data-Locality Aware Scientific Workflow Scheduling Methods in HPC Cloud Environments

International Journal of Parallel Programming

Abstract

Efficient data-aware methods in job scheduling, distributed storage management and data management platforms are necessary for the successful execution of data-intensive applications. However, research on methods for data-intensive scientific applications in large-scale distributed cloud and cluster computing environments is insufficient, and data-aware methods are becoming more complex. In this paper, we propose a Data-Locality Aware Workflow Scheduling (D-LAWS) technique and a locality-aware resource management method for data-intensive scientific workflows in HPC cloud environments. D-LAWS applies data locality and data transfer time, estimated from network bandwidth, to scientific workflow task scheduling, and balances resource utilization and task parallelism at the node level. Our method consolidates VMs and considers task parallelism according to data flow when planning the task executions of a data-intensive scientific workflow. We additionally consider more complex workflow models and data locality pertaining to the placement and transfer of data prior to task execution. We implement and validate the methods based on fairness in cloud environments. Experimental results show that the proposed methods can improve the performance and data locality of data-intensive workflows in cloud environments.
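The abstract does not spell out the D-LAWS algorithm, so the following is only a minimal sketch of the general idea it names: weighing data locality against transfer time derived from network bandwidth when placing a workflow task. Every name here (`Task`, `transfer_time_s`, `pick_node`, the node labels and the file name) is hypothetical, and the greedy earliest-finish rule is an assumption made for illustration, not the authors' method.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Task:
    """A workflow task (hypothetical model, not taken from the paper)."""
    name: str
    inputs: Dict[str, float]  # input file name -> size in megabytes
    runtime_s: float          # estimated compute time in seconds


def transfer_time_s(size_mb: float, bandwidth_mbps: float) -> float:
    """Seconds needed to move size_mb megabytes over a bandwidth_mbps (megabit/s) link."""
    return size_mb * 8.0 / bandwidth_mbps


def pick_node(task: Task,
              file_location: Dict[str, str],
              node_ready_s: Dict[str, float],
              bandwidth_mbps: Dict[Tuple[str, str], float]) -> Tuple[str, float]:
    """Return (node, estimated finish time) with the earliest estimated finish.

    Finish time = time the node becomes free
                + time to stage inputs that are not already local
                + task runtime.
    Inputs already stored on the candidate node cost nothing to stage,
    which is how data locality enters the decision.
    """
    best = None
    for node in sorted(node_ready_s):
        staging = sum(
            transfer_time_s(size, bandwidth_mbps[(file_location[f], node)])
            for f, size in task.inputs.items()
            if file_location[f] != node
        )
        finish = node_ready_s[node] + staging + task.runtime_s
        if best is None or finish < best[1]:
            best = (node, finish)
    return best


# Example: a 1000 MB input lives on node A; the A<->B link runs at 1000 Mbit/s.
task = Task("project_image", {"raw.fits": 1000.0}, runtime_s=10.0)
location = {"raw.fits": "A"}
links = {("A", "B"): 1000.0, ("B", "A"): 1000.0}

# Node A is busy for 5 s but holds the data: 5 + 0 + 10 = 15 s beats B's 0 + 8 + 10 = 18 s.
print(pick_node(task, location, {"A": 5.0, "B": 0.0}, links))   # ('A', 15.0)
# If A is busy for 20 s, paying the 8 s transfer to B becomes worthwhile.
print(pick_node(task, location, {"A": 20.0, "B": 0.0}, links))  # ('B', 18.0)
```

The two calls show the trade-off in miniature: the local node wins while its queueing delay stays below the transfer cost, and loses once the delay exceeds it.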

Acknowledgments

This research was supported by the Next-Generation Information Computing Development Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Science, ICT and Future Planning (2015M3C4A7065646).

Author information

Corresponding author

Correspondence to Yoonhee Kim.

About this article

Cite this article

Choi, J., Adufu, T. & Kim, Y. Data-Locality Aware Scientific Workflow Scheduling Methods in HPC Cloud Environments. Int J Parallel Prog 45, 1128–1141 (2017). https://doi.org/10.1007/s10766-016-0463-0

  • DOI: https://doi.org/10.1007/s10766-016-0463-0
