Abstract
Data-intensive computing is expected to be the next-generation IT computing paradigm, and data-intensive workflows in clouds are becoming increasingly popular. Scheduling such workflows efficiently has become a key issue. In this paper, we first build a directed hypergraph model for data-intensive workflows, because hypergraphs can model communication volume more accurately and better represent asymmetric problems, and the hypergraph cut metric is well suited to minimizing the total volume of communication. Second, we propose the concept of data supportive ability to help represent data-intensive workflow applications, and we detail a merge operation that takes data supportive ability into account. Third, we present an optimized multi-level hypergraph partitioning algorithm. Finally, we propose HEFT-P, a data-reduced scheduling policy for data-intensive workflows. Through simulation, we compare HEFT-P with three typical workflow scheduling policies. The results indicate that HEFT-P achieves data-reduced scheduling and reduces the makespan of data-intensive workflows.
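To make the role of the cut metric concrete, the following minimal sketch computes the widely used connectivity-minus-one cut of a partitioned hypergraph. It is an illustration under stated assumptions, not the paper's algorithm: the abstract does not say which cut metric is used, and the function name `connectivity_cut`, the toy workflow, and the data sizes are all hypothetical. Each net stands for one data item and its pins are the tasks that produce or consume it.

```python
from typing import Dict, Hashable, List


def connectivity_cut(
    nets: Dict[str, List[Hashable]],
    weights: Dict[str, float],
    part: Dict[Hashable, int],
) -> float:
    """Sum over nets of weight * (number of parts the net touches - 1).

    Assumption: this is the standard connectivity-minus-one metric, used
    here only as a stand-in for the cut metric mentioned in the abstract.
    """
    total = 0.0
    for name, pins in nets.items():
        parts_touched = {part[task] for task in pins}
        # A net confined to one part needs no transfer; each additional
        # part it reaches costs one copy of the data item.
        total += weights[name] * (len(parts_touched) - 1)
    return total


# Hypothetical toy workflow: d1 is produced by t1 and read by t2 and t3.
nets = {
    "d1": ["t1", "t2", "t3"],
    "d2": ["t2", "t4"],
    "d3": ["t3", "t4"],
}
weights = {"d1": 10, "d2": 4, "d3": 6}  # illustrative data sizes

part_a = {"t1": 0, "t2": 0, "t3": 1, "t4": 1}
part_b = {"t1": 0, "t2": 0, "t3": 0, "t4": 1}

print(connectivity_cut(nets, weights, part_a))
print(connectivity_cut(nets, weights, part_b))
```

Because a net counts a data item once per extra part rather than once per consumer, a data item read by several tasks placed together is not counted multiple times, which is one reason hypergraph cuts track communication volume closely.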
Acknowledgements
The authors warmly thank the reviewers for their insightful comments, which helped to improve this work. This work was supported in part by the National Natural Science Foundation of China (NSFC), projects 61602525 and 61572525.
Copyright information
© 2017 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Hu, Z. et al. (2017). Hypergraph-Based Data Reduced Scheduling Policy for Data-Intensive Workflow in Clouds. In: Zou, B., Han, Q., Sun, G., Jing, W., Peng, X., Lu, Z. (eds) Data Science. ICPCSEE 2017. Communications in Computer and Information Science, vol 728. Springer, Singapore. https://doi.org/10.1007/978-981-10-6388-6_28
DOI: https://doi.org/10.1007/978-981-10-6388-6_28
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-6387-9
Online ISBN: 978-981-10-6388-6
eBook Packages: Computer Science (R0)