Hypergraph-Based Data Reduced Scheduling Policy for Data-Intensive Workflow in Clouds

  • Conference paper
Data Science (ICPCSEE 2017)

Abstract

Data-intensive computing is expected to be the next-generation IT computing paradigm, and data-intensive workflows in clouds are becoming increasingly popular. Scheduling such workflows efficiently has become a key issue. In this paper we first build a directed hypergraph model for data-intensive workflows, since hypergraphs model communication volume more accurately and represent asymmetric problems better than ordinary graphs, and the hypergraph cut metric is well suited to minimizing the total communication volume. Second, we propose the concept of data supportive ability to help describe data-intensive workflow applications, and we give the details of a merge operation that takes data supportive ability into account. Third, we present an optimized multi-level hypergraph partitioning algorithm. Finally, we propose HEFT-P, a data-reduced scheduling policy for data-intensive workflows. Through simulation, we compare HEFT-P with three typical workflow scheduling policies. The results indicate that HEFT-P achieves data-reduced scheduling and shortens the makespan of data-intensive workflows.
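
To make the cut metric mentioned in the abstract concrete, here is a minimal sketch of the widely used connectivity-minus-one hypergraph cut. It is not taken from the paper: the function name `connectivity_cut`, the task names, the data sizes, and the two partitions are all hypothetical, and the paper's directed model and its data supportive ability are not reproduced here. Each hyperedge stands for one data item and groups every task that produces or consumes it.

```python
from typing import Dict, Hashable, Iterable, List, Tuple

# A hyperedge is (weight, vertices): one data item and every task that touches it.
# This representation is an assumption made for illustration only.
Hyperedge = Tuple[float, Iterable[Hashable]]


def connectivity_cut(hyperedges: List[Hyperedge],
                     part_of: Dict[Hashable, int]) -> float:
    """Return the sum of weight * (lambda - 1) over all hyperedges,
    where lambda is the number of distinct parts a hyperedge spans."""
    total = 0.0
    for weight, vertices in hyperedges:
        parts = {part_of[v] for v in vertices}
        total += weight * (len(parts) - 1)
    return total


# Hypothetical workflow: t1 writes a 10-unit file read by t2 and t3;
# t2 and t3 each write a file (4 and 6 units) read by t4.
edges = [
    (10.0, ["t1", "t2", "t3"]),
    (4.0, ["t2", "t4"]),
    (6.0, ["t3", "t4"]),
]

# Two candidate placements of the four tasks on two resources (0 and 1).
split_a = {"t1": 0, "t2": 0, "t3": 1, "t4": 1}
split_b = {"t1": 0, "t2": 0, "t3": 0, "t4": 1}

print(connectivity_cut(edges, split_a))  # 14.0
print(connectivity_cut(edges, split_b))  # 10.0
```

In this toy case the second placement keeps the 10-unit file on a single resource, so its cut is lower even though more hyperedges are cut. A hyperedge charges a shared file once per additional part it spans, rather than once per consumer.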

Acknowledgements

The authors warmly thank the reviewers for their insightful comments, which helped to improve this work. This work was supported in part by the National Natural Science Foundation of China (NSFC), projects 61602525 and 61572525.

Author information

Corresponding author

Correspondence to Meiguang Zheng.

Copyright information

© 2017 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Hu, Z. et al. (2017). Hypergraph-Based Data Reduced Scheduling Policy for Data-Intensive Workflow in Clouds. In: Zou, B., Han, Q., Sun, G., Jing, W., Peng, X., Lu, Z. (eds) Data Science. ICPCSEE 2017. Communications in Computer and Information Science, vol 728. Springer, Singapore. https://doi.org/10.1007/978-981-10-6388-6_28

  • DOI: https://doi.org/10.1007/978-981-10-6388-6_28

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-6387-9

  • Online ISBN: 978-981-10-6388-6

  • eBook Packages: Computer Science, Computer Science (R0)
