
Dynamic load balancing in distributed exascale computing systems

Cluster Computing

Abstract

According to the exascale computing roadmap, the dynamic nature of new-generation scientific problems calls for a review of the static management of computing resources. It is therefore necessary to present a dynamic load balancing model that manages the system load efficiently. Distributed exascale systems are currently a promising solution for supporting scientific programs with dynamic resource requests. In this work, we propose a dynamic load balancing mechanism for distributed control of the load on the computing nodes. The presented method addresses the challenges posed by the dynamic behavior of next-generation problems. The proposed model considers many practical parameters, including load transition and communication delay. We also propose a compensating factor to minimize the idle time of computing nodes, together with an optimized method for calculating it. We estimate the status of the nodes and calculate the exact portion of the load that should be transferred to achieve optimized load balancing. The evaluation results show significant performance improvements for the proposed load balancing compared with some earlier distributed load balancing mechanisms.
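The abstract does not give the formula for the transferred load or for the compensating factor, so the following is only an illustrative sketch of the general idea and not the authors' method. It assumes a generic excess-load policy: a node that holds more than the mean load sends a fraction of its excess to underloaded nodes, in proportion to their deficits. The function name `loads_to_send` and the parameter `gain` (a stand-in for a compensating factor between 0 and 1) are hypothetical.

```python
from typing import List


def loads_to_send(queue_lengths: List[float], sender: int, gain: float) -> List[float]:
    """Return the amount of load `sender` would ship to each node.

    Assumed policy (not taken from the paper): ship `gain` times the
    sender's excess over the mean, split among underloaded nodes in
    proportion to how far each one sits below the mean.
    """
    if not 0.0 <= gain <= 1.0:
        raise ValueError("gain must lie in [0, 1]")

    n = len(queue_lengths)
    mean = sum(queue_lengths) / n
    excess = queue_lengths[sender] - mean
    transfers = [0.0] * n

    # A node at or below the mean has nothing to give away.
    if excess <= 0:
        return transfers

    # Deficit of every other node relative to the mean (0 if not underloaded).
    deficits = [
        max(mean - q, 0.0) if i != sender else 0.0
        for i, q in enumerate(queue_lengths)
    ]
    total_deficit = sum(deficits)
    if total_deficit == 0:
        return transfers

    for i, deficit in enumerate(deficits):
        transfers[i] = gain * excess * deficit / total_deficit
    return transfers


if __name__ == "__main__":
    # Node 0 holds 12 units; the mean is 6, so its excess is 6.
    print(loads_to_send([12.0, 0.0, 3.0, 9.0], sender=0, gain=0.5))
```

In this sketch a `gain` below 1 sends only part of the excess. The motivation is that node status estimates may be stale by the time the load arrives when communication is delayed, so shipping the full excess can overshoot. How the paper actually optimizes its compensating factor is not stated in the abstract.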



Author information


Corresponding author

Correspondence to Seyedeh Leili Mirtaheri.


Cite this article

Mirtaheri, S.L., Grandinetti, L. Dynamic load balancing in distributed exascale computing systems. Cluster Comput 20, 3677–3689 (2017). https://doi.org/10.1007/s10586-017-0902-8


  • DOI: https://doi.org/10.1007/s10586-017-0902-8
