An Efficient Dynamic Load-Balancing Algorithm in a Large-Scale Cluster

  • Conference paper
Distributed and Parallel Computing (ICA3PP 2005)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3719)

Abstract

Random stealing is a well-known dynamic load-balancing algorithm. In a large-scale cluster, however, the simple random stealing policy is no longer efficient, because an idle node must make many random steal attempts before it obtains a task from another node. This not only increases the idle time of all nodes but also produces heavy network communication overhead. In this paper, we propose a novel dynamic load-balancing algorithm, Transitive Random Stealing (TRS), which lets any idle node obtain a task from another node with far fewer steal attempts in a large-scale cluster. A probabilistic model is constructed to analyze the performance of TRS, random stealing, and Shis, one of the load-balancing policies in the EARTH system. Finally, using the random baseline technique, an experiment comparing TRS with Shis and random stealing for five different load distributions on the Tsinghua EastSun cluster indicates that TRS is a highly efficient dynamic load-balancing algorithm in a large-scale cluster.
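To make the cost of plain random stealing concrete, the following is a minimal sketch and not the paper's probabilistic model or the TRS algorithm itself. It assumes that an idle node picks a victim uniformly at random among the other nodes on every attempt, and that the set of nodes holding tasks does not change while it is trying. Under those assumptions the number of attempts follows a geometric distribution with mean (n - 1) / k for n nodes of which k hold tasks. The function names and the node layout are hypothetical.

```python
import random


def expected_steal_attempts(num_nodes: int, loaded_nodes: int) -> float:
    """Mean number of attempts before a uniformly random steal succeeds.

    Assumes the idle node samples victims with replacement from the other
    num_nodes - 1 nodes, of which loaded_nodes currently hold a task.
    """
    if not 0 < loaded_nodes <= num_nodes - 1:
        raise ValueError("need 0 < loaded_nodes <= num_nodes - 1")
    return (num_nodes - 1) / loaded_nodes


def simulate_steal_attempts(num_nodes: int, loaded_nodes: int,
                            trials: int = 20000, seed: int = 0) -> float:
    """Estimate the same mean by simulation (illustrative layout only)."""
    if not 0 < loaded_nodes <= num_nodes - 1:
        raise ValueError("need 0 < loaded_nodes <= num_nodes - 1")
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        attempts = 0
        while True:
            attempts += 1
            # Node 0 is the idle thief; nodes 1..loaded_nodes hold tasks.
            victim = rng.randrange(1, num_nodes)
            if victim <= loaded_nodes:
                break
        total += attempts
    return total / trials


if __name__ == "__main__":
    # With 1024 nodes and only 8 of them loaded, the mean is 1023 / 8.
    print(expected_steal_attempts(1024, 8))
    print(simulate_steal_attempts(64, 4))
```

In this simplified setting the expected number of attempts grows linearly with the cluster size when the number of loaded nodes stays fixed, which is the inefficiency the abstract attributes to simple random stealing.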

This work is supported by the Chinese NSF for DYS under grant No. 60425205 and by the National Postdoctor Science Foundation of China.

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zhang, BY., Mo, ZY., Yang, GW., Zheng, WM. (2005). An Efficient Dynamic Load-Balancing Algorithm in a Large-Scale Cluster. In: Hobbs, M., Goscinski, A.M., Zhou, W. (eds) Distributed and Parallel Computing. ICA3PP 2005. Lecture Notes in Computer Science, vol 3719. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11564621_20

  • DOI: https://doi.org/10.1007/11564621_20

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29235-7

  • Online ISBN: 978-3-540-32071-5

  • eBook Packages: Computer Science, Computer Science (R0)
