Abstract
Random stealing is a well-known dynamic load-balancing algorithm. In a large-scale cluster, however, the simple random stealing policy is no longer efficient, because an idle node may have to make many random steal attempts before it obtains a task from another node. This not only increases the idle time of the nodes but also produces heavy network communication overhead. In this paper, we propose a novel dynamic load-balancing algorithm, Transitive Random Stealing (TRS), which lets any idle node obtain a task from another node with far fewer steal attempts in a large-scale cluster. A probabilistic model is constructed to analyze the performance of TRS, random stealing, and Shis, one of the load-balancing policies in the EARTH system. Finally, using the random baseline technique, an experiment comparing TRS with Shis and random stealing under five different load distributions on the Tsinghua EastSun cluster indicates that TRS is a highly efficient dynamic load-balancing algorithm for large-scale clusters.
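To make the cost of plain random stealing concrete, the following minimal sketch simulates an idle node that picks victims uniformly at random until it hits a node holding a task. It is an illustration only, under stated assumptions: tasks sit on a fixed set of `n_loaded` nodes, every steal attempt is independent, and the names `steal_attempts` and `mean_attempts` are hypothetical. It does not model TRS or Shis, and it is not the probabilistic model used in the paper.

```python
import random


def steal_attempts(n_nodes, loaded, thief, rng):
    """Count uniform random steal attempts until the thief picks a loaded node."""
    attempts = 0
    while True:
        # Draw a victim uniformly from the other n_nodes - 1 nodes.
        victim = rng.randrange(n_nodes - 1)
        if victim >= thief:
            victim += 1  # shift past the thief so it never picks itself
        attempts += 1
        if victim in loaded:
            return attempts


def mean_attempts(n_nodes, n_loaded, trials=20000, seed=0):
    """Average number of attempts for idle node 0 (assumed setup, see lead-in)."""
    if not 1 <= n_loaded <= n_nodes - 1:
        raise ValueError("n_loaded must be between 1 and n_nodes - 1")
    rng = random.Random(seed)
    # Assumption: nodes 1..n_loaded hold tasks; node 0 is the idle thief.
    loaded = set(range(1, n_loaded + 1))
    total = sum(steal_attempts(n_nodes, loaded, 0, rng) for _ in range(trials))
    return total / trials


if __name__ == "__main__":
    for n_loaded in (50, 10, 2):
        print(n_loaded, round(mean_attempts(101, n_loaded), 2))
```

Under these assumptions each attempt succeeds with probability `n_loaded / (n_nodes - 1)`, so the number of attempts is geometric with mean `(n_nodes - 1) / n_loaded`. The fewer nodes hold work in a large cluster, the more attempts and messages an idle node needs, which is the inefficiency the abstract describes.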
This work is supported by the Chinese NSF for DYS under Grant No. 60425205 and by the National Postdoctoral Science Foundation of China.
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhang, BY., Mo, ZY., Yang, GW., Zheng, WM. (2005). An Efficient Dynamic Load-Balancing Algorithm in a Large-Scale Cluster. In: Hobbs, M., Goscinski, A.M., Zhou, W. (eds) Distributed and Parallel Computing. ICA3PP 2005. Lecture Notes in Computer Science, vol 3719. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11564621_20
DOI: https://doi.org/10.1007/11564621_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29235-7
Online ISBN: 978-3-540-32071-5
eBook Packages: Computer Science, Computer Science (R0)