An Efficient Dynamic Load-Balancing Algorithm in a Large-Scale Cluster

Zhang, Bao-Yin; Mo, Ze-Yao; Yang, Guang-Wen; Zheng, Wei-Min

doi:10.1007/11564621_20

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3719)

Included in the following conference series:

International Conference on Algorithms and Architectures for Parallel Processing

Abstract

Random stealing is a well-known dynamic load-balancing algorithm. In a large-scale cluster, however, the simple random stealing policy is no longer efficient, because an idle node may need many random steal attempts before it obtains a task from another node. This not only increases the idle time of all nodes but also produces heavy network communication overhead. In this paper, we propose a novel dynamic load-balancing algorithm, Transitive Random Stealing (TRS), which lets any idle node obtain a task from another node with far fewer steal attempts in a large-scale cluster. A probabilistic model is constructed to analyze the performance of TRS, random stealing, and Shis, one of the load-balancing policies in the EARTH system. Finally, using the random baseline technique, we compare TRS with Shis and random stealing under five different load distributions on the Tsinghua EastSun cluster; the results indicate that TRS is a highly efficient dynamic load-balancing algorithm for large-scale clusters.
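The cost of plain random stealing can be illustrated with a toy model. This is a minimal sketch, not the paper's probabilistic model and not TRS itself. It assumes a static snapshot in which a fixed number of nodes hold tasks and an idle node picks victims uniformly at random, with replacement, from the other nodes. The function names `expected_attempts` and `simulate_attempts` are hypothetical.

```python
import random


def expected_attempts(n_nodes: int, n_loaded: int) -> float:
    """Mean number of steal attempts until the first success.

    Assumption (not from the paper): each attempt independently picks one of
    the other n_nodes - 1 nodes uniformly, and n_loaded of them hold a task,
    so the attempt count is geometric with p = n_loaded / (n_nodes - 1).
    """
    if not 0 < n_loaded <= n_nodes - 1:
        raise ValueError("need 0 < n_loaded <= n_nodes - 1")
    return (n_nodes - 1) / n_loaded


def simulate_attempts(n_nodes: int, n_loaded: int,
                      trials: int = 10_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same quantity under the same toy model."""
    if not 0 < n_loaded <= n_nodes - 1:
        raise ValueError("need 0 < n_loaded <= n_nodes - 1")
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        attempts = 1
        # Node 0 is the idle thief; nodes 1..n_loaded hold tasks.
        # randrange(1, n_nodes) picks a victim among nodes 1..n_nodes-1.
        while rng.randrange(1, n_nodes) > n_loaded:
            attempts += 1
        total += attempts
    return total / trials
```

Under these assumptions, `expected_attempts(1025, 16)` evaluates to 64.0: when only a small fraction of a large cluster holds work, an idle node needs many attempts on average, which is the inefficiency the abstract attributes to simple random stealing.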

This work is supported by the Chinese NSF for DYS under Grant No. 60425205 and by the National Postdoctor Science Foundation of China.

Author information

Authors and Affiliations

Institute of Applied Physics and Computational Mathematics, Beijing, 100088, P.R. China
Bao-Yin Zhang & Ze-Yao Mo
Department of Computer Science and Technology, Tsinghua University, Beijing, 100084, P.R. China
Guang-Wen Yang & Wei-Min Zheng

Editor information

Editors and Affiliations

School of Information Technology, Geelong, Deakin University, 3217, Vic, Australia
Michael Hobbs
School of Engineering and Information Technology, Deakin University, Pigdons Road, Geelong
Andrzej M. Goscinski
Deakin University, Melbourne, Australia
Wanlei Zhou

About this paper

Cite this paper

Zhang, BY., Mo, ZY., Yang, GW., Zheng, WM. (2005). An Efficient Dynamic Load-Balancing Algorithm in a Large-Scale Cluster. In: Hobbs, M., Goscinski, A.M., Zhou, W. (eds) Distributed and Parallel Computing. ICA3PP 2005. Lecture Notes in Computer Science, vol 3719. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11564621_20

DOI: https://doi.org/10.1007/11564621_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29235-7
Online ISBN: 978-3-540-32071-5
eBook Packages: Computer Science, Computer Science (R0)
