History-driven dynamic load balancing for recurring applications on networks of workstations
Introduction
The development and use of expensive supercomputers and parallel computers during the last two decades represent one way of breaking the speed barrier imposed on single-processor computers. They have contributed to the solution of many so-called grand-challenge computational problems.
On the other hand, advances in desktop computing, in the form of personal computers and workstations, coupled with advances in communication technology, have permanently changed the state of the art in computing. Some applications once associated with supercomputers such as the CRAY, or with parallel machines such as the Hypercube, Paragon, and CM, are now candidates to run on networks of workstations (NOW). Many sequential applications can now also be run in parallel on widely available NOW.
Supercomputers are fairly uniform: they typically run a single operating system, use a single communication medium, and are often managed by a single organization, which makes programming and running applications much easier. With NOW, by contrast, architectures, operating systems, communication media, transport protocols, and ownership are all diverse. This makes application programming and system support for parallelism much more difficult. Yet workstations are available in abundance and are underutilized most of the time.
The main objective of this study is to provide a transparent and complete load balancing model over NOW that improves the performance of both parallel and independent applications. The current load on an individual workstation at the time an application or task is submitted is an important factor, but it is not sufficient for deciding whether that workstation should accept further load. In the proposed system, DYLOBA, the load that the incoming application is expected to exert is also considered. An important contribution of this work is to predict this load from past executions of the application: DYLOBA maintains a cumulative history database that records past executions of each application and uses it to drive system-wide dynamic load balancing.
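To make the idea of a cumulative history database concrete, here is a minimal Python sketch. It is not DYLOBA's implementation: the record fields (`cpu_seconds`, `comm_bytes`, `io_ops`), the class names, and the choice of a running mean as the predictor are hypothetical assumptions made for illustration. Each completed run updates the application's entry incrementally, and the predicted load for the next run of a recurring application is the mean of its past runs.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class RunRecord:
    """Resource usage observed for one execution (hypothetical fields)."""
    cpu_seconds: float
    comm_bytes: float
    io_ops: float


@dataclass
class HistoryEntry:
    """Cumulative statistics for one recurring application."""
    runs: int = 0
    mean_cpu: float = 0.0
    mean_comm: float = 0.0
    mean_io: float = 0.0

    def update(self, rec: RunRecord) -> None:
        # Incremental (cumulative) mean: past runs need not be stored.
        self.runs += 1
        n = self.runs
        self.mean_cpu += (rec.cpu_seconds - self.mean_cpu) / n
        self.mean_comm += (rec.comm_bytes - self.mean_comm) / n
        self.mean_io += (rec.io_ops - self.mean_io) / n


class HistoryDB:
    """In-memory stand-in for a cumulative history database (assumption)."""

    def __init__(self) -> None:
        self._entries: dict[str, HistoryEntry] = {}

    def record(self, app: str, rec: RunRecord) -> None:
        """Fold a finished run of `app` into its history entry."""
        self._entries.setdefault(app, HistoryEntry()).update(rec)

    def predict(self, app: str) -> RunRecord | None:
        """Predicted load of the next run, or None for an unseen application."""
        e = self._entries.get(app)
        if e is None:
            return None
        return RunRecord(e.mean_cpu, e.mean_comm, e.mean_io)
```

An incremental mean keeps the database small; a weighted or windowed average would be a reasonable alternative if an application's behaviour drifts over time.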
The experimental results show considerable improvement over both assignment schemes without load balancing and load balancing schemes based purely on current system load. DYLOBA uses Parallel Virtual Machine (PVM) to transfer tasks to selected workstations and, for parallel applications, to handle inter-task communication.
Following this introduction, Section 2 briefly reviews recent literature on dynamic load balancing. The modeling aspects are discussed in Section 3. The design and implementation details of DYLOBA are given in Section 4. The experiments and performance are presented in Section 5. Finally, Section 6 provides a brief conclusion and comments on future research directions.
Current work on dynamic load balancing
Load balancing, in this context, is a technique for improving the utilization of system resources, exploiting parallelism, increasing throughput, and reducing response time by distributing the application load appropriately. Almost all studies on load balancing fall into one of two broad categories: static and dynamic.
Static load balancing is characterized by pre-execution task placement based on a priori knowledge of the applications and target system characteristics. The majority of studies on
Modeling load balancing
DYLOBA is primarily utilized by recurring applications whose execution requirements are sufficiently intensive to justify the overhead involved. The heart of DYLOBA is a decision process regarding which task to execute on which workstation so that completion time is minimized. The decision process is driven by two components: the application and system characteristics. Both of these components have static and dynamic parts. Among the various static characteristics of an application, the
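The decision process can be illustrated with a small greedy placement sketch. The cost model is an assumption, not the one used in DYLOBA: each workstation has a static relative CPU `speed` and a dynamic `load` (the number of competing runnable processes), each task has a predicted CPU demand (for example, taken from a history database), and a task placed on a workstation is assumed to receive 1/(load + 1) of its CPU. All names here are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Workstation:
    name: str
    speed: float  # static characteristic: relative CPU speed (1.0 = reference)
    load: float   # dynamic characteristic: competing runnable processes now


def estimated_time(demand: float, ws: Workstation) -> float:
    """Completion-time estimate under a simple processor-sharing assumption:
    the new task receives 1/(load + 1) of the workstation's CPU."""
    return demand / ws.speed * (ws.load + 1.0)


def assign(tasks: dict[str, float], stations: list[Workstation]) -> dict[str, str]:
    """Greedily map each task (predicted demand in reference CPU-seconds)
    to the workstation with the smallest estimated completion time."""
    placement: dict[str, str] = {}
    # Place the most demanding tasks first so they get the best machines.
    for task, demand in sorted(tasks.items(), key=lambda kv: kv[1], reverse=True):
        best = min(stations, key=lambda ws: estimated_time(demand, ws))
        placement[task] = best.name
        best.load += 1.0  # the placed task now competes for that CPU
    return placement
```

Placing the largest tasks first is a common greedy heuristic. The estimate ignores the slowdown a newly placed task causes to tasks already placed on the same workstation, which a fuller model would account for.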
Implementation issues
DYLOBA basically creates a client-server type distributed computing environment on a network(s) of workstations. It can handle multi-thread (parallel and/or distributed) as well as single-thread (independent) applications.
There are five active subsystems that together form DYLOBA as shown in Fig. 1. Arrows indicate the flow of data or control. The subsystems are distributed over the current NOW and they communicate via a set of well-defined protocols. A brief functionality of each subsystem is
Performance
The aim of this analysis is to show how DYLOBA performs under varying application and system states. It is therefore tested under extreme as well as average application characteristics, such as the levels of processing, communication, and I/O intensity. For this purpose, special pseudo applications are produced by a generic application generator, which inserts appropriate variations into the computation, communication, and input/output patterns of each application as required.
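The paper does not describe the generator's interface, so the following is only a hypothetical Python sketch of how such pseudo applications might be specified. The relative weights `cpu`, `comm`, and `io` set the processing, communication, and I/O intensities, and the generator splits an abstract amount of work into phases whose kinds are drawn at random according to those weights. Extreme cases correspond to a single non-zero weight; an even mix approximates an average application.

```python
from __future__ import annotations

import random


def generate_profile(total_work: float, cpu: float, comm: float, io: float,
                     phases: int = 10, seed: int = 0) -> list[tuple[str, float]]:
    """Return a synthetic application as a list of (kind, amount) phases.

    cpu, comm and io are relative intensities (non-negative weights);
    amounts are in abstract work units (hypothetical).
    """
    weights = [cpu, comm, io]
    if phases <= 0 or min(weights) < 0 or sum(weights) == 0:
        raise ValueError("need phases > 0 and non-negative, non-zero weights")
    rng = random.Random(seed)  # seeded so a workload can be regenerated exactly
    kinds = rng.choices(["cpu", "comm", "io"], weights=weights, k=phases)
    per_phase = total_work / phases
    return [(kind, per_phase) for kind in kinds]
```

For example, `generate_profile(100.0, 1, 0, 0, phases=5)` yields five CPU phases of 20.0 units each, a purely CPU-bound extreme. The fixed seed makes a generated workload repeatable, which matters when comparing scheduling policies on the same input.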
Another important criterion
Conclusion
DYLOBA is primarily aimed at load balancing highly CPU-intensive, recurring parallel and/or distributed applications. However, it does not exclude single-task applications. Nor does it harm existing applications, since each application is free to decide whether to join DYLOBA.
The user interface takes the form of a set of library calls implemented in the C programming language. The system load is monitored and maintained, along with the run-time application information, in the dynamic application
Acknowledgements
Acknowledgment is due to King Fahd University of Petroleum and Minerals for the support provided to carry out this research.
Müslim Bozyiğit received his BS and MS degrees, both in engineering, from Middle East Technical University (METU), Ankara, Turkey, and his Ph.D. degree in computing from the University of Westminster (UW/PCL), London, UK. He is currently a faculty member of the College of Computer Science and Engineering, King Fahd University of Petroleum and Minerals, Dhahran, Saudi Arabia. He spent his sabbatical year at the University of Cambridge, UK. Formerly, he was with the Computer Engineering Department, METU. Dr. Bozyiğit's research interests focus on load balancing, fault tolerance, and scheduling in parallel and distributed computing. He is a member of the ACM and the New York Academy of Sciences.