On submesh allocation for 2D mesh multicomputers using the free-list approach: Global placement schemes

doi:10.1016/j.peva.2008.10.001

Performance Evaluation

Volume 66, Issue 2, February 2009, Pages 105-120


Abstract

Two global placement schemes for contiguous processor allocation in two-dimensional mesh-connected multicomputers are proposed in this paper. The first scheme gives preference to allocating a free peripheral submesh that has the largest number of mesh-boundary processors. The goal of this peripheral placement is to produce large leftover free submeshes, which can improve system performance. Another characteristic of this scheme is that it reduces the search space by halting the search process when a large-enough multicomputer corner submesh is found. The second proposed scheme considers allocation in the corners of all large-enough free submeshes and allocates a submesh that has the maximum number of allocated neighbors and multicomputer peripheral nodes. Using extensive simulations, we evaluated the proposed schemes and compared them with previous promising schemes. The simulation results show that the peripheral placement scheme produces the best average turnaround times, and its measured allocation and de-allocation times are smaller than those of the previous schemes. The second proposed scheme ranks second overall in terms of turnaround times; however, it ranks last in terms of efficiency.

Introduction

Mesh-connected interconnection networks are widely used in current distributed memory parallel computers. This is mainly because they are regular, simple, easy to implement, and scalable. Both two-dimensional (2D) and three-dimensional (3D) meshes and tori have been used in recent commercial and experimental multicomputers, such as the Caltech Mosaic [4], the Intel Paragon [17], the IBM BlueGene/L [6], and the Cray XT3 [12].

In most processor allocation policies proposed in the literature for mesh-connected multicomputers, a parallel job is allocated a distinct submesh of processors of the size and shape it has requested. This can result in high external processor fragmentation, which occurs when free processors are not allocated to a parallel job because of the shape constraint. A recurring outcome in allocation studies is that contiguous allocation suffers from low overall system utilization [1], [10], [20]. It can reduce system utilization to levels unacceptable for government-audited systems in the USA [7]. Therefore, noncontiguous allocation policies have been proposed with the goal of increasing system utilization by allowing dispersed free processors to be allocated to a parallel job [2], [8], [20]. Another approach to improving system utilization is using job scheduling policies that are not strictly first-come-first-served. These policies consider allocation to recent jobs so as to decrease the number of idle processors [5], [14], [21].
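External fragmentation can be made concrete with a small sketch. The Python example below is only an illustration under assumed names (`has_free_submesh`, a boolean `busy` grid with 0-based indexing); it is not one of the allocation schemes from the literature. It exhaustively checks whether a free submesh of the requested width and height exists.

```python
def has_free_submesh(busy, w, h):
    """Return True if some w-wide, h-high block of the mesh has no busy node.

    busy[row][col] is True for an allocated processor (hypothetical layout).
    """
    H, W = len(busy), len(busy[0])
    for y in range(H - h + 1):
        for x in range(W - w + 1):
            if not any(busy[y + dy][x + dx] for dy in range(h) for dx in range(w)):
                return True
    return False


# 4 x 4 mesh with a 2 x 2 job allocated in the centre.
busy = [[False] * 4 for _ in range(4)]
for row in (1, 2):
    for col in (1, 2):
        busy[row][col] = True

free_count = sum(not cell for line in busy for cell in line)
print(free_count)                    # 12 processors are free
print(has_free_submesh(busy, 2, 2))  # False: every 2 x 2 block touches the busy centre
print(has_free_submesh(busy, 4, 1))  # True: the first row is entirely free
```

Although 12 processors are free, a request for a 2 x 2 submesh (4 processors) cannot be satisfied contiguously. This is the shape constraint that causes external fragmentation.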

Notwithstanding the ability of noncontiguous allocation to reduce, even eliminate, processor fragmentation, an advantage of contiguous allocation over noncontiguous allocation is that it isolates jobs from each other, which is useful for security and accounting reasons. For example, contiguous allocation is proposed for use in the IBM BlueGene/L for security reasons. Because of the sensitive nature of some of its applications, a BlueGene/L job is allocated a partition of processors that is isolated from partitions allocated to other jobs [3].

Contiguous allocation for 2D meshes has received extensive interest in the literature. Several of the proposed policies use the first-fit approach to submesh selection [9], [11], [15], [24], [25]. In addition, a global compaction scheme was proposed [13]. This scheme considers allocation on the corners of all allocated submeshes along with the four corners of the mesh system, and attempts to compact allocated submeshes by selecting an allocation submesh that has the largest number of busy adjacent processors and mesh peripheral length. Also, a first-fit policy that supports compaction in a similar fashion was proposed [19]. It places the current request in a corner of the first large-enough free submesh that it finds. The corner chosen is one that also has the largest number of busy neighbors and mesh peripheral length. These two compacting policies have shown better queue waiting times than the previous strategies; they performed well in several studies [9], [13], [19]. By placing allocated submeshes together, compaction aims to reduce processor fragmentation.
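The compaction criterion can be sketched as a score computed around a candidate submesh. The exact scoring used in [13] and [19] is not given here, so the Python function below is an assumed interpretation: it walks the cells directly adjacent to the candidate's four edges and counts those that are either busy or lie outside the mesh (the latter standing in for mesh peripheral length). The name `boundary_score` and the 0-based `busy[row][col]` grid are hypothetical.

```python
def boundary_score(busy, x, y, w, h):
    """Count edge-adjacent cells that are busy or fall outside the mesh.

    (x, y) is the 0-based column/row of the candidate's base node;
    w and h are its width and height. Assumed scoring, for illustration only.
    """
    H, W = len(busy), len(busy[0])
    ring = (
        [(i, y - 1) for i in range(x, x + w)]    # cells on the row before the candidate
        + [(i, y + h) for i in range(x, x + w)]  # cells on the row after the candidate
        + [(x - 1, j) for j in range(y, y + h)]  # cells on the column before it
        + [(x + w, j) for j in range(y, y + h)]  # cells on the column after it
    )
    score = 0
    for i, j in ring:
        outside = not (0 <= i < W and 0 <= j < H)
        if outside or busy[j][i]:
            score += 1
    return score


# 4 x 4 mesh with a 2 x 2 job occupying columns 0-1 of rows 0-1.
busy = [[False] * 4 for _ in range(4)]
for row in (0, 1):
    for col in (0, 1):
        busy[row][col] = True

next_to_job = boundary_score(busy, 2, 0, 2, 2)   # corner placement beside the busy job
far_corner = boundary_score(busy, 2, 2, 2, 2)    # corner placement away from the job
```

Under this assumed score, the placement next to the busy job scores 6 and the far-corner placement scores 4, so a compacting policy of this kind would prefer the former.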

A transformation that can improve the performance of 2D submesh allocation schemes is switching the orientation of allocation requests [15]. This technique was adopted in many previous algorithms [9], [13], [19], [24], and it will be adopted in the algorithms proposed in this paper.
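Orientation switching can be sketched as a retry with the rotated request. The example below pairs it with a plain row-major first-fit scan, which is an assumption chosen for brevity rather than the method of [15]; the function names and the 0-based `busy[row][col]` grid are hypothetical.

```python
def first_fit(busy, w, h):
    """Return the base (col, row) of the first free w x h block, or None."""
    H, W = len(busy), len(busy[0])
    for y in range(H - h + 1):
        for x in range(W - w + 1):
            if not any(busy[y + dy][x + dx] for dy in range(h) for dx in range(w)):
                return (x, y)
    return None


def allocate_with_switching(busy, w, h):
    """Try the request as w x h first, then rotated as h x w.

    Returns (base, (width, height)) for the orientation that fits, or None.
    """
    base = first_fit(busy, w, h)
    if base is not None:
        return base, (w, h)
    if w != h:  # rotating a square request cannot help
        base = first_fit(busy, h, w)
        if base is not None:
            return base, (h, w)
    return None


# A free mesh that is 2 wide and 4 high: a 4 x 2 request only fits when rotated.
busy = [[False] * 2 for _ in range(4)]
result = allocate_with_switching(busy, 4, 2)
print(result)  # ((0, 0), (2, 4))
```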

A common issue with the two previous compacting schemes is that they do not give strict priority to allocating peripheral submeshes. A submesh allocated inside the mesh can generate more free fragments than a peripheral allocated submesh. That is, peripheral placement is expected to produce larger leftover free submeshes, which should decrease processor fragmentation and increase system utilization, meaning a decrease in the average job waiting delays and turnaround times. In this paper, we propose an allocation policy that assigns a submesh with the largest number of multicomputer peripheral processors to the current request. An internal submesh is allocated only when no corner or boundary submesh can accommodate the current request; however, the choice of the internal submesh is independent of the allocation states of adjacent processors. The allocation search space is reduced in this Maximal Peripheral Length (MPL) policy by halting the search process when a suitable mesh corner submesh is found. Another factor that contributes to the efficiency of MPL is that adjacency with busy processors is not an allocation factor. Such adjacency is considered fundamental in previous promising policies, where a busy-list is maintained and employed in computing the number of busy processors adjacent to candidate submeshes [1], [13], [19]. In addition to MPL, we propose a global compacting allocation policy that considers all possible request placements in the corners of the free submeshes, and allocates a candidate submesh that has the largest number of busy adjacent processors and multicomputer peripheral nodes. This is different from the previous global compacting scheme proposed in [13] in that the previous scheme considers allocation on the corners of all allocated submeshes, whereas the proposed scheme considers allocation in the corners of all free submeshes.
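The selection rule described for MPL can be illustrated with a simplified sketch. The Python code below is not the paper's algorithm: it assumes that a list of free, large-enough candidate placements has already been produced (the free-list search is omitted), and the names `peripheral_count`, `occupies_mesh_corner` and `choose_placement` are hypothetical. Coordinates are 1-based, with a candidate given as `(x, y, w, h)` for its base node, width and height in a mesh of width `W` and height `H`.

```python
def peripheral_count(x, y, w, h, W, H):
    """Number of nodes of the candidate that lie on the mesh boundary."""
    return sum(
        1
        for i in range(x, x + w)
        for j in range(y, y + h)
        if i in (1, W) or j in (1, H)
    )


def occupies_mesh_corner(x, y, w, h, W, H):
    """True if the candidate contains one of the four mesh corner nodes."""
    return (x == 1 or x + w - 1 == W) and (y == 1 or y + h - 1 == H)


def choose_placement(candidates, W, H):
    """Pick the candidate with the most boundary nodes; stop early at a corner."""
    best, best_count = None, -1
    for cand in candidates:
        if occupies_mesh_corner(*cand, W, H):
            return cand  # halt the search: a mesh corner placement was found
        count = peripheral_count(*cand, W, H)
        if count > best_count:
            best, best_count = cand, count
    return best  # an internal candidate (count 0) is returned only as a last resort


W = H = 8
internal = (3, 3, 2, 3)
left_edge = (1, 3, 2, 3)
first_row = (3, 1, 2, 3)
corner = (7, 6, 2, 3)

print(choose_placement([internal, left_edge, first_row], W, H))  # (1, 3, 2, 3)
print(choose_placement([internal, corner, left_edge], W, H))     # (7, 6, 2, 3)
```

Busy neighbors never appear in the score, which mirrors the statement that adjacency with busy processors is not an allocation factor in MPL.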

Using detailed simulations, we have compared the two proposed schemes to previous effective and efficient schemes. The simulation results show that the peripheral placement scheme produces the best average turnaround times, while being very efficient. The measured allocation and de-allocation times of MPL are smaller than those of the previous allocation schemes, including the simple first-fit scheme. The second proposed scheme ranks second overall in terms of turnaround times and last in terms of efficiency.

Because the relative performance of allocation algorithms can depend on the job scheduling scheme, we have considered First-Come-First-Served (FCFS) scheduling and non-FCFS scheduling schemes that allow bypassing the head of the FIFO job waiting queue. The simulation results show that job scheduling has substantial influence on the mean turnaround times and maximum waiting delays of jobs. That is, an effective job scheduling scheme should be used in conjunction with an effective submesh allocation scheme so as to obtain good system performance.
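The difference between FCFS scheduling and a scheme that may bypass the head of the queue can be sketched as follows. This is an illustrative Python sketch, not one of the scheduling schemes evaluated in the paper: `schedule` and `make_counting_allocator` are hypothetical names, and the stub allocator only counts free processors instead of searching for a contiguous submesh.

```python
from collections import deque


def schedule(queue, try_allocate, fcfs=True):
    """Start waiting jobs in queue order; return the jobs that were started.

    With fcfs=True the scan stops at the first job that cannot be allocated.
    With fcfs=False later jobs may bypass a blocked job.
    """
    started, remaining = [], deque()
    blocked = False
    while queue:
        job = queue.popleft()
        if not blocked and try_allocate(job):
            started.append(job)
        else:
            remaining.append(job)
            if fcfs:
                blocked = True  # nothing behind a blocked job may start
    queue.extend(remaining)     # jobs that did not start keep their order
    return started


def make_counting_allocator(free):
    """Stub allocator (assumption): tracks a free-processor count only."""
    state = {"free": free}

    def try_allocate(size):
        if size <= state["free"]:
            state["free"] -= size
            return True
        return False

    return try_allocate


fcfs_queue = deque([4, 16, 2])
fcfs_started = schedule(fcfs_queue, make_counting_allocator(8), fcfs=True)
print(fcfs_started, list(fcfs_queue))      # [4] [16, 2]

bypass_queue = deque([4, 16, 2])
bypass_started = schedule(bypass_queue, make_counting_allocator(8), fcfs=False)
print(bypass_started, list(bypass_queue))  # [4, 2] [16]
```

In the bypassing run the small job starts while the 16-processor job keeps waiting, which shows why out-of-order service can reduce idle processors but can also lengthen the wait of large requests.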

This paper is organized as follows. The following section contains relevant preliminaries. Section 3 contains a review of previous related allocation schemes. The proposed schemes are presented and analyzed in Section 4. Simulation results are presented in Section 5. Finally, conclusions are given in Section 6.

Section snippets

Preliminaries

A two-dimensional mesh-connected parallel computer is denoted in this paper as $M(W, H)$, where $W$ is the width of the mesh and $H$ is its height. A processor (node) in column $x$ and row $y$ is represented by the coordinates $(x, y)$, where $1 \leq x \leq W$ and $1 \leq y \leq H$. An internal node is directly connected to four neighbors: $(x - 1, y)$, $(x + 1, y)$, $(x, y - 1)$, and $(x, y + 1)$. The four mesh corner nodes have two neighbors each, and the remaining boundary nodes have three neighbors each, as can be seen in Fig. 1. The size of the
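The coordinate convention defined in this section can be illustrated with a short sketch. The Python function below, whose name `neighbors` is hypothetical, lists the direct neighbors of node $(x, y)$ in $M(W, H)$ using the 1-based coordinates given above, and assumes $W \geq 2$ and $H \geq 2$.

```python
def neighbors(x, y, W, H):
    """Return the direct neighbors of node (x, y) in a W x H mesh (1-based)."""
    candidates = [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
    return [(i, j) for i, j in candidates if 1 <= i <= W and 1 <= j <= H]


W, H = 4, 3
print(len(neighbors(1, 1, W, H)))  # 2: mesh corner node
print(len(neighbors(2, 1, W, H)))  # 3: boundary node that is not a corner
print(len(neighbors(2, 2, W, H)))  # 4: internal node
```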

Previous allocation schemes

The allocation schemes proposed previously for 2D meshes vary in their ability to detect free submeshes, in the method used for selecting the allocation submesh, in the placement of the request within the selected submesh, and in efficiency. An allocation scheme is said to be recognition-complete if it never fails to detect a suitable free submesh. In addition, allocation can be classified into two approaches: first-fit and best-fit. The first-fit approach selects the first large-enough free

Proposed submesh allocation schemes

The results of the research efforts summarized in the previous section indicate that the system performance of contiguous allocation depends on request orientation switching, and on the position of allocated submeshes relative to mesh edges and adjacent busy submeshes. The influence of request switching on performance was investigated thoroughly in several previous studies (e.g., [1], [9], [13]), where it was found to substantially improve performance. The main goal of this research is to

Simulation

We conducted simulation experiments under different loads and job characteristics so as to evaluate the performance of the proposed allocation schemes. The characteristics of jobs and their arrival process are determined using both synthetic workload models and workload traces. The primary system performance parameter observed is the average turnaround time of jobs, where the turnaround time of a job is the time it spends in the system from arrival to departure. The efficiency of the submesh

Conclusions

In this paper, we have proposed two global placement schemes for contiguous allocation in 2D mesh-connected multicomputers. The first scheme, MPL, gives preference to allocating a free submesh located in a corner of the multicomputer, and when no such submesh exists preference is given to allocating a peripheral submesh that has its longest edge aligned with a multicomputer boundary. An internal submesh is allocated only if there is no suitable peripheral free submesh. Placing a job in a mesh

Acknowledgments

I would like to thank the reviewers for their very valuable comments that have substantially improved the paper in substance and presentation.

References (26)

I. Ababneh, An efficient free-list submesh allocation scheme for two-dimensional mesh-connected multicomputers, Journal of Systems and Software (2006)
I. Ababneh, Availability-based noncontiguous processor allocation policies for 2D mesh-connected multicomputers, Journal of Systems and Software (2008)
C.-Y. Chang et al., Performance improvement of allocation schemes for mesh-connected computers, Journal of Parallel and Distributed Computing (1998)
D. Das Sharma et al., Submesh allocation in mesh multicomputers using busy-list: A best-fit approach with complete recognition capability, Journal of Parallel and Distributed Computing (1996)
Y. Zhu, Efficient processor allocation strategies for mesh-connected parallel computers, Journal of Parallel and Distributed Computing (1992)
Y. Aridor et al., Resource allocation and utilization in the BlueGene/L supercomputer, IBM Journal of Res. & Dev. (2005)
W.C. Athas et al., Multicomputers: Message-passing concurrent computers, IEEE Computer (1988)
S. Bhattacharya, W.-T. Tsai, Lookahead processor allocation in mesh-connected massively parallel multicomputer, in:...
M. Blumrich, D. Chen, P. Coteus, A. Gara, M. Giampapa, P. Heidelberger, S. Singh, B. Steinmacher-Burow, T. Takken, P....
D.P. Bunde, V.J. Leung, J. Mache, Communication patterns and allocation strategies, Sandia Technical Report...
G.-M. Chiu et al., An efficient submesh allocation scheme for two-dimensional meshes with little overhead, IEEE Transactions on Parallel and Distributed Systems (1999)
H. Choo et al., Processor scheduling and allocation for 3D torus multicomputer systems, IEEE Transactions on Parallel and Distributed Systems (2000)
P.-J. Chuang et al., Allocating precise submeshes in mesh connected systems, IEEE Transactions on Parallel and Distributed Systems (1994)


Ismail Ababneh received the Engineer degree from the National Superior School of Electronics and Electro-mechanics of Caen, France, in 1979, the MS degree in Software Engineering from Boston University in 1984, and the Ph.D. degree in Computer Engineering from Iowa State University in 1995. From 1984 to 1989, he was a Software Engineer with Data Acquisition Systems, Boston, Massachusetts. He is presently an associate professor in the Department of Computer Science at Al al-Bayt University, Jordan and a visiting associate professor in the Department of Computer Science at Jordan University of Science and Technology. He is a member of Tau Beta Pi and Eta Kappa Nu. His current research interests include processor allocation in multicomputers, and ad hoc routing algorithms.
