Elsevier

Information Sciences

Volume 177, Issue 14, 15 July 2007, Pages 2867-2883

An efficient non-contiguous processor allocation strategy for 2D mesh connected multicomputers

https://doi.org/10.1016/j.ins.2007.02.022

Abstract

In non-contiguous allocation, a job request can be split into smaller parts that are allocated to possibly non-adjacent free sub-meshes rather than always waiting until a single sub-mesh of the requested size and shape is available. Lifting the contiguity condition is expected to reduce processor fragmentation and increase system utilization. However, the distances traversed by messages can be long, and as a result the communication overhead, especially contention, is increased. The extra communication overhead depends on how the allocation request is partitioned and assigned to free sub-meshes. In this paper, a new non-contiguous processor allocation strategy, referred to as Greedy-Available-Busy-List, is suggested for the 2D mesh network. Request partitioning in our suggested strategy is based on the sub-meshes available for allocation. To evaluate the performance improvement achieved by our strategy and compare it against well-known existing non-contiguous and contiguous strategies, we conduct extensive simulation runs under the assumption of wormhole routing and three communication patterns, namely one-to-all, all-to-all and random. The results show that the new strategy can reduce the communication overhead and substantially improve performance in terms of job turnaround time and system utilization.

Introduction

In a multicomputer, processor allocation is responsible for selecting the set of processors on which parallel jobs are executed, while job scheduling is responsible for determining the order in which the jobs are executed. Most allocation strategies employed in a multicomputer are based on contiguous allocation, where the processors allocated to a parallel job are physically contiguous and have the same topology as that of the interconnection network of the multicomputer [1], [10], [11], [12], [14], [17], [27], [28]. Contiguous strategies often result in high external processor fragmentation, as has been shown in [28]. External fragmentation occurs when there are free processors sufficient in number to satisfy the number requested by a parallel job, but they are not allocated to it because the free processors are not contiguous or they do not have the same topology as the multicomputer.
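The definition of external fragmentation above can be made concrete with a small sketch. It assumes a hypothetical occupancy grid in which grid[y][x] is True for a busy processor, and it uses a brute-force scan for a free a × b sub-mesh. This is only an illustration of the contiguity condition, not the First Fit implementation of [28], and the names first_fit and free_count are chosen here for the example.

```python
from typing import List, Optional, Tuple

Grid = List[List[bool]]  # grid[y][x] is True when processor (x, y) is busy


def first_fit(grid: Grid, a: int, b: int) -> Optional[Tuple[int, int]]:
    """Return the base (x, y) of the first free a x b sub-mesh, or None."""
    length = len(grid)
    width = len(grid[0])
    for y in range(length - b + 1):
        for x in range(width - a + 1):
            # The candidate is usable only if every processor in it is free.
            if all(not grid[y + dy][x + dx]
                   for dy in range(b) for dx in range(a)):
                return (x, y)
    return None


def free_count(grid: Grid) -> int:
    """Count the free processors in the mesh."""
    return sum(not busy for row in grid for busy in row)


# Hypothetical 4 x 4 mesh with a checkerboard of busy processors:
# half of the processors are free, but no two free ones form a 2 x 2 block.
mesh = [[(x + y) % 2 == 0 for x in range(4)] for y in range(4)]
request = (2, 2)  # a 2 x 2 sub-mesh, i.e. 4 processors

enough_free = free_count(mesh) >= request[0] * request[1]
placement = first_fit(mesh, *request)
externally_fragmented = enough_free and placement is None
```

In this configuration the request fails under contiguous allocation even though the number of free processors exceeds the number requested, which is exactly the situation that non-contiguous allocation is meant to avoid.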

The main goal of processor allocation strategies is to reduce job turnaround time and to maximize the system utilization. Finding an efficient processor allocation strategy with high performance and low overhead is not simple. It has been shown that optimal allocation is an NP-complete problem [16], [22]. Therefore, a set of heuristics has to be developed.

Several studies have attempted to reduce external fragmentation [2], [3], [6], [9], [12], [18], [23], [24]. One suggested solution is to adopt non-contiguous allocation [3], [6], [9], [18]. In non-contiguous allocation, a job can execute on multiple disjoint smaller sub-networks rather than always waiting until a single sub-network of the requested size and shape is available. Although non-contiguous allocation increases message contention in the network, lifting the contiguity condition is expected to reduce processor fragmentation and increase processor utilization [3], [6], [9], [18]. It is the introduction of wormhole routing [4], [15] that has led researchers to consider non-contiguous allocation on multicomputer networks with long communication distances, such as the 2D mesh [4], [6], [9], [18]. This is because one of the main advantages of wormhole routing over earlier communication schemes, e.g. store-and-forward, is that message latency depends less on the distance the message travels from source to destination.
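The weaker dependence of wormhole latency on distance can be illustrated with a first-order, contention-free model. The sketch below is a simplification under stated assumptions, not the simulation model used in this paper: store-and-forward retransmits the whole message on each of the hops, whereas under wormhole routing only the header flit pays the per-hop cost and the remaining flits follow in a pipeline. The function names, the 1024-bit message, the 16-bit flit and the unit bandwidth are all hypothetical values chosen for the example.

```python
def store_and_forward_latency(hops: int, message_bits: float,
                              bandwidth: float) -> float:
    """Whole message is received and retransmitted at every hop."""
    return hops * (message_bits / bandwidth)


def wormhole_latency(hops: int, message_bits: float, flit_bits: float,
                     bandwidth: float) -> float:
    """Header flit pays the per-hop cost; the rest of the message is pipelined."""
    header_time = hops * (flit_bits / bandwidth)
    body_time = (message_bits - flit_bits) / bandwidth
    return header_time + body_time


# Hypothetical parameters: 1024-bit message, 16-bit flits, 1 bit per time unit.
MESSAGE, FLIT, BANDWIDTH = 1024.0, 16.0, 1.0

for hops in (2, 20):
    saf = store_and_forward_latency(hops, MESSAGE, BANDWIDTH)
    wh = wormhole_latency(hops, MESSAGE, FLIT, BANDWIDTH)
    print(f"hops={hops:2d}  store-and-forward={saf:8.0f}  wormhole={wh:8.0f}")
```

In this model a tenfold increase in distance multiplies the store-and-forward latency by ten but raises the wormhole latency only modestly, because the message length dominates the hop count. The model ignores contention, which is the overhead that non-contiguous allocation tends to increase.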

Folding has also been proposed as a technique for reducing processor fragmentation in the 2D mesh [2], [9]. It permits applications to execute on fewer processors than they have requested, when necessary. This could improve the performance of contiguous and non-contiguous allocation, as demonstrated in [2], [9]. The problem with folding is that jobs must be able to execute on a number of processors determined at load time. Nonetheless, most existing research studies have been conducted in the context of contiguous allocation [1], [10], [11], [12], [14], [17], [24], [27], [28]. There has been comparatively very little work on non-contiguous allocation. Whereas contiguous allocation eliminates contention among the messages of concurrently executing jobs, non-contiguous allocation can eliminate external processor fragmentation that contiguous allocation suffers from.

Most existing research on contiguous and non-contiguous allocation has been carried out in the context of the 2D mesh [1], [2], [6], [9], [10], [11], [12], [14], [17], [18], [23], [24], [25], [27], [28]. The mesh network has been used as the underlying network in a number of practical and experimental parallel machines, such as iWARP [20], the IBM BlueGene/L [5], [7], [8], and Delta Touchstone [13]. In this study, we propose a new non-contiguous allocation strategy, referred to here as Greedy-Available-Busy-List (GABL), for the 2D mesh, and we use detailed simulations to compare its performance against that of previous non-contiguous and contiguous allocation strategies. The suggested strategy combines the desirable features of both contiguous and non-contiguous schemes. The methods used for decomposing allocation requests in existing non-contiguous allocation schemes are not based on free contiguous sub-meshes. For example, allocation requests are subdivided into two equal parts in [9]. The subparts are successively subdivided in a similar fashion if allocation fails for any of them. In the study of [18], a promising strategy (MBS) expresses the allocation request as a base-4 number, and bases allocation on this expression. In this paper, GABL partitions requests based on the sub-meshes available for allocation. A major goal of the partitioning process is to maintain a high degree of contiguity among the processors allocated to a given parallel job. The performance of GABL is compared against the performance of the non-contiguous allocation strategies Paging(0) and MBS [18]. These two strategies have been selected because they have been shown to perform well in [18]. Furthermore, GABL is also compared against the contiguous First Fit strategy [28] as this has been used in several previous related studies [9], [10], [18].
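To illustrate what expressing a request as a base-4 number means, the following sketch factors a processor count into square blocks. It assumes that the digit d_i in position i of the base-4 expansion asks for d_i square blocks of side 2^i, which is how MBS is usually described. It covers the factoring step only and omits the buddy search and block splitting of the full strategy. The function name base4_blocks is hypothetical.

```python
from typing import List, Tuple


def base4_blocks(processors: int) -> List[Tuple[int, int]]:
    """Factor a request into (block_side, count) pairs, largest block first."""
    if processors < 0:
        raise ValueError("processor count must be non-negative")
    blocks: List[Tuple[int, int]] = []
    side = 1  # side length 2**i of the blocks for the current base-4 digit
    remaining = processors
    while remaining > 0:
        remaining, digit = divmod(remaining, 4)
        if digit:
            blocks.append((side, digit))
        side *= 2
    blocks.reverse()
    return blocks


# A 5 x 7 request needs 35 processors: 35 = 2 * 16 + 0 * 4 + 3 * 1.
blocks = base4_blocks(5 * 7)
print(blocks)  # [(4, 2), (1, 3)]
```

Under this assumption a 5 × 7 request becomes two 4 × 4 blocks and three single processors, regardless of which free sub-meshes currently exist. A partitioning driven by the free sub-meshes, as proposed for GABL, is intended to avoid this kind of fixed decomposition.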

While the ideas in this paper can easily be extended to 3D meshes and to 2D and 3D tori, we concentrate on 2D meshes. The rest of the paper is organized as follows. Section 2 contains a brief summary of allocation strategies previously proposed for the 2D mesh. Section 3 describes our proposed non-contiguous allocation strategy. Section 4 compares the performance of the contiguous and non-contiguous allocation strategies. Finally, Section 5 concludes this study.

Related work

This section provides a brief overview of some existing contiguous and non-contiguous allocation strategies.

The proposed non-contiguous allocation strategy

In the following, we present the system model assumed in this paper. The target system is a W × L 2D mesh, where W is the width of the mesh and L is its length. Every processor is denoted by a pair of coordinates (x, y), where 0 ≤ x ≤ W − 1 and 0 ≤ y ≤ L − 1 [6]. Each processor is connected by bidirectional communication links to its neighbour processors, as depicted in Fig. 1. This figure shows an example of a 4 × 4 2D mesh, where allocated processors are denoted by shaded circles and free processors are

Performance evaluation

In this section, the time and space complexities of the proposed allocation strategy are presented first. Then, the results from simulations that have been carried out to evaluate the performance of the proposed algorithm are presented and compared against those of Paging(0), MBS and FF.

Conclusion and future directions

This paper has investigated the performance merits of non-contiguous allocation in the 2D mesh network. To this end, we have suggested a new non-contiguous allocation strategy, referred to as Greedy-Available-Busy-List, which differs from the earlier non-contiguous allocation strategies in the method used for decomposing allocation requests. The GABL strategy decomposes the allocation requests based on the sub-meshes available for allocation. The major goal of the partitioning process is to

Acknowledgement

The ProcSimity simulator was developed by Dr. Kurt Windisch, Dr. Virginia Lo and Dr. Jayne Miller at the University of Oregon. We thank them for making it publicly available and allowing its use.

References (28)

  • S. Bani-Mohammad et al.

    Non-contiguous processor allocation strategy for 2D mesh connected multicomputers based on sub-meshes available for allocation

  • Blue Gene Project, 2005....
  • M. Blumrich, D. Chen, P. Coteus, A. Gara, M. Giampapa, P. Heidelberger, S. Singh, B. Steinmacher-Burow, T. Takken, P....
  • G.-M. Chiu et al.

    An efficient submesh allocation scheme for two-dimensional meshes with little overhead

    IEEE Transactions on Parallel & Distributed Systems

    (1999)