Distribution algorithms for document allocation in multiprocessor information retrieval systems

https://doi.org/10.1016/0165-6074(94)90096-5Get rights and content

Abstract

Distributed memory information retrieval systems have been used as a means of managing the vast volume of documents in an information retrieval system, and to improve query response time. However, proper allocation of documents plays an important role in improving the performance of such systems. Maximising the amount of parallelism can be achieved by distributing the documents, while the inter-node communication cost is minimised by avoiding documents distribution. Unfortunately, these two factors contradict each other. Finding an optimal allocation satisfying the above objectives is referred to as distributed memory document allocation problem (DDAP), and it is an NP-Complete problem. Heuristic algorithms are usually employed to find an optimal solution to this problem. Genetic algorithm is one such algorithms. In this paper, a genetic algorithm is developed to find an optimal document allocation for DDAP. Several well-known network topologies are investigated to evaluate the performance of the algorithm. The approach relies on the fact that documents of an information retrieval system are clustered by some arbitrary method. The advantages of a clustered document approach specially in a distributed memory information retrieval system are well-known.

Since genetic algorithms work with a set of candidate solutions, parallelisation based on a Single Instruction Multiple Data (SIMD) paradigm seems to be the natural way to obtain a speedup. Using this approach, the population of strings is distributed among the processing elements. Each string is processed independently. The performance gain comes from the parallel execution of the strings, and hence, it is heavily dependent on the population size. The approach is favoured for genetic algorithms' applications where the parameter set for a particular run is well-known in advance, and where such applications require a big population size to solve the problem. DDAP fits nicely into the above requirements. The aim of the parallelisation is two-fold: the first one is to speedup the allocation process in DDAP which usually consists of thousands of documents and has to use a big population size, and second, it can be seen as an attempt to port the genetic algorithm's processes into SIMD machines.

References (29)

  • O Frieder et al.

    On the allocation of documents in multiprocessor information retrieval system

  • D Ghazfan et al.

    Document allocation in distributed memory information retrieval system

  • D Ghazfan et al.

    Massively parallel genetic algorithms

  • Cited by (0)

    View full text