A scalable implementation of barrier synchronization using an adaptive combining tree

International Journal of Parallel Programming

Abstract

Barrier synchronization is commonly used for synchronizing processors prior to a join operation and to enforce data dependencies during the execution of parallelized loops. Simple software implementations of barrier synchronization can result in memory hot-spots, especially in large-scale shared-memory multiprocessors containing hundreds of processors and memory modules communicating through an interconnection network. A software combining tree can be used to substantially reduce memory contention due to hot-spots. However, such an implementation results in O(log n) latency in recognition of barrier synchronization, where n is the number of processors. In this paper an adaptive software combining tree is used to implement a scalable barrier with O(1) recognition latency. The processors that arrive early at the barrier adapt the combining tree so that it has a structure appropriate for reducing the latency for the processors that arrive later. We also show how adaptive combining trees can be used to implement the fuzzy barrier. The fuzzy barrier mechanism reduces the idling of processors at the barriers by allowing the processors to execute useful instructions while they are waiting at the barrier.
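
To make the O(log n) baseline concrete, the following is a minimal sketch of a static software combining tree barrier, written with Python threads. It is not the adaptive scheme of the paper, whose details the abstract does not give. The names `Node`, `CombiningTreeBarrier`, `fan_in`, and `run_demo` are hypothetical, a lock stands in for an atomic fetch-and-add, and a condition variable replaces the spinning a real multiprocessor implementation would likely use.

```python
import threading


class Node:
    """One counter in the combining tree (hypothetical helper)."""

    def __init__(self, expected):
        self.expected = expected      # arrivals needed before moving up
        self.count = 0
        self.lock = threading.Lock()  # assumption: stands in for fetch-and-add
        self.parent = None


class CombiningTreeBarrier:
    """Reusable barrier built from a static tree of counters."""

    def __init__(self, n_threads, fan_in=2):
        if n_threads < 1 or fan_in < 2:
            raise ValueError("need n_threads >= 1 and fan_in >= 2")
        # Leaves: each group of up to fan_in threads shares one counter.
        level, self.leaf_of = [], []
        for start in range(0, n_threads, fan_in):
            size = min(fan_in, n_threads - start)
            leaf = Node(size)
            level.append(leaf)
            self.leaf_of.extend([leaf] * size)
        # Interior levels: combine up to fan_in nodes under one parent.
        while len(level) > 1:
            parents = []
            for start in range(0, len(level), fan_in):
                group = level[start:start + fan_in]
                parent = Node(len(group))
                for child in group:
                    child.parent = parent
                parents.append(parent)
            level = parents
        self.episode = 0
        self.released = threading.Condition()

    def wait(self, tid):
        """Block until all threads arrive; return how many counters were touched."""
        my_episode = self.episode  # cannot advance before this thread arrives
        node = self.leaf_of[tid]
        touched = 0
        while node is not None:
            touched += 1
            with node.lock:
                node.count += 1
                last = node.count == node.expected
                if last:
                    node.count = 0  # reset so the barrier can be reused
            if not last:
                break               # an earlier arrival: stop climbing and wait
            node = node.parent      # the last arrival carries the news upward
        else:
            # Climbed past the root: every thread has arrived, release them.
            with self.released:
                self.episode += 1
                self.released.notify_all()
            return touched
        with self.released:
            while self.episode == my_episode:
                self.released.wait()
        return touched


def run_demo(n_threads=8, fan_in=2, episodes=3):
    """Run several barrier episodes; return per-episode touch counts and violations."""
    barrier = CombiningTreeBarrier(n_threads, fan_in)
    arrived, arrived_lock = [0], threading.Lock()
    touched = [[0] * n_threads for _ in range(episodes)]
    early = []  # filled only if a thread is released before all have arrived

    def worker(tid):
        for ep in range(episodes):
            with arrived_lock:
                arrived[0] += 1
            touched[ep][tid] = barrier.wait(tid)
            with arrived_lock:
                if arrived[0] < n_threads * (ep + 1):
                    early.append((tid, ep))

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return touched, early
```

With 8 threads and a fan-in of 2, the thread that arrives last at the root has updated one counter per tree level, three in all, which is the logarithmic recognition path the abstract refers to. No single counter is updated by more than `fan_in` threads in an episode, which is how the tree spreads out the hot-spot.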




Cite this article

Gupta, R., Hill, C.R. A scalable implementation of barrier synchronization using an adaptive combining tree. Int J Parallel Prog 18, 161–180 (1989). https://doi.org/10.1007/BF01407897


  • DOI: https://doi.org/10.1007/BF01407897
