Abstract
Parallel programs commonly use barriers to synchronize parallel processes. Upon reaching a barrier, a processor must stall until all participating processors have reached it. A software implementation of the barrier mechanism using shared variables has two major drawbacks. First, executing the barrier may be slow, since it requires several instructions. Second, processors that are stalled waiting for others to reach the barrier cannot do any useful work. This paper presents the notion of the fuzzy barrier, which avoids both drawbacks. The first problem is avoided by implementing the mechanism in hardware. The second is solved by extending the barrier concept to include a region of statements that a processor can execute while it awaits synchronization. The barrier regions are constructed by a compiler and consist of instructions such that a processor is ready to synchronize upon reaching the first instruction and must synchronize before exiting the region. When synchronization does occur, the processors may be executing at any point in their respective barrier regions. The larger the barrier region, the more likely it is that none of the processors will have to stall. Hardware fuzzy barriers have been implemented as part of a RISC-based multiprocessor system. Results based on a software implementation of the fuzzy barrier on the Encore multiprocessor indicate that the mechanism can greatly reduce synchronization overhead.
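The barrier region described in the abstract can be illustrated in software as a split-phase barrier. The following is only a minimal Python sketch, not the hardware mechanism or the Encore implementation reported in the paper. The class FuzzyBarrier, its methods arrive() and wait(), and the helper run_demo() are hypothetical names chosen for illustration. arrive() marks the first instruction of the barrier region (the thread is ready to synchronize) and wait() marks the exit from the region (the thread must have synchronized by then).

```python
import threading


class FuzzyBarrier:
    """Split-phase barrier sketch (hypothetical API, not from the paper)."""

    def __init__(self, parties):
        self._parties = parties
        self._count = 0   # arrivals seen in the current phase
        self._phase = 0   # number of completed synchronizations
        self._cond = threading.Condition()

    def arrive(self):
        """Enter the barrier region. Never blocks; returns a phase ticket."""
        with self._cond:
            ticket = self._phase
            self._count += 1
            if self._count == self._parties:
                # Last arrival completes the phase and releases any waiters.
                self._count = 0
                self._phase += 1
                self._cond.notify_all()
            return ticket

    def wait(self, ticket):
        """Exit the barrier region. Returns True if the caller had to stall."""
        with self._cond:
            stalled = self._phase == ticket
            while self._phase == ticket:
                self._cond.wait()
            return stalled


def run_demo(num_workers=4, steps=3):
    """Each worker combines its value with a neighbor's value at every step."""
    barrier = FuzzyBarrier(num_workers)
    rows = [[0] * num_workers for _ in range(steps + 1)]
    rows[0] = list(range(num_workers))
    local = [0] * num_workers

    def worker(i):
        for s in range(steps):
            # Reads row s, which every worker finished before the last sync.
            rows[s + 1][i] = rows[s][i] + rows[s][(i + 1) % num_workers]
            ticket = barrier.arrive()   # ready to synchronize
            # Barrier region: work that touches only this worker's own data.
            local[i] += i * s
            barrier.wait(ticket)        # must synchronize before leaving

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return rows[steps], local
```

In this sketch, the statement between arrive() and wait() plays the role of the barrier region: it does not depend on other workers, so it can overlap with the time spent waiting for them. The more such independent work is placed there, the less likely wait() is to block, which mirrors the claim that larger barrier regions reduce stalls.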
Additional information
A preliminary version of this paper appeared in ASPLOS '89.
This work was done while the author was at Philips Laboratories.
Cite this article
Gupta, R., Epstein, M. High speed synchronization of processors using fuzzy barriers. Int J Parallel Prog 19, 53–73 (1990). https://doi.org/10.1007/BF01407864
DOI: https://doi.org/10.1007/BF01407864