Achieving low cost synchronization in a multiprocessor system

  • Submitted Presentations
  • Conference paper
PARLE '89 Parallel Architectures and Languages Europe (PARLE 1989)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 365)

Abstract

A barrier is a commonly used mechanism for synchronizing processors executing in parallel. A software implementation of the barrier mechanism using shared variables has two major drawbacks. First, the synchronization overhead is high; second, a processor that reaches the barrier must idle until all other processors reach it. This paper presents the fuzzy barrier, a mechanism that avoids both drawbacks. The first problem is avoided by implementing the mechanism in hardware. The second is solved by using software techniques to find useful instructions that a processor can execute while it awaits synchronization. The hardware implementation eliminates busy waiting at barriers, provides a mask that allows disjoint subsets of processors to synchronize simultaneously, and provides multiple barriers by associating a tag with each barrier. Compiler techniques are presented for constructing barrier regions, which consist of instructions that a processor can execute while it waits for the other processors to reach the barrier. The larger the barrier region, the more likely it is that no processor will have to stall. Initial observations show that barrier regions can be large and that program transformations can significantly increase their size.
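The barrier-region idea can be illustrated in software with a split-phase barrier: a processor announces that it has entered the barrier region, keeps executing independent instructions, and blocks only at the end of the region if some other processor has not yet entered. The following is a minimal sketch under stated assumptions, not the hardware mechanism the paper describes: the class `FuzzyBarrier`, its methods `arrive` and `wait`, and the helper `run_demo` are hypothetical names chosen for illustration, threads stand in for processors, and the mask and tag features are omitted.

```python
import threading


class FuzzyBarrier:
    """Software sketch of a split-phase barrier (hypothetical, for illustration)."""

    def __init__(self, parties):
        self._parties = parties      # number of participating threads
        self._arrived = 0            # arrivals in the current generation
        self._generation = 0         # incremented each time the barrier opens
        self._cond = threading.Condition()

    def arrive(self):
        """Enter the barrier region without blocking; return a ticket."""
        with self._cond:
            ticket = self._generation
            self._arrived += 1
            if self._arrived == self._parties:
                # Last arrival: open the barrier and start a new generation.
                self._arrived = 0
                self._generation += 1
                self._cond.notify_all()
            return ticket

    def wait(self, ticket):
        """Leave the barrier region; block only if someone has not arrived.

        Returns True if the caller found the barrier still closed.
        """
        with self._cond:
            stalled = self._generation == ticket
            while self._generation == ticket:
                self._cond.wait()
            return stalled


def run_demo(parties=4):
    barrier = FuzzyBarrier(parties)
    phase1 = [None] * parties
    region = [None] * parties
    totals = [None] * parties

    def worker(i):
        phase1[i] = i * i             # result that other threads depend on
        ticket = barrier.arrive()     # start of the barrier region
        region[i] = i + 100           # independent work done inside the region
        barrier.wait(ticket)          # end of the barrier region
        totals[i] = sum(phase1)       # safe: every phase1 slot is filled

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(parties)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return totals, region
```

In this sketch the work between `arrive` and `wait` plays the role of the barrier region: the longer it takes, the more likely the barrier has already opened by the time `wait` is called, in which case the thread does not stall at all. Unlike the hardware scheme in the abstract, this version still pays the cost of a lock and a condition variable on every synchronization.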

Editor information

Eddy Odijk, Martin Rem, Jean-Claude Syre

Copyright information

© 1989 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gupta, R., Epstein, M. (1989). Achieving low cost synchronization in a multiprocessor system. In: Odijk, E., Rem, M., Syre, JC. (eds) PARLE '89 Parallel Architectures and Languages Europe. PARLE 1989. Lecture Notes in Computer Science, vol 365. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3540512845_33

  • DOI: https://doi.org/10.1007/3540512845_33

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-51284-4

  • Online ISBN: 978-3-540-46183-8

  • eBook Packages: Springer Book Archive
