
Eliminating Barrier Synchronization for Compiler-Parallelized Codes on Software DSMs

International Journal of Parallel Programming

Abstract

Software distributed-shared-memory (DSM) systems provide an appealing target for parallelizing compilers due to their flexibility. Previous studies demonstrate that such systems can provide performance comparable to message-passing compilers for dense-matrix kernels. However, synchronization and load imbalance are significant sources of overhead. In this paper, we investigate the impact of compilation techniques for eliminating barrier synchronization overhead in software DSMs. Our compile-time barrier elimination algorithm extends previous techniques in three ways: (1) we perform inexpensive communication analysis through local subscript analysis when using chunk iteration partitioning for parallel loops; (2) we exploit delayed updates in lazy-release-consistency DSMs to eliminate barriers guarding only anti-dependences; (3) when possible, we replace barriers with customized nearest-neighbor synchronization. Experiments on an IBM SP-2 indicate these techniques can improve parallel performance by 20% on average and by up to 60% for some applications.
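To make points (1) and (3) more concrete, the following is a minimal illustrative sketch, not the algorithm from the paper. It assumes a simplified setting that the abstract does not specify: two consecutive parallel loops over the same range of `n` iterations, chunk-partitioned across `nprocs` processors, where the first loop writes `A[i]` and the second reads `A[i + d]` for a small set of constant offsets `d`. The function names (`chunk_bounds`, `owner`, `classify_barrier`) are hypothetical. Anti-dependences and the lazy-release-consistency optimization in point (2) are not modeled.

```python
def chunk_size(n, nprocs):
    """Iterations per processor under chunk partitioning (ceiling division)."""
    return -(-n // nprocs)


def chunk_bounds(n, nprocs, p):
    """Half-open iteration range [lo, hi) assigned to processor p."""
    size = chunk_size(n, nprocs)
    lo = min(p * size, n)
    hi = min(lo + size, n)
    return lo, hi


def owner(n, nprocs, i):
    """Processor that executes iteration i (and, by assumption, writes A[i])."""
    return i // chunk_size(n, nprocs)


def classify_barrier(n, nprocs, read_offsets):
    """Classify the barrier between 'write A[i]' and 'read A[i + d]' loops.

    Returns "eliminate" if every processor reads only data it wrote,
    "nearest-neighbor" if reads reach at most the adjacent chunk,
    and "barrier" otherwise.
    """
    max_dist = 0
    for p in range(nprocs):
        lo, hi = chunk_bounds(n, nprocs, p)
        if lo == hi:
            continue  # processor p has no iterations
        for d in read_offsets:
            # Elements read by p, clipped to the array bounds [0, n - 1].
            j_lo = max(lo + d, 0)
            j_hi = min(hi - 1 + d, n - 1)
            if j_lo > j_hi:
                continue  # every access with this offset is out of bounds
            # i + d is monotonic in i, so the two end points give the
            # farthest owning processors in either direction.
            for j in (j_lo, j_hi):
                max_dist = max(max_dist, abs(owner(n, nprocs, j) - p))
    if max_dist == 0:
        return "eliminate"
    if max_dist == 1:
        return "nearest-neighbor"
    return "barrier"
```

Under these assumptions, `classify_barrier(100, 4, [0])` reports that the barrier can be dropped, while a stencil-like access such as `classify_barrier(100, 4, [-1, 0, 1])` only needs synchronization with adjacent processors. Only the chunk end points are inspected, which is what keeps this kind of check cheap.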




Cite this article

Han, H., Tseng, CW. & Keleher, P. Eliminating Barrier Synchronization for Compiler-Parallelized Codes on Software DSMs. International Journal of Parallel Programming 26, 591–612 (1998). https://doi.org/10.1023/A:1018724631720
