
Compiler techniques for fine-grain execution on workstation clusters using PAPERS

  • Starting Small: Fine-Grain Parallelism
  • Conference paper
Languages and Compilers for Parallel Computing (LCPC 1994)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 892)

Abstract

Just a few years ago, parallel computers were tightly coupled SIMD, VLIW, or MIMD machines. Now, they are clusters of workstations connected by communication networks of ever-higher bandwidth (e.g., Ethernet, FDDI, HiPPI, ATM). For these clusters, compiler research centers on techniques for hiding large synchronization and communication latencies; in general, it tries to make parallel programs based on fine-grain aggregate operations fit an existing network execution model that is optimized for point-to-point block transfers.

In contrast, we suggest that the network execution model can and should be altered to more directly support fine-grain aggregate operations. By augmenting workstation hardware with a simple barrier mechanism (PAPERS: Purdue's Adapter for Parallel Execution and Rapid Synchronization), and with appropriate operating system hooks for its direct use from user processes, the user is given a variety of efficient aggregate operations and the compiler is provided with a more static (i.e., more predictable), lower-latency target execution model. This paper centers on compiler techniques that use this new target model to achieve more efficient parallel execution: first, techniques that statically schedule aggregate operations across processors; second, techniques that implement SIMD and VLIW execution.

This work was supported in part by the Office of Naval Research (ONR) under grant number N00014-91-J-4013 and by the National Science Foundation (NSF) under award number 9015696-CDA.



Editor information

Keshav Pingali, Utpal Banerjee, David Gelernter, Alex Nicolau, David Padua

Copyright information

© 1995 Springer-Verlag Berlin Heidelberg

Cite this paper

Dietz, H.G., Cohen, W.E., Muhammad, T., Mattox, T.I. (1995). Compiler techniques for fine-grain execution on workstation clusters using PAPERS. In: Pingali, K., Banerjee, U., Gelernter, D., Nicolau, A., Padua, D. (eds) Languages and Compilers for Parallel Computing. LCPC 1994. Lecture Notes in Computer Science, vol 892. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0025869

  • Print ISBN: 978-3-540-58868-9

  • Online ISBN: 978-3-540-49134-7
