HPF on fine-grain distributed shared memory: Early experience

  • Compiling HPF
  • Conference paper
Languages and Compilers for Parallel Computing (LCPC 1996)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1239)

Abstract

This paper examines the performance of a suite of HPF applications on a network of workstations using two different compilation approaches: generating explicit message-passing code, and generating code for a shared address space provided by a fine-grain distributed shared memory (DSM) system. Preliminary experiments indicate that the DSM approach usually incurs only a small slowdown relative to the message-passing approach on regular programs, yet enables efficient execution of non-regular programs.

This work is supported in part by Wright Laboratory Avionics Directorate, Air Force Materiel Command, USAF, under grant #F33615-94-1-1525 and ARPA order no. B550, an NSF NYI Award CCR-9357779, NSF Grant MIP-9225097, DOE Grant DE-FG02-93ER25176, and donations from Digital Equipment Corporation, Sun Microsystems, and The Portland Group. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the Wright Laboratory Avionics Directorate or the U.S. Government.

Editor information

David Sehr, Utpal Banerjee, David Gelernter, Alex Nicolau, David Padua

Copyright information

© 1997 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chandra, S., Larus, J.R. (1997). HPF on fine-grain distributed shared memory: Early experience. In: Sehr, D., Banerjee, U., Gelernter, D., Nicolau, A., Padua, D. (eds) Languages and Compilers for Parallel Computing. LCPC 1996. Lecture Notes in Computer Science, vol 1239. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0017269

  • DOI: https://doi.org/10.1007/BFb0017269

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-63091-3

  • Online ISBN: 978-3-540-69128-0

  • eBook Packages: Springer Book Archive
