HPF on fine-grain distributed shared memory: Early experience

Chandra, Satish; Larus, James R.

doi:10.1007/BFb0017269

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1239)

Included in the following conference series:

International Workshop on Languages and Compilers for Parallel Computing

Abstract

This paper examines the performance of a suite of HPF applications on a network of workstations using two different compilation approaches: generating explicit message-passing code, and generating code for a shared address space provided by a fine-grain distributed shared memory (DSM) system. Preliminary experiments indicate that, on regular programs, the DSM approach usually incurs only a small slowdown relative to the message-passing approach, yet it also enables efficient execution of non-regular programs.

This work is supported in part by Wright Laboratory Avionics Directorate, Air Force Materiel Command, USAF, under grant #F33615-94-1-1525 and ARPA order no. B550, an NSF NYI Award CCR-9357779, NSF Grant MIP-9225097, DOE Grant DE-FG02-93ER25176, and donations from Digital Equipment Corporation, Sun Microsystems, and The Portland Group. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the Wright Laboratory Avionics Directorate or the U.S. Government.

Author information

Authors and Affiliations

Computer Sciences Department, University of Wisconsin-Madison, 1210 W. Dayton Street, Madison, WI 53706, USA
Satish Chandra & James R. Larus

Editor information

David Sehr, Utpal Banerjee, David Gelernter, Alex Nicolau, David Padua

About this paper

Cite this paper

Chandra, S., Larus, J.R. (1997). HPF on fine-grain distributed shared memory: Early experience. In: Sehr, D., Banerjee, U., Gelernter, D., Nicolau, A., Padua, D. (eds) Languages and Compilers for Parallel Computing. LCPC 1996. Lecture Notes in Computer Science, vol 1239. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0017269

DOI: https://doi.org/10.1007/BFb0017269
Published: 10 June 2005
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-63091-3
Online ISBN: 978-3-540-69128-0
eBook Packages: Springer Book Archive
