Abstract
This paper examines the performance of a suite of HPF applications on a network of workstations using two different compilation approaches: generating explicit message-passing code, and generating code for a shared address space provided by a finegrain distributed shared memory system (DSM). Preliminary experiments indicate that the DSM approach performs with usually a small slowdown compared to the message passing approach on regular programs, yet enables efficient execution of non-regular programs.
This work is supported in part by Wright Laboratory Avionics Directorate, Air Force Material Command, USAF, under grant #F33615-94-1-1525 and ARPA order no. B550, an NSF NYI Award CCR-9357779, NSF Grant MIP-9225097, DOE Grant DE-FGO2-93ER25176, and donations from Digital Equipment Corporation, Sun Microsystems, and The Portland Group. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the Wright Laboratory Avionics Directorate or the U.S. Government.
Preview
Unable to display preview. Download preview PDF.
References
Sarita V. Adve and Mark D. Hill. Weak Ordering — A New Definition. In Proceedings of the 17th Annual International Symposium on Computer Architecture, pages 2–14, May 1990.
Cristiana Amza, Alan L. Cox, Sandhya Dwarkadas, Pete Keleher, Honghui Lu, Ramakrishnan Rajamony, Weimin Yu, and Willy Zwaenepoel. TreadMarks: Shared Memory Computing on Networks of Workstation s. IEEE Computer, pages 18–28, February 1996.
Jennifer M. Anderson, Saman P. Amarasinghe, and Monica S. Lam. Data and Computation Transformations for Multiprocessors. In Fifth ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming (PPOPP), July 1995.
Jennifer M. Anderson and Monica S. Lam. Global Optimizations for Parallelism and Locality on Scalable Parallel Machines. In Proceedings of the SIGPLAN '93 Conference on Programming Language Design and Implementation (PLDI), pages 112–125, June 1993.
William Blume and Rudolf Eigemann. Performance Analysis of Parallelizing Compilers on the Perfect Benchmarks Programs. IEEE Transactions on Parallel and Distributed Systems, 3(6):643–656, November 1992.
Nanette J. Boden, Danny Cohen, Robert E. Felderman, Alan E. Kulawik, Charles L. Seitz, Jakov N. Seizovic, and Wen-King Su. Myrinet A Gigabit-per-Second Local Area Network. IEEE Micro, 15(1):29–36, February 1995.
Z. Bozkus, L. Meadows, S. Nakamoto, V. Schuster, and M. Young. Compiling High Performance Fortran. In Proceedings of the 7th SIAM Conference on Parallel Processing for Scientific Computing, February 1995.
David Callahan and Ken Kennedy. Compiling Programs for Distributed-Memory Multiprocessors. The Journal of Supercomputing, 2:151–169, 1988.
Soumen Chakrabarti, Manish Gupta, and Jong-Deok Choi. Global Communication Analysis and Optimization. In Proceedings of the SIGPLAN '96 Conference on Programming Language Design and Implementation (PLDI), May 1996.
Satish Chandra, James R. Larus, and Anne Rogers. Where is Time Spent in Message-Passing and Shared-Memory Programs? In Proceedings of the Sixth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS VI), pages 61–75, October 1994.
Michal Cierniak and Wei Li. Unifying Data and Control Transformations for Distributed Shared-Memory Machines. In Proceedings of the SIGPLAN '95 Conference on Programming Language Design and Implementation (PLDI), June 1995.
G. Cybenko, J. Bruner, S. Ho, and S. Sharma. Parallel Computing and the Perfect Benchmarks. Technical Report 1191, Center for Supercomputing Research & Development, University of Illinois at Urbana-Champaign, November 1991.
Babak Falsafi, Alvin Lebeck, Steven Reinhardt, Ioannis Schoinas, Mark D. Hill, James Larus, Anne Rogers, and David Wood. Application-Specific Protocols for User-Level Shared Memory. In Proceedings of Supercomputing '94, pages 380–389, November 1994.
Ian Foster. Task Parallelism and High Performance Languages. IEEE Parallel and Distributed Technology: Systems and Applications, 2(3):-–-, Fall 1994.
Hans Michael Gemdt Automatic Parallelization for Distributed-Memory Multiprocessor Systems. PhD thesis, Rheinischen Friedrich-Wilhelms-Universit”at, 1989.
Kourosh Gharachorloo, Daniel Lenoski, James Laudon, Philip Gibbons, Anoop Gupta, and John Hennessy. Memory Consistency and Event Ordering in Scalable Shared-Memory. In Proceedings of the 17th Annual International Symposium on Computer Architecture, pages 15–26, June 1990.
Manish Gupta and Prithviraj Banerjee. PARADIGM: A Compiler for Automatic Data Distribution on Multicomputers. In Proceedings of the 1993 ACM International Conference on Supercomputing, Tokyo, Japan, July 1993.
Manish Gupta, Edith Schonberg, and Harinia Srivivasan. A Unified Framework for Optimizing Communication in Data-Parallel Programs. IEEE Transactions on Parallel and Distributed Systems, 7(7):689–704, July 1996.
Mark D. Hill, James R. Larus, and David A. Wood. Tempest: A Substrate for Portable Parallel Programs. In COMPCON '95, pages 327–332, San Francisco, California, March 1995. IEEE Computer Society.
Charles Koelbel and Piyush Mehrotra. Compiling Global Name-Space Parallel Loops for Distributed Execution. IEEE Transactions on Parallel and Distributed Systems, 2(4):440–451, October 1991.
Charles H. Koelbel, David B. Loveman, Robert S. Schreiber, Guy L. Steele Jr., and Mary E. Zosel. High Performance Fortran Handbook. MIT Press, Cambridge, Mass., 1994.
D.A. Koufaty, X. Chen, D.K. Poulsen, and J. Torrellas. Data Forwarding in Scalable Shared-Memory Multprocessors. In Proceedings of the 1995 International Conference on Supercomputing, page ?, 1995.
James R. Larus. Compiling for Shared-Memory and Message-Passing Computers. ACM Letters on Programming Languages and Systems, 2(1–4): 165–180, March–December 1994.
Daniel Lenoski, James Laudon, Kourosh Gharachorloo, Wolf-Dietrich Weber, Anoop Gupta, John Hennessy, Mark Horowitz, and Monica Lam. The Stanford DASH Multiprocessor. IEEE Computer, 25(3):63–79, March 1992.
Ravi Mirchandaney, Seema Hiranandani, and Ajay Sethi. Improving the Performance of DSM Systems via Compiler Involvement. In Proceedings of Supercomputing '94, pages 763–772, 1994.
Todd C. Mowry, Monica S. Lam, and Anoop Gupta. Design and evaluation of a compiler algorithm for prefetching. In Fifth Proceedings of Symposium on Architectural Support for Programming Languages and Operations Systems, pages 62–73, October 1992.
D. Padua, R. Eigenmann, J. Hoeflinger, P. Peterson, P. Tu, S. Weatherford, and K. Faigin. Polaris: A New-Generation Parallelizing Compiler for MPP's. Technical Report 1306, Center for Supercomputing Research & Development, University of Illinois at Urbana-Champaign, June 1993.
Steven K. Reinhardt, James R. Larus, and David A. Wood. Tempest and Typhoon: User-Level Shared Memory. In Proceedings of the 21st Annual International Symposium on Computer Architecture, pages 325–337, April 1994.
Steven K. Reinhardt, Robert W. Pfile, and David A. Wood. Decoupled Hardware Support for Distributed Shared Memory. In Proceedings of the 23rd Annual International Symposium on Computer Architecture, May 1996.
Anne Marie Rogers. Compiling for Locality of Reference. Technical Report TR 91-1195, Department of Computer Science, Cornell University, March 1991. PhD thesis.
Joel H. Saltz, Ravi Mirchandaney, and Kay Crowley. Run-Time Parallelization and Scheduling of Loops. IEEE Transactions on Computers, 40(5):603–612, May 1991.
Ioannis Schoinas, Babak Falsafi, Mark D. Hill, James R. Larus, Christopher E. Lucas, Shubhendu S. Mukherjee, Steven K. Reinhardt, Eric Schnarr, and David A. Wood. Implementing Fine-Grain Distributed Shared Memory On Commodity SMP Workstations. Technical Report 1307, Computer Sciences Department, University of Wisconsin-Madison, March 1996.
Ioannis Schoinas, Babak Falsafi, Alvin R. Lebeck, Steven K. Reinhardt, James R. Larus, and David A. Wood. Finegrain Access Control for Distributed Shared Memory. In Proceedings of the Sixth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS VI), pages 297–307, October 1994.
Per Stenstrom, Truman Joe, and Anoop Gupta. Comparative Performance Evaluation of Cache-Coherent NUMA and COMA Architectures. In Proceedings of the 19th Annual International Symposium on Computer Architecture, pages 80–91, 1992.
Chau-Wen Tseng. An Optimizing FORTRAN D Compiler for Distributed Memory MIMD Machines. PhD thesis, Rice University, January 1993. Also available as Rice CRPC-TR93291-S.
Chau-Wen Tseng. Compiler Optimization for Eliminating Barrier Synchronization. In Fifth ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming (PPOPP), pages 144–155, August 1995.
Robert P. Wilson, Robert S. French, Christopher S. Wilson, Saman P. Amarasinghe, Jennifer M. Anderson, Chau-Wen Tseng, Mary W. Hall, Monica S. Lam, and John L. Hennesy. SUIF: An Infrastructure for Reseasrch on Parallelizing and Optimizing Compilers. ACM SIGPLAN Notices, 29(12):31–37, December 1994.
Author information
Authors and Affiliations
Editor information
Rights and permissions
Copyright information
© 1997 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chandra, S., Larus, J.R. (1997). HPF on fine-grain distributed shared memory: Early experience. In: Sehr, D., Banerjee, U., Gelernter, D., Nicolau, A., Padua, D. (eds) Languages and Compilers for Parallel Computing. LCPC 1996. Lecture Notes in Computer Science, vol 1239. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0017269
Download citation
DOI: https://doi.org/10.1007/BFb0017269
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-63091-3
Online ISBN: 978-3-540-69128-0
eBook Packages: Springer Book Archive