
A Comparison of MPI, SHMEM and Cache-Coherent Shared Address Space Programming Models on a Tightly-Coupled Multiprocessors

  • Published in: International Journal of Parallel Programming

Abstract

We compare the performance of three major programming models on a modern, 64-processor hardware cache-coherent machine, one of the two major types of platforms upon which high-performance computing is converging. We focus on applications that are regular and predictable, or that at least do not require fine-grained dynamic replication of irregularly accessed data. Within this class, we use programs with a range of important communication patterns. We examine whether the basic parallel algorithm and communication structuring approaches needed for best performance are similar or different among the models; whether some models have substantial performance advantages over others as problem size and processor count change; what the sources of these performance differences are; where the programs spend their time; and whether substantial improvements can be obtained by modifying either the application programming interfaces or the implementations of the programming models on this type of tightly-coupled multiprocessor platform.



Cite this article

Shan, H., Singh, J.P. A Comparison of MPI, SHMEM and Cache-Coherent Shared Address Space Programming Models on a Tightly-Coupled Multiprocessors. International Journal of Parallel Programming 29, 283–318 (2001). https://doi.org/10.1023/A:1011120120698
