rMPI: Message Passing on Multicore Processors with On-Chip Interconnect

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4917)

Abstract

With multicore processors becoming the standard architecture, programmers face the challenge of developing applications that capitalize on their advantages. This paper presents rMPI, which leverages the on-chip networks of multicore processors to build a powerful abstraction with which many programmers are familiar: the MPI programming interface. To our knowledge, rMPI is the first MPI implementation for multicore processors with on-chip networks. This study uses the MIT Raw processor as an experimentation and validation vehicle, although the findings presented apply to multicore processors with on-chip networks in general. Likewise, this study uses the MPI API as a general interface that allows parallel tasks to communicate, but the results shown in this paper apply to message passing communication generally. Overall, rMPI’s design constitutes the marriage of message passing communication and on-chip networks, allowing programmers to apply a well-understood programming model to a high-performance multicore processor architecture.

This work assesses the applicability of the MPI API to multicore processors with on-chip interconnect and carefully analyzes the overheads associated with common MPI operations. This paper contrasts MPI with the lower-overhead network interface abstractions that the on-chip networks provide. The evaluation also compares rMPI to hand-coded applications running directly on one of the processor’s low-level on-chip networks, as well as to a commercial-quality MPI implementation running on a cluster of Ethernet-connected workstations. Results show speedups of 4x to 15x for 16 processor cores relative to one core, depending on the application, which equal or exceed the performance scalability of the MPI cluster system. However, this paper ultimately argues that while MPI offers reasonable performance on multicores when, for instance, legacy applications must be run, its large overheads squander the multicore opportunity. Performance on multicores could be significantly improved by replacing MPI with a lighter-weight communications API that has a smaller memory footprint.




Editor information

Per Stenström, Michel Dubois, Manolis Katevenis, Rajiv Gupta, Theo Ungerer


Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Psota, J., Agarwal, A. (2008). rMPI: Message Passing on Multicore Processors with On-Chip Interconnect. In: Stenström, P., Dubois, M., Katevenis, M., Gupta, R., Ungerer, T. (eds) High Performance Embedded Architectures and Compilers. HiPEAC 2008. Lecture Notes in Computer Science, vol 4917. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77560-7_3


  • DOI: https://doi.org/10.1007/978-3-540-77560-7_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-77559-1

  • Online ISBN: 978-3-540-77560-7
