
A Synchronous Mode MPI Implementation on the Cell BE™ Architecture

  • Conference paper
Parallel and Distributed Processing and Applications (ISPA 2007)

Abstract

The Cell Broadband Engine shows much promise for high performance computing applications. The Cell is a heterogeneous multi-core processor in which the bulk of the computational workload is meant to be borne by eight co-processors called SPEs. Each SPE operates on its own 256 KB local store, and all the SPEs also have access, through DMA, to a shared main memory of 512 MB to 2 GB. The unconventional architecture of the SPEs, and in particular their small local store, creates some programming challenges. To help address them, we have implemented core features of MPI for the Cell. This implementation treats each SPE as a node for an MPI process, using the local store as if it were a cache. In this paper, we describe synchronous mode communication in our implementation, which uses the rendezvous protocol to make MPI communication of long messages efficient. We further present experimental results on Cell hardware, where the implementation performs well, achieving throughput of up to 6.01 GB/s and latency as low as 0.65 μs on the ping-pong test. This demonstrates that MPI calls can be implemented efficiently even on the simple SPE cores.
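The rendezvous idea behind synchronous mode can be illustrated with a small simulation. The sketch below is a minimal, hypothetical model in Python, not the authors' SPE code: threads stand in for SPE ranks, a `Rank` object plays the role of each SPE's queue of incoming requests, and the names `ssend`, `recv`, `Envelope` and `DMA_CHUNK` are illustrative assumptions. The sender posts only an envelope (source, tag, and a reference to its buffer) and then blocks. Once a matching receive is posted, the receiver pulls the payload straight from the sender's buffer in chunks (16 KB here, assumed to mirror the Cell MFC's per-DMA size limit) and then signals completion, so `ssend` returns only after the data has been delivered.

```python
import threading
from dataclasses import dataclass, field

# Assumption: mirrors the Cell MFC limit of at most 16 KB per DMA command,
# so long messages are pulled in pieces of this size.
DMA_CHUNK = 16 * 1024


@dataclass(eq=False)
class Envelope:
    """Request-to-send: describes the message but does not copy the payload."""
    src: int
    tag: int
    data: bytes  # reference to the sender's buffer (stands in for a main-memory address)
    done: threading.Event = field(default_factory=threading.Event)


class Rank:
    """Hypothetical per-SPE queue of envelopes awaiting a matching receive."""

    def __init__(self) -> None:
        self.cond = threading.Condition()
        self.pending: list[Envelope] = []


def ssend(ranks: list[Rank], src: int, dst: int, tag: int, data: bytes) -> None:
    """Synchronous send: completes only after the receiver has pulled the data."""
    env = Envelope(src, tag, data)
    with ranks[dst].cond:
        ranks[dst].pending.append(env)
        ranks[dst].cond.notify_all()
    env.done.wait()  # rendezvous: block until the receiver acknowledges


def recv(ranks: list[Rank], me: int, src: int, tag: int) -> bytes:
    """Blocking receive: match an envelope, then pull the payload chunk by chunk."""
    rank = ranks[me]

    def match():
        return next((e for e in rank.pending if e.src == src and e.tag == tag), None)

    with rank.cond:
        env = rank.cond.wait_for(match)  # returns the matched envelope
        rank.pending.remove(env)
    # Each slice models one DMA "get" from the sender's buffer.
    chunks = [env.data[off:off + DMA_CHUNK] for off in range(0, len(env.data), DMA_CHUNK)]
    env.done.set()  # the sender's buffer may now be reused
    return b"".join(chunks)


if __name__ == "__main__":
    ranks = [Rank(), Rank()]
    payload = bytes(range(256)) * 256  # 64 KB, i.e. four simulated transfers
    received = {}
    receiver = threading.Thread(target=lambda: received.update(msg=recv(ranks, 1, 0, 7)))
    receiver.start()
    ssend(ranks, 0, 1, 7, payload)
    receiver.join()
    print(received["msg"] == payload)
```

Deferring the data transfer until the receive is matched avoids copying a long message into an intermediate buffer, at the cost of an extra handshake. That trade-off is why a rendezvous scheme tends to pay off for long messages, while short messages are usually better served by an eager, buffered approach.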




Editor information

Ivan Stojmenovic Ruppa K. Thulasiram Laurence T. Yang Weijia Jia Minyi Guo Rodrigo Fernandes de Mello


Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Krishna, M. et al. (2007). A Synchronous Mode MPI Implementation on the Cell BE™ Architecture. In: Stojmenovic, I., Thulasiram, R.K., Yang, L.T., Jia, W., Guo, M., de Mello, R.F. (eds) Parallel and Distributed Processing and Applications. ISPA 2007. Lecture Notes in Computer Science, vol 4742. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74742-0_86


  • DOI: https://doi.org/10.1007/978-3-540-74742-0_86

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-74741-3

  • Online ISBN: 978-3-540-74742-0

  • eBook Packages: Computer Science, Computer Science (R0)
