A Synchronous Mode MPI Implementation on the Cell BE™ Architecture

Krishna, Murali; Kumar, Arun; Jayam, Naresh; Senthilkumar, Ganapathy; Baruah, Pallav K.; Sharma, Raghunath; Kapoor, Shakti; Srinivasan, Ashok

doi:10.1007/978-3-540-74742-0_86


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4742)

Included in the following conference series:

International Symposium on Parallel and Distributed Processing and Applications


Abstract

The Cell Broadband Engine shows much promise for high-performance computing applications. The Cell is a heterogeneous multi-core processor in which the bulk of the computational workload is meant to be borne by eight co-processors called SPEs. Each SPE operates on its own 256 KB local store, and all the SPEs can also access a shared main memory of 512 MB to 2 GB through DMA. The unconventional architecture of the SPEs, and in particular their small local store, creates some programming challenges. To help deal with these, we have provided an implementation of core features of MPI for the Cell. This implementation views each SPE as a node for an MPI process, with the local store used as if it were a cache. In this paper, we describe synchronous mode communication in our implementation, which uses the rendezvous protocol to make MPI communication for long messages efficient. We further present experimental results on the Cell hardware, where the implementation demonstrates good performance, such as throughput of up to 6.01 GB/s and latency as low as 0.65 μs on the pingpong test. These results show that MPI calls can be implemented efficiently even on the simple SPE cores.



Author information

Authors and Affiliations

Dept. of Mathematics and Computer Science, Sri Sathya Sai University, Prashanthi Nilayam, India
Murali Krishna, Arun Kumar, Naresh Jayam, Ganapathy Senthilkumar, Pallav K. Baruah & Raghunath Sharma
IBM, Austin,
Shakti Kapoor
Dept. of Computer Science, Florida State University,
Ashok Srinivasan


Editor information

Ivan Stojmenovic, Ruppa K. Thulasiram, Laurence T. Yang, Weijia Jia, Minyi Guo, Rodrigo Fernandes de Mello


About this paper

Cite this paper

Krishna, M. et al. (2007). A Synchronous Mode MPI Implementation on the Cell BE™ Architecture. In: Stojmenovic, I., Thulasiram, R.K., Yang, L.T., Jia, W., Guo, M., de Mello, R.F. (eds) Parallel and Distributed Processing and Applications. ISPA 2007. Lecture Notes in Computer Science, vol 4742. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74742-0_86


DOI: https://doi.org/10.1007/978-3-540-74742-0_86
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74741-3
Online ISBN: 978-3-540-74742-0
eBook Packages: Computer Science, Computer Science (R0)
