ABSTRACT
Parallel architectures are the way of the future, but are notoriously difficult to program. In addition to the low-level constructs they often present (e.g., locks, DMA, and non-sequential memory models), most parallel programming environments admit data races: the environment may make nondeterministic scheduling choices that can change the function of the program.
We believe the solution is model-based design, where the programmer is presented with a constrained higher-level language that prevents certain unwanted behavior. In this paper, we describe a compiler for SHIM, a scheduling-independent concurrent language, that generates code for the Cell Broadband Engine heterogeneous multicore processor. The complexity of the code our compiler generates, relative to the source, illustrates how difficult it is to write code for the Cell by hand.
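To make "scheduling-independent" concrete, the following is a minimal sketch in Python rather than SHIM, and it is not the code the compiler generates. It assumes a hypothetical `Channel` class with one sender and one receiver, where `send` blocks until the receiver has taken the value, loosely mimicking rendezvous communication. The names `Channel`, `DONE`, `producer`, `squarer`, and `run_pipeline` are illustrative only. Because every channel has a single writer and a single blocking reader, the result does not depend on how the threads are interleaved.

```python
import queue
import threading


class Channel:
    """Hypothetical point-to-point channel: send() returns only after recv() took the value."""

    def __init__(self):
        self._q = queue.Queue(maxsize=1)

    def send(self, value):
        self._q.put(value)
        self._q.join()  # block until the receiver acknowledges with task_done()

    def recv(self):
        value = self._q.get()  # block until a value is available
        self._q.task_done()    # release the waiting sender
        return value


DONE = object()  # illustrative end-of-stream marker, not part of SHIM


def producer(out, n):
    """Send 1..n, then the end-of-stream marker."""
    for i in range(1, n + 1):
        out.send(i)
    out.send(DONE)


def squarer(inp, out):
    """Forward the square of each received value."""
    while True:
        v = inp.recv()
        if v is DONE:
            out.send(DONE)
            return
        out.send(v * v)


def run_pipeline(n):
    """Run producer -> squarer -> collector and return the collected values."""
    a, b = Channel(), Channel()
    threads = [
        threading.Thread(target=producer, args=(a, n)),
        threading.Thread(target=squarer, args=(a, b)),
    ]
    for t in threads:
        t.start()
    results = []
    while True:
        v = b.recv()
        if v is DONE:
            break
        results.append(v)
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run_pipeline(5))
```

No locks or shared variables appear in the worker functions; all interaction goes through the blocking channels, which is the kind of restriction a constrained language can enforce by construction.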
We demonstrate the efficacy of our compiler on two examples. While the SHIM language is (by design) not ideal for every algorithm, it works well for certain applications and simplifies the parallel programming process, especially on the Cell architecture.
Index Terms
- Celling SHIM: compiling deterministic concurrency to a heterogeneous multicore