Abstract
Multicore systems offer the potential to improve application performance, but exploiting their parallelism requires substantial programming effort. This paper presents a single-source compiler that maps data-parallel programs onto the Cell Broadband Engine. Based on a distributed memory model, the compiler performs automatic data distribution and generates SPMD programs with message-passing primitives for Cell. We evaluate the compiler on a range of computation-intensive benchmarks and achieve high performance on the Cell platform. In contrast to OpenMP, our method can fully exploit data locality by managing shared data through inter-processor communication instead of main-memory accesses, which significantly reduces off-chip memory access overhead.
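As a rough illustration of the idea of block data distribution combined with SPMD execution and message passing, the following sketch simulates it in plain Python. It is not the compiler described in the paper and does not use any Cell SDK API; the function names (`block_range`, `stencil_spmd`), the three-point stencil, and the zero boundary condition are all assumptions chosen for the example. Each simulated processor holds only its own block of the array and receives one boundary element from each neighbour instead of reading the whole array from shared memory.

```python
def block_range(n, num_procs, rank):
    """Half-open index range [lo, hi) owned by `rank` under a block distribution.

    Hypothetical helper: the first `n % num_procs` ranks get one extra element.
    """
    base, extra = divmod(n, num_procs)
    lo = rank * base + min(rank, extra)
    hi = lo + base + (1 if rank < extra else 0)
    return lo, hi


def stencil_sequential(a):
    """Reference result: b[i] = a[i-1] + a[i] + a[i+1], out-of-range neighbours count as 0."""
    n = len(a)
    return [
        (a[i - 1] if i > 0 else 0) + a[i] + (a[i + 1] if i < n - 1 else 0)
        for i in range(n)
    ]


def stencil_spmd(a, num_procs):
    """Simulate an SPMD version of the same stencil with explicit halo exchange.

    Assumption for simplicity: len(a) >= num_procs, so every block is non-empty.
    """
    n = len(a)
    assert n >= num_procs, "sketch assumes at least one element per processor"

    # Data distribution: each simulated processor keeps only its own block.
    local = []
    for rank in range(num_procs):
        lo, hi = block_range(n, num_procs, rank)
        local.append(a[lo:hi])

    # "Message passing": each rank receives one boundary value from each neighbour.
    left_halo = [local[r - 1][-1] if r > 0 else 0 for r in range(num_procs)]
    right_halo = [local[r + 1][0] if r < num_procs - 1 else 0 for r in range(num_procs)]

    # Local computation: every rank runs the same code on its padded block.
    result = []
    for rank in range(num_procs):
        padded = [left_halo[rank]] + local[rank] + [right_halo[rank]]
        result.extend(
            padded[i - 1] + padded[i] + padded[i + 1]
            for i in range(1, len(padded) - 1)
        )
    return result
```

In this sketch only two values per processor cross block boundaries, which is the kind of locality a distributed-memory formulation is meant to expose; a real implementation would replace the list copies with on-chip transfers between processing elements.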
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wang, M., Bodin, F., Matz, S. (2010). Automatic Data Distribution for Improving Data Locality on the Cell BE Architecture. In: Gao, G.R., Pollock, L.L., Cavazos, J., Li, X. (eds) Languages and Compilers for Parallel Computing. LCPC 2009. Lecture Notes in Computer Science, vol 5898. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13374-9_17
DOI: https://doi.org/10.1007/978-3-642-13374-9_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13373-2
Online ISBN: 978-3-642-13374-9
eBook Packages: Computer Science (R0)