Abstract
The STI Cell Broadband Engine (BE) is a highly capable heterogeneous multicore processor whose large bandwidth and computing power are well suited to numerical simulation. However, these performance benefits come at the price of productivity, since more responsibility is placed on the programmer. In particular, programming with the IBM Cell SDK requires not only a parallel decomposition of the problem but also explicit management of all data transfers and organization of all computations in a performance-beneficial manner. While this approach raises the complexity of program development, it enables efficient utilization of the available resources.
In the present work we investigate the potential and the performance behavior of the Cell’s parallel cores for a resource-demanding and bandwidth-bound multigrid solver for a three-dimensional Poisson problem. The chosen multigrid method, based on parallel Gauß-Seidel and Jacobi smoothers, combines mathematical optimality with a high degree of inherent parallelism. We investigate dedicated code optimization strategies on the Cell platform and evaluate the associated performance benefits in a comprehensive analysis. Our results show that the Cell BE platform can give tremendous benefits for numerical simulation based on well-structured data. However, isolated, vendor-specific, performance-optimal programming approaches will inevitably need to be replaced by portable and generic concepts such as OpenCL, possibly at the price of some performance loss.
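To make the role of the smoother concrete, the following is a minimal sketch of a damped Jacobi sweep for a three-dimensional Poisson problem. It is not the Cell implementation studied in the chapter: the 7-point finite-difference stencil, the uniform mesh width `h`, the zero Dirichlet boundary, the damping factor `omega = 2/3`, and the function names `jacobi_sweep`, `residual_norm`, and `make_grid` are all assumptions chosen for illustration.

```python
import math


def make_grid(n, value=0.0):
    """Return an n x n x n nested list filled with `value` (hypothetical helper)."""
    return [[[value] * n for _ in range(n)] for _ in range(n)]


def neighbour_sum(u, i, j, k):
    """Sum of the six axis neighbours of point (i, j, k)."""
    return (u[i - 1][j][k] + u[i + 1][j][k]
            + u[i][j - 1][k] + u[i][j + 1][k]
            + u[i][j][k - 1] + u[i][j][k + 1])


def jacobi_sweep(u, f, h, omega=2.0 / 3.0):
    """One damped Jacobi sweep for -Laplace(u) = f (assumed 7-point stencil).

    The outermost layer of u holds Dirichlet boundary values and is copied
    unchanged. Every update reads only the previous iterate, so all interior
    points are independent and could be processed in parallel.
    """
    n = len(u)
    new_u = [[row[:] for row in plane] for plane in u]
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            for k in range(1, n - 1):
                # Plain Jacobi value from (6*u - neighbours) / h^2 = f.
                jacobi = (neighbour_sum(u, i, j, k) + h * h * f[i][j][k]) / 6.0
                # Damping blends the old value with the Jacobi value.
                new_u[i][j][k] = (1.0 - omega) * u[i][j][k] + omega * jacobi
    return new_u


def residual_norm(u, f, h):
    """Euclidean norm of the residual f + Laplace_h(u) over interior points."""
    n = len(u)
    total = 0.0
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            for k in range(1, n - 1):
                laplace = (neighbour_sum(u, i, j, k) - 6.0 * u[i][j][k]) / (h * h)
                total += (f[i][j][k] + laplace) ** 2
    return math.sqrt(total)


if __name__ == "__main__":
    n = 9
    h = 1.0 / (n - 1)
    f = make_grid(n, 1.0)   # constant right-hand side, for illustration only
    u = make_grid(n)        # zero initial guess and zero boundary values
    print("residual before:", residual_norm(u, f, h))
    for _ in range(5):
        u = jacobi_sweep(u, f, h)
    print("residual after 5 sweeps:", residual_norm(u, f, h))
```

Because each point depends only on values from the previous sweep, the loop nest can be split across cores without synchronization inside a sweep. Every update touches seven grid values for a handful of floating-point operations, which gives a feel for why such a solver tends to be bandwidth-bound.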
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Oboril, F., Weiss, JP., Heuveline, V. (2010). Parallel 3D Multigrid Methods on the STI Cell BE Architecture. In: Keller, R., Kramer, D., Weiss, JP. (eds) Facing the Multicore-Challenge. Lecture Notes in Computer Science, vol 6310. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16233-6_9
DOI: https://doi.org/10.1007/978-3-642-16233-6_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-16232-9
Online ISBN: 978-3-642-16233-6
eBook Packages: Computer Science (R0)