Abstract
Driven by increasing specialization, multicore integration will soon enable large-scale chip multiprocessors (CMPs) with many processing cores. To take advantage of increasingly parallel hardware, independent tasks must be expressed at a fine granularity to maximize the available parallelism and thus the potential speedup. However, the efficiency of this approach depends on the runtime system, which is responsible for managing and distributing the tasks. In this paper, we present a hierarchically distributed task pool for task-parallel programming on Cell processors. By storing subsets of the task pool in the local memories of the Synergistic Processing Elements (SPEs), access latency and thus overheads are greatly reduced. Our experiments show that only a worker-centric runtime system that utilizes the SPEs for both task creation and execution is suitable for exploiting fine-grained parallelism.
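The idea of a hierarchically distributed task pool can be illustrated with a small, single-threaded simulation. This is only a sketch under stated assumptions, not the runtime system described in the paper: the class `HierarchicalTaskPool`, the `local_capacity` parameter, the batch refill policy, and the `run` and `make_task` helpers are all hypothetical names chosen for illustration. Each worker stands in for an SPE with a small local pool, and the shared deque stands in for the part of the pool held in main memory.

```python
from collections import deque


class HierarchicalTaskPool:
    """Toy two-level task pool (illustrative assumption, not the paper's design)."""

    def __init__(self, num_workers, local_capacity=4):
        self.shared = deque()  # stand-in for the pool kept in main memory
        self.local = [deque() for _ in range(num_workers)]  # one small pool per worker
        self.local_capacity = local_capacity

    def put(self, worker, task):
        """A worker creates a task: keep it local unless the local pool is full."""
        if len(self.local[worker]) < self.local_capacity:
            self.local[worker].append(task)
        else:
            self.shared.append(task)  # spill to the shared level

    def get(self, worker):
        """Take a local task; if none is left, refill a batch from the shared pool."""
        if not self.local[worker]:
            # One batched transfer amortizes the cost of the slower shared access.
            for _ in range(min(self.local_capacity, len(self.shared))):
                self.local[worker].append(self.shared.popleft())
        return self.local[worker].pop() if self.local[worker] else None


def run(pool, num_workers):
    """Round-robin simulation: workers both execute tasks and create child tasks."""
    executed = 0
    while True:
        progress = False
        for w in range(num_workers):
            task = pool.get(w)
            if task is None:
                continue
            progress = True
            executed += 1
            for child in task():  # a task returns the child tasks it creates
                pool.put(w, child)
        if not progress:  # no worker found work in a full round: all pools are empty
            return executed


def make_task(depth):
    """Hypothetical workload: a binary task tree of the given depth."""
    def task():
        return [make_task(depth - 1), make_task(depth - 1)] if depth > 0 else []
    return task


pool = HierarchicalTaskPool(num_workers=4)
pool.shared.append(make_task(3))
total = run(pool, 4)  # a binary tree of depth 3 contains 15 tasks
```

In this sketch, workers touch the shared pool only when their local pool runs empty or overflows, which is the intuition behind keeping subsets of the pool close to the workers. A real runtime would additionally need synchronization and DMA transfers, which are omitted here.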
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hoffmann, R., Prell, A., Rauber, T. (2010). Exploiting Fine-Grained Parallelism on Cell Processors. In: D’Ambra, P., Guarracino, M., Talia, D. (eds) Euro-Par 2010 - Parallel Processing. Euro-Par 2010. Lecture Notes in Computer Science, vol 6272. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15291-7_18
DOI: https://doi.org/10.1007/978-3-642-15291-7_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15290-0
Online ISBN: 978-3-642-15291-7
eBook Packages: Computer Science (R0)