Abstract
Tasks are the most significant feature of the new OpenMP 3.0 specification, providing a way to express unstructured parallelism. This paper presents a runtime library for the task model on the Cell heterogeneous multicore processor, designed to exploit the architecture's advantages as fully as possible. We also propose two optimizations: an original scheduling strategy and an adaptive cut-off technique. The former combines the breadth-first and work-first scheduling strategies, while the latter adaptively chooses between two cut-off techniques, maximum number of tasks and maximum task recursion level, according to application characteristics. Performance evaluations indicate that our scheme achieves speedups of 3.4 to 7.2 over serial execution.
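To make the idea of a recursion-level cut-off concrete, here is a minimal sketch in C using standard OpenMP 3.0 task pragmas. It is not the paper's runtime library: the cut-off is hard-coded in user code rather than chosen adaptively by the runtime, and the names `MAX_TASK_DEPTH`, `fib_serial`, `fib_task`, and `fib_parallel` are hypothetical, introduced only for illustration.

```c
/* Hypothetical fixed threshold: tasks are created only while the
 * recursion depth is below this value (an assumption for this sketch;
 * the paper's runtime selects its cut-off adaptively). */
#define MAX_TASK_DEPTH 4

/* Plain serial recursion, used once the cut-off is reached. */
static long fib_serial(int n)
{
    return n < 2 ? n : fib_serial(n - 1) + fib_serial(n - 2);
}

/* Recursive Fibonacci with a max-task-recursion-level cut-off. */
static long fib_task(int n, int depth)
{
    long a, b;

    if (n < 2)
        return n;

    /* Cut-off: beyond the depth limit, stop creating tasks so that
     * task-management overhead does not dominate fine-grained work. */
    if (depth >= MAX_TASK_DEPTH)
        return fib_serial(n);

    #pragma omp task shared(a) firstprivate(n, depth)
    a = fib_task(n - 1, depth + 1);

    #pragma omp task shared(b) firstprivate(n, depth)
    b = fib_task(n - 2, depth + 1);

    /* Wait for both child tasks before combining their results. */
    #pragma omp taskwait
    return a + b;
}

/* Entry point: one thread creates the root of the task tree,
 * the rest of the team executes the generated tasks. */
long fib_parallel(int n)
{
    long result = 0;

    #pragma omp parallel
    #pragma omp single
    result = fib_task(n, 0);

    return result;
}
```

Calling `fib_parallel(30)` from a program built with an OpenMP-enabled compiler (for example `gcc -fopenmp`) runs the top levels of the recursion as tasks and the rest serially. Without OpenMP support the pragmas are ignored and the code runs serially with the same result. A max-number-of-tasks cut-off would instead compare a count of outstanding tasks against a limit. Which of the two criteria works better depends on the shape of the application's task tree, which is why the abstract describes the choice as adaptive.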
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Cao, Q., Hu, C., He, H., Huang, X., Li, S. (2010). Support for OpenMP Tasks on Cell Architecture. In: Hsu, CH., Yang, L.T., Park, J.H., Yeo, SS. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2010. Lecture Notes in Computer Science, vol 6082. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13136-3_32
DOI: https://doi.org/10.1007/978-3-642-13136-3_32
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13135-6
Online ISBN: 978-3-642-13136-3
eBook Packages: Computer Science (R0)