Abstract
In this work we present a highly efficient implementation of OpenMP tasks. It is based on a runtime infrastructure architected for data locality, a crucial prerequisite for exploiting the NUMA nature of modern multicore multiprocessors. In addition, we employ fast work-stealing structures, based on a novel, efficient and fair blocking algorithm. Synthetic benchmarks show up to a 6-fold increase in throughput (tasks completed per second), while for a task-based OpenMP application suite we measured up to an 87% reduction in execution time compared to other OpenMP implementations.
This work has been supported in part by the General Secretariat for Research and Technology and the European Commission (ERDF) through the Artemisia SMECY project (grant 100230).
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Agathos, S.N., Kallimanis, N.D., Dimakopoulos, V.V. (2012). Speeding Up OpenMP Tasking. In: Kaklamanis, C., Papatheodorou, T., Spirakis, P.G. (eds) Euro-Par 2012 Parallel Processing. Euro-Par 2012. Lecture Notes in Computer Science, vol 7484. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32820-6_64
DOI: https://doi.org/10.1007/978-3-642-32820-6_64
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32819-0
Online ISBN: 978-3-642-32820-6
eBook Packages: Computer Science, Computer Science (R0)