libKOMP, an Efficient OpenMP Runtime System for Both Fork-Join and Data Flow Paradigms

Broquedis, François; Gautier, Thierry; Danjean, Vincent

doi:10.1007/978-3-642-30961-8_8

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 7312)

Included in the following conference series:

International Workshop on OpenMP

Abstract

To exploit high-performance computing platforms efficiently, applications must express increasingly fine-grained parallelism. The OpenMP standard has allowed programmers to do so since version 3.0, which introduced task parallelism. Although this evolution is a necessary step towards scalability on shared-memory machines with hundreds of cores, the current OpenMP specification lacks a way to express dependencies between tasks, forcing programmers to insert unnecessary synchronization that degrades overall performance. This paper introduces libKOMP, an OpenMP runtime system based on the X-Kaapi library that not only outperforms popular OpenMP implementations on current task-based OpenMP benchmarks but also gives OpenMP programmers new ways of expressing data-flow parallelism.
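
The synchronization problem described in the abstract can be illustrated with a minimal sketch in C, using only standard OpenMP 3.0 constructs (`task`, `taskwait`). The names `stage_one`, `stage_two`, and `run_two_stages` are hypothetical and do not come from the paper, and the sketch does not show libKOMP's data-flow interface. It assumes each block needs only its own first-stage result before its second stage can run; since OpenMP 3.0 cannot state that per-block dependency, the code falls back on a `taskwait` that waits for every block.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-block kernels, chosen only for illustration. */
static void stage_one(double *x) { *x = *x + 1.0; }
static void stage_two(double *x) { *x = *x * 2.0; }

/* Applies stage_one then stage_two to every block using OpenMP 3.0 tasks. */
void run_two_stages(double *blocks, size_t n)
{
    assert(blocks != NULL || n == 0);

    #pragma omp parallel
    #pragma omp single
    {
        /* One task per block for the first stage. */
        for (size_t i = 0; i < n; ++i) {
            #pragma omp task firstprivate(i) shared(blocks)
            stage_one(&blocks[i]);
        }

        /* stage_two on block i needs only stage_one on block i, but
         * OpenMP 3.0 has no way to say so: this taskwait makes the
         * generating thread wait for ALL first-stage tasks. */
        #pragma omp taskwait

        /* One task per block for the second stage. */
        for (size_t i = 0; i < n; ++i) {
            #pragma omp task firstprivate(i) shared(blocks)
            stage_two(&blocks[i]);
        }

        /* Wait for the second stage before leaving the region. */
        #pragma omp taskwait
    }
}
```

Compiled without OpenMP support, the pragmas are ignored and the function runs sequentially with the same result. With OpenMP enabled, no second-stage task is created until the slowest first-stage task has finished, which is the kind of over-synchronization that a data-flow formulation is meant to avoid.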

Author information

Authors and Affiliations

MOAIS Team, LIG, INPG, Grenoble, France
François Broquedis
MOAIS Team, LIG, INRIA, Grenoble, France
Thierry Gautier
MOAIS Team, LIG, UJF, Grenoble, France
Vincent Danjean

Editor information

Editors and Affiliations

University of Houston, Houston, TX, USA
Barbara M. Chapman
CASPUR, Via dei Tizii, 6, 00185, Rome, Italy
Federico Massaioli & Marco Rorro
Center for Information Services and High Performance Computing (ZIH), Dresden University of Technology, 01062, Dresden, Germany
Matthias S. Müller

Cite this paper

Broquedis, F., Gautier, T., Danjean, V. (2012). libKOMP, an Efficient OpenMP Runtime System for Both Fork-Join and Data Flow Paradigms. In: Chapman, B.M., Massaioli, F., Müller, M.S., Rorro, M. (eds) OpenMP in a Heterogeneous World. IWOMP 2012. Lecture Notes in Computer Science, vol 7312. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-30961-8_8

DOI: https://doi.org/10.1007/978-3-642-30961-8_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-30960-1
Online ISBN: 978-3-642-30961-8
eBook Packages: Computer Science (R0)
