A Quantitative Analysis of OpenMP Task Runtime Systems

Hunold, Sascha; Kraßnitzer, Klaus

doi:10.1007/978-3-031-31180-2_1

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13852)

Included in the following conference series:

International Symposium on Benchmarking, Measuring and Optimization

Abstract

Although OpenMP is heavily used to parallelize for-loops, it also supports task-parallel programming, which is important for parallelizing irregular applications. In this work, we focus on the performance of OpenMP runtime systems for task-based applications. In particular, we investigate the performance of different OpenMP runtime systems when scheduling a large set of independent tasks of different granularity. To that end, we propose a new OpenMP benchmark, which features profiling and tracing options that help developers reason about the observed performance differences. We compare the execution times measured for a variety of compilers, such as gcc, icc, clang, aocc, and pgcc, for both homogeneous and heterogeneous workloads. Our study shows that there are significant performance differences between the different OpenMP implementations. We also show that the performance attainable with a compiler strongly depends on the machine architecture, the number of threads, the thread-pinning strategy, and the task granularity.
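To make the workload pattern in the abstract concrete, the following is a minimal sketch of spawning a set of independent OpenMP tasks whose granularity is set per task. It is not taken from the authors' benchmark: the names `spin` and `run_independent_tasks`, the busy-loop kernel, and the use of iteration counts as a stand-in for granularity are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical busy-work kernel (an assumption, not the paper's kernel):
 * performs `iters` additions to emulate a task of a given granularity. */
static double spin(long iters) {
    volatile double x = 0.0;
    for (long i = 0; i < iters; i++) {
        x += 1.0;
    }
    return x;
}

/* Spawns n independent tasks; task i performs grain[i] iterations.
 * Returns the total number of iterations executed across all tasks.
 * Compiled without OpenMP support, the pragmas are ignored and the
 * tasks simply run one after another. */
double run_independent_tasks(int n, const long *grain) {
    assert(n > 0 && grain != NULL);
    double *out = malloc((size_t)n * sizeof *out);
    if (out == NULL) {
        return -1.0;
    }

    #pragma omp parallel
    #pragma omp single
    {
        /* One thread creates all tasks; the runtime system decides
         * which thread of the team executes each of them. */
        for (int i = 0; i < n; i++) {
            #pragma omp task firstprivate(i) shared(out, grain)
            out[i] = spin(grain[i]);
        }
        /* Wait until every task created above has finished. */
        #pragma omp taskwait
    }

    double total = 0.0;
    for (int i = 0; i < n; i++) {
        total += out[i];
    }
    free(out);
    return total;
}
```

In this sketch, passing an array with identical entries corresponds to a homogeneous workload, while an array with varying entries corresponds to a heterogeneous one. Timing the call (for example with `omp_get_wtime` when compiled with an OpenMP flag such as `-fopenmp`) under different compilers and thread counts would give a rough version of the comparison the abstract describes.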

K. Kraßnitzer—This work was partially supported by the Austrian Science Fund (FWF): project P 33884-N.

Notes

1. https://github.com/parlab-tuwien/omp-task-bench

Acknowledgments

We thank Lukas Briem for helping to implement the heterogeneous workloads.

Author information

Authors and Affiliations

Research Group for Parallel Computing, Faculty of Informatics, TU Wien, Vienna, Austria
Sascha Hunold & Klaus Kraßnitzer

Authors

Sascha Hunold
Klaus Kraßnitzer

Corresponding author

Correspondence to Sascha Hunold.

Editor information

Editors and Affiliations

Oak Ridge National Laboratory, Oak Ridge, TN, USA
Ana Gainaru
ETH Zurich, Zürich, Switzerland
Ce Zhang
Chinese Academy of Sciences, Beijing, China
Chunjie Luo

About this paper

Cite this paper

Hunold, S., Kraßnitzer, K. (2023). A Quantitative Analysis of OpenMP Task Runtime Systems. In: Gainaru, A., Zhang, C., Luo, C. (eds) Benchmarking, Measuring, and Optimizing. Bench 2022. Lecture Notes in Computer Science, vol 13852. Springer, Cham. https://doi.org/10.1007/978-3-031-31180-2_1

DOI: https://doi.org/10.1007/978-3-031-31180-2_1
Published: 13 May 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-31179-6
Online ISBN: 978-3-031-31180-2
eBook Packages: Computer Science, Computer Science (R0)
