A Quantitative Analysis of OpenMP Task Runtime Systems

  • Conference paper
Benchmarking, Measuring, and Optimizing (Bench 2022)

Abstract

Although OpenMP is heavily used to parallelize for-loops, it also supports task-parallel programming, which is important for parallelizing irregular applications. In this work, we focus on the performance of OpenMP runtime systems for task-based applications. In particular, we investigate how different OpenMP runtime systems perform when scheduling a large set of independent tasks of different granularity. To that end, we propose a new OpenMP benchmark, which features profiling and tracing options that help developers reason about the observed performance differences. We compare the execution times measured for a variety of compilers, such as gcc, icc, clang, aocc, and pgcc, for both homogeneous and heterogeneous workloads. Our study shows that there are significant performance differences among the OpenMP implementations. We also show that the performance attainable with a compiler strongly depends on the machine architecture, the number of threads, the thread-pinning strategy, and the task granularity.
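To make the measured scenario concrete, the following is a minimal sketch of spawning a set of independent OpenMP tasks with per-task granularity and timing the whole batch. It is not taken from the authors' benchmark: the names `busy_work`, `now_seconds`, and `time_tasks`, and the use of a loop-iteration count as a stand-in for task granularity, are assumptions made for illustration only.

```c
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical task body: `work` loop iterations stand in for granularity. */
static double busy_work(long work)
{
    volatile double acc = 0.0;
    for (long i = 0; i < work; i++)
        acc += (double)i * 0.5;
    return acc;
}

/* Wall-clock time with OpenMP, CPU time as a fallback without it. */
static double now_seconds(void)
{
#ifdef _OPENMP
    return omp_get_wtime();
#else
    return (double)clock() / CLOCKS_PER_SEC;
#endif
}

/* Spawn n independent tasks, one per entry of work[], and return the
   elapsed seconds. *done_out receives the number of completed tasks. */
static double time_tasks(long n, const long *work, long *done_out)
{
    long done = 0;
    double start = now_seconds();

    #pragma omp parallel
    #pragma omp single
    {
        /* One thread creates all tasks; the team executes them. */
        for (long t = 0; t < n; t++) {
            #pragma omp task firstprivate(t) shared(done, work)
            {
                busy_work(work[t]);
                #pragma omp atomic
                done++;
            }
        }
        #pragma omp taskwait
    }

    *done_out = done;
    return now_seconds() - start;
}
```

Filling `work[]` with one constant would correspond to a homogeneous workload, and filling it with varying values to a heterogeneous one. Compiled without OpenMP support, the pragmas are ignored and the tasks simply run one after another, which can serve as a sequential baseline.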

K. Kraßnitzer—This work was partially supported by the Austrian Science Fund (FWF): project P 33884-N.

Notes

  1. https://github.com/parlab-tuwien/omp-task-bench

Acknowledgments

We thank Lukas Briem for helping to implement the heterogeneous workloads.

Author information

Corresponding author

Correspondence to Sascha Hunold.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Hunold, S., Kraßnitzer, K. (2023). A Quantitative Analysis of OpenMP Task Runtime Systems. In: Gainaru, A., Zhang, C., Luo, C. (eds) Benchmarking, Measuring, and Optimizing. Bench 2022. Lecture Notes in Computer Science, vol 13852. Springer, Cham. https://doi.org/10.1007/978-3-031-31180-2_1

  • DOI: https://doi.org/10.1007/978-3-031-31180-2_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-31179-6

  • Online ISBN: 978-3-031-31180-2

  • eBook Packages: Computer Science (R0)
