Abstract
OpenMP 5.0 introduced the metadirective directive to support compile-time selection from a set of directive variants based on the OpenMP context. OpenMP 5.1 extended the context information to include user-defined conditions that enable user-guided runtime adaptation. However, defining conditions that capture the complex interactions between applications and hardware platforms to select an optimized variant is challenging for programmers. This paper explores a novel approach to automating runtime adaptation through machine learning. We design a new directive to describe the semantics of model-driven adaptation and develop a prototype implementation. Using the Smith-Waterman algorithm as a use case, our experiments demonstrate that the proposed adaptive OpenMP extension automatically chooses the code variants that deliver the best performance on heterogeneous platforms combining CPU and GPU processing capabilities. Using decision tree models for tuning achieves an accuracy of up to 93.1% in selecting the optimal variant, with negligible runtime overhead.
Acknowledgment
This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344 (LLNL-CONF-826432). The OpenMP language extension work was supported by the U.S. Dept. of Energy, Office of Science, Advanced Scientific Computing Research. The compiler and runtime work were supported by LLNL-LDRD 21-ERD-018.
Artifact Availability Statement
Summary of the Experiments Reported: We ran the Smith-Waterman algorithm on LLNL's Corona and Pascal supercomputers. The detailed software configurations are given in the experiment section of the paper.
Software Artifact Availability: All author-created software artifacts are maintained in a public repository under an OSI-approved license.
Hardware Artifact Availability: There are no author-created hardware artifacts.
Data Artifact Availability: All author-created data artifacts are maintained in a public repository under an OSI-approved license.
Proprietary Artifacts: None of the associated artifacts, author-created or otherwise, are proprietary.
List of URLs and/or DOIs where artifacts are available:
Copyright information
© 2022 Anjia Wang and Yonghong Yan, and Lawrence Livermore National Security, LLC, under exclusive license to Springer Nature Switzerland AG, part of Springer Nature
About this paper
Cite this paper
Liao, C., et al. (2022). Extending OpenMP for Machine Learning-Driven Adaptation. In: Bhalachandra, S., Daley, C., Melesse Vergara, V. (eds.) Accelerator Programming Using Directives. WACCPD 2021. Lecture Notes in Computer Science, vol. 13194. Springer, Cham. https://doi.org/10.1007/978-3-030-97759-7_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-97758-0
Online ISBN: 978-3-030-97759-7
eBook Packages: Computer Science (R0)