DOI: 10.1145/3492805.3492806
research-article

On the Performance Portability of OpenACC, OpenMP, Kokkos and RAJA

Published: 07 January 2022

ABSTRACT

Performance portability frameworks are becoming increasingly central and essential in heterogeneous computing systems. However, the developer toolbox lacks tools for assessing the degree of performance portability these frameworks actually deliver.

This article presents a new definition of performance portability and a metric for evaluating the performance portability of high-level parallel programming models. Using the new metric, the performance portability of OpenACC, OpenMP, Kokkos and RAJA was evaluated on 324 case studies spanning various application domains, CPU and GPU architectures, and high-performance compilers. The results show that all four frameworks achieve performance portability above 80%, with no significant differences across architectures or compilers.
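The abstract does not spell out the new metric, so as a point of reference only, the sketch below computes the widely cited performance portability metric of Pennycook, Sewall and Lee (2016): for an application running on a platform set H, it is the harmonic mean of the application's efficiencies on the platforms in H, and 0 if any platform in H is unsupported. This is not the article's proposed metric. The platform names and efficiency values are hypothetical, and "efficiency" is assumed to be a ratio in (0, 1], such as achieved performance divided by the best-known performance on that platform.

```python
from typing import Mapping, Optional


def pennycook_pp(efficiency: Mapping[str, Optional[float]]) -> float:
    """Harmonic-mean performance portability (Pennycook, Sewall and Lee, 2016).

    `efficiency` maps each platform in the set H to the application's
    efficiency on that platform (a value in (0, 1]), or None if the
    application does not run there. Returns 0.0 if any platform is
    unsupported, otherwise |H| / sum(1 / e_i).
    """
    if not efficiency:
        raise ValueError("platform set H must be non-empty")
    values = list(efficiency.values())
    # An unsupported (or zero-efficiency) platform makes the metric zero.
    if any(e is None or e <= 0.0 for e in values):
        return 0.0
    return len(values) / sum(1.0 / e for e in values)


# Hypothetical efficiencies for one code on three platforms.
measured = {"cpu_a": 0.90, "gpu_b": 0.80, "cpu_c": 0.75}
print(f"PP = {pennycook_pp(measured):.3f}")  # prints "PP = 0.812"

# Dropping support for one platform collapses the metric to zero.
print(pennycook_pp({"cpu_a": 0.90, "gpu_b": None}))  # prints "0.0"
```

The harmonic mean is dominated by the weakest platform, so a single low efficiency pulls the score down more sharply than an arithmetic mean would.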

          Published in

          ACM Other conferences
          HPCAsia '22: International Conference on High Performance Computing in Asia-Pacific Region
          January 2022
          145 pages
          ISBN: 9781450384988
          DOI: 10.1145/3492805

          Copyright © 2022 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

          Publisher

          Association for Computing Machinery

          New York, NY, United States


          Qualifiers

          • research-article
          • Research
          • Refereed limited

          Acceptance Rates

          Overall Acceptance Rate: 69 of 143 submissions, 48%
