
A Beginner’s Guide to Estimating and Improving Performance Portability

  • Conference paper
High Performance Computing (ISC High Performance 2018)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11203)


Abstract

Given the increasing diversity of multi- and many-core processors, portability is a desirable feature of applications designed and implemented for such platforms. Portability is unanimously seen as a productivity enabler, but it is also considered a major performance blocker. Thus, performance portability has emerged as the property of an application to preserve similar form and similar performance on a set of platforms; a first metric, based on extensive evaluation, has been proposed to quantify performance portability for a given application on a set of given platforms.

In this work, we explore the challenges and limitations of this performance portability metric (PPM) on two levels. We first use 5 OpenACC applications and 3 platforms, and we demonstrate how to compute and interpret PPM in this context. Our results indicate specific challenges in parameter selection and results interpretation. Second, we use controlled experiments to assess the impact of platform-specific optimizations on both performance and performance portability. Our results illustrate, for our 5 OpenACC applications, a clear tension between performance improvement and performance portability improvement.
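The abstract does not reproduce the metric itself. As a rough illustration, the sketch below follows the definition given by Pennycook et al. [13]: the harmonic mean of an application's performance efficiency over a set of platforms, dropping to zero when any platform in the set is unsupported. The function names, platform labels, and all numbers are hypothetical and are not taken from the paper's experiments. Efficiency is assumed here to be architectural efficiency, that is, measured performance divided by a chosen peak.

```python
def architectural_efficiency(measured, peak):
    """Fraction of a chosen peak that was actually achieved (assumed definition)."""
    if peak <= 0:
        raise ValueError("peak must be positive")
    return measured / peak


def performance_portability(efficiencies):
    """Harmonic mean of per-platform efficiencies, after Pennycook et al. [13].

    efficiencies maps a platform name to an efficiency in (0, 1],
    or to None when the application does not run on that platform.
    """
    if not efficiencies:
        raise ValueError("platform set must not be empty")
    values = list(efficiencies.values())
    # One unsupported platform makes the application non-portable on this set.
    if any(e is None or e <= 0.0 for e in values):
        return 0.0
    return len(values) / sum(1.0 / e for e in values)


if __name__ == "__main__":
    # Hypothetical measured and peak performance figures (arbitrary units).
    measured = {"cpu": 50.0, "gpu_a": 200.0, "gpu_b": 150.0}
    peak = {"cpu": 100.0, "gpu_a": 800.0, "gpu_b": 300.0}

    eff = {p: architectural_efficiency(measured[p], peak[p]) for p in measured}
    print(f"{performance_portability(eff):.3f}")  # prints 0.375
```

Because a harmonic mean is dominated by its smallest term, the platform with the lowest efficiency largely determines the score. This is one reason the choice of platform set and of the peak used as the efficiency baseline affects how the result should be read.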


Notes

  1. All applications use single-precision floating-point data types, with Diffusion Operator being the only exception.

  2. rmfarber, https://github.com/rmfarber/ParallelProgrammingWithOpenACC.

  3. Oak Ridge Leadership Computing Facility, https://github.com/olcf/vector_addition_tutorials.

  4. yuhc, https://github.com/yuhc/gpu-rodinia [3].

  5. In all notations with a subscript m or peak, m stands for measured, and peak represents a form of peak performance.

References

  1. Bal, H., et al.: A medium-scale distributed system for computer science research: infrastructure for the long term. Computer 49(5), 54–63 (2016)


  2. Bauer, S.: Accelerator Offloading mit GCC (in German) (2016). https://www.heise.de/developer/artikel/Accelerator-Offloading-mit-GCC-3317330.html?seite=3. Accessed Apr 2018

  3. Che, S., et al.: Rodinia: a benchmark suite for heterogeneous computing. In: IEEE International Symposium on Workload Characterization, IISWC 2009, pp. 44–54. IEEE (2009)


  4. Fabeiro, J.F.: Tools for improving performance portability in heterogeneous environments. Ph.D. thesis, Department of Computer Engineering, University of A Coruña, July 2017


  5. Fang, J., Varbanescu, A.L., Sips, H.: A comprehensive performance comparison of CUDA and OpenCL. In: 2011 International Conference on Parallel Processing (ICPP), pp. 216–225. IEEE (2011)


  6. Intel. Intel Math Kernel Library. https://software.intel.com/en-us/mkl. Accessed Apr 2018

  7. Martineau, M., McIntosh-Smith, S., Gaudin, W.: Assessing the performance portability of modern parallel programming models using TeaLeaf. Concurrency Comput.: Pract. Exp. 29(15), e4117 (2017)


  8. McCalpin, J.D.: Memory bandwidth and machine balance in current high performance computers. In: IEEE Computer Society Technical Committee on Computer Architecture (TCCA) Newsletter, pp. 19–25, December 1995


  9. McIntosh-Smith, S., Boulton, M., Curran, D., Price, J.: On the performance portability of structured grid codes on many-core computer architectures. In: Kunkel, J.M., Ludwig, T., Meuer, H.W. (eds.) ISC 2014. LNCS, vol. 8488, pp. 53–75. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-07518-1_4


  10. NVIDIA. cuBLAS. https://developer.nvidia.com/cublas. Accessed Apr 2018

  11. NVIDIA. CUDA C Programming Guide (2018). https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html. Accessed Apr 2018

  12. Pennycook, S.J., Hammond, S.D., Wright, S.A., Herdman, J.A., Miller, I., Jarvis, S.A.: An investigation of the performance portability of OpenCL. J. Parallel Distrib. Comput. 73(11), 1439–1450 (2013)


  13. Pennycook, S.J., Sewall, J.D., Lee, V.W.: A Metric for Performance Portability. arXiv preprint arXiv:1611.07409 (2016)

  14. Pennycook, S.J., Sewall, J.D., Lee, V.W.: A metric for performance portability. CoRR, abs/1611.07409 (2016)


  15. Pennycook, S.J., Sewall, J.D., Lee, V.W.: Implications of a metric for performance portability. Future Gener. Comput. Syst. (2017)


  16. Rul, S., Vandierendonck, H., D’Haene, J., De Bosschere, K.: An experimental study on performance portability of OpenCL kernels. In: 2010 Symposium on Application Accelerators in High Performance Computing (SAAHPC 2010) (2010)


  17. Shen, J., Fang, J., Sips, H., Varbanescu, A.L.: Performance gaps between OpenMP and OpenCL for multi-core CPUs. In: Proceedings of the 2012 41st International Conference on Parallel Processing Workshops, ICPPW 2012, Washington, DC, USA, pp. 116–125. IEEE Computer Society (2012)


  18. Stratton, J.A., Kim, H., Jablin, T.B., Hwu, W.W.: Performance portability in accelerated parallel kernels. Center for Reliable and High-Performance Computing (2013)


  19. UK-MAC. TeaLeaf (2017). http://uk-mac.github.io/TeaLeaf/

  20. van der Sanden, J.: Evaluating the performance portability of OpenCL. Master’s thesis, Eindhoven University of Technology, The Netherlands (2011)


  21. Williams, S., Waterman, A., Patterson, D.: Roofline: an insightful visual performance model for multicore architectures. Commun. ACM 52(4), 65–76 (2009)


  22. Zhang, Y., Sinclair, M., Chien, A.A.: Improving performance portability in OpenCL programs. In: Kunkel, J.M., Ludwig, T., Meuer, H.W. (eds.) ISC 2013. LNCS, vol. 7905, pp. 136–150. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-38750-0_11



Acknowledgements

We would like to thank Jason Sewall and John Pennycook for their help in designing our experiments and interpreting the results.

Author information


Corresponding author

Correspondence to Henk Dreuning.


Copyright information

© 2018 Springer Nature Switzerland AG

About this paper


Cite this paper

Dreuning, H., Heirman, R., Varbanescu, A.L. (2018). A Beginner’s Guide to Estimating and Improving Performance Portability. In: Yokota, R., Weiland, M., Shalf, J., Alam, S. (eds) High Performance Computing. ISC High Performance 2018. Lecture Notes in Computer Science, vol 11203. Springer, Cham. https://doi.org/10.1007/978-3-030-02465-9_52


  • DOI: https://doi.org/10.1007/978-3-030-02465-9_52


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-02464-2

  • Online ISBN: 978-3-030-02465-9

  • eBook Packages: Computer Science, Computer Science (R0)
