A Beginner’s Guide to Estimating and Improving Performance Portability

Dreuning, Henk; Heirman, Roel; Varbanescu, Ana Lucia

doi:10.1007/978-3-030-02465-9_52

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11203)

Included in the following conference series:

International Conference on High Performance Computing

Abstract

Given the increasing diversity of multi- and many-core processors, portability is a desirable feature of applications designed and implemented for such platforms. Portability is unanimously seen as a productivity enabler, but it is also considered a major performance blocker. Performance portability has therefore emerged as the property of an application to preserve similar form and similar performance across a set of platforms; a first metric, based on extensive evaluation, has been proposed to quantify the performance portability of a given application on a given set of platforms.
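For readers new to the metric: judging from the reference list, the metric referred to here is the harmonic-mean definition of Pennycook, Sewall and Lee (arXiv:1611.07409). For an application a solving problem p on a platform set H, with e_i(a, p) the efficiency of a on platform i (for example, a fraction of peak performance), it reads:

```latex
\mathrm{PP}(a, p, H) =
\begin{cases}
  \dfrac{|H|}{\sum_{i \in H} \dfrac{1}{e_i(a, p)}} & \text{if } a \text{ is supported on every } i \in H,\\[1ex]
  0 & \text{otherwise.}
\end{cases}
```

Because this is a harmonic mean, an application that does not run on some platform in H scores 0, and a single poorly performing platform pulls the score down sharply.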

In this work, we explore the challenges and limitations of this performance portability metric (PPM) on two levels. First, we use 5 OpenACC applications and 3 platforms to demonstrate how to compute and interpret PPM in this context. Our results indicate specific challenges in parameter selection and in the interpretation of results. Second, we use controlled experiments to assess the impact of platform-specific optimizations on both performance and performance portability. For our 5 OpenACC applications, the results illustrate a clear tension between improving performance and improving performance portability.
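A minimal sketch of computing PPM from per-platform efficiencies, assuming the harmonic-mean definition of Pennycook et al. The helper name `performance_portability`, the platform names and all efficiency values are hypothetical and invented for illustration; they are not the authors' code or results. The example mimics the tension described above: a hypothetical GPU-specific optimization raises efficiency on two platforms but lowers it on the third, and PPM drops.

```python
from typing import Mapping, Optional


def performance_portability(efficiencies: Mapping[str, Optional[float]]) -> float:
    """Harmonic-mean performance portability over a platform set H.

    `efficiencies` maps each platform name to the application's efficiency
    on that platform (a fraction, typically in (0, 1]), or to None if the
    application does not run there. Hypothetical helper, not the authors' code.
    """
    if not efficiencies:
        raise ValueError("the platform set H must not be empty")
    values = list(efficiencies.values())
    # The metric is defined as 0 if any platform in H is unsupported.
    if any(e is None for e in values):
        return 0.0
    if any(e < 0.0 for e in values):
        raise ValueError("efficiencies must be non-negative")
    # A zero efficiency drives the harmonic mean to 0 (limit of |H| / infinity).
    if any(e == 0.0 for e in values):
        return 0.0
    return len(values) / sum(1.0 / e for e in values)


if __name__ == "__main__":
    # Invented efficiencies for three hypothetical platforms.
    baseline = {"cpu": 0.5, "gpu_a": 0.4, "gpu_b": 0.4}
    # Hypothetical GPU-specific optimization: faster on both GPUs, slower on the CPU.
    gpu_tuned = {"cpu": 0.2, "gpu_a": 0.8, "gpu_b": 0.7}
    print(f"baseline PPM:  {performance_portability(baseline):.3f}")
    print(f"gpu-tuned PPM: {performance_portability(gpu_tuned):.3f}")
```

With these made-up numbers the script prints a baseline PPM of 0.429 and a GPU-tuned PPM of 0.391: performance improves on two of three platforms, yet performance portability decreases because the harmonic mean is dominated by the weakest platform.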

Notes

1. All applications use single-precision floating-point data types, with Diffusion Operator being the only exception.
2. rmfarber, https://github.com/rmfarber/ParallelProgrammingWithOpenACC.
3. Oak Ridge Leadership Computing Facility, https://github.com/olcf/vector_addition_tutorials.
4. yuhc, https://github.com/yuhc/gpu-rodinia [3].
5. In all notations with a subscript m or peak, m stands for measured, and peak represents a form of peak performance.

References

Bal, H., et al.: A medium-scale distributed system for computer science research: infrastructure for the long term. Computer 49(5), 54–63 (2016)
Bauer, S.: Accelerator Offloading mit GCC (in German) (2016). https://www.heise.de/developer/artikel/Accelerator-Offloading-mit-GCC-3317330.html?seite=3. Accessed Apr 2018
Che, S., et al.: Rodinia: a benchmark suite for heterogeneous computing. In: IEEE International Symposium on Workload Characterization, IISWC 2009, pp. 44–54. IEEE (2009)
Fabeiro, J.F.: Tools for improving performance portability in heterogeneous environments. Ph.D. thesis, Department of Computer Engineering, University of A Coruña, July 2017
Fang, J., Varbanescu, A.L., Sips, H.: A comprehensive performance comparison of CUDA and OpenCL. In: 2011 International Conference on Parallel Processing (ICPP), pp. 216–225. IEEE (2011)
Intel. Intel Math Kernel Library. https://software.intel.com/en-us/mkl. Accessed Apr 2018
Martineau, M., McIntosh-Smith, S., Gaudin, W.: Assessing the performance portability of modern parallel programming models using TeaLeaf. Concurrency Comput.: Pract. Exp. 29(15), e4117 (2017)
McCalpin, J.D.: Memory bandwidth and machine balance in current high performance computers. In: IEEE Computer Society Technical Committee on Computer Architecture (TCCA) Newsletter, pp. 19–25, December 1995
McIntosh-Smith, S., Boulton, M., Curran, D., Price, J.: On the performance portability of structured grid codes on many-core computer architectures. In: Kunkel, J.M., Ludwig, T., Meuer, H.W. (eds.) ISC 2014. LNCS, vol. 8488, pp. 53–75. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-07518-1_4
NVIDIA. cuBLAS. https://developer.nvidia.com/cublas. Accessed Apr 2018
NVIDIA. CUDA C Programming Guide (2018). https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html. Accessed Apr 2018
Pennycook, S.J., Hammond, S.D., Wright, S.A., Herdman, J.A., Miller, I., Jarvis, S.A.: An investigation of the performance portability of OpenCL. J. Parallel Distrib. Comput. 73(11), 1439–1450 (2013)
Pennycook, S.J., Sewall, J.D., Lee, V.W.: A Metric for Performance Portability. arXiv preprint arXiv:1611.07409 (2016)
Pennycook, S.J., Sewall, J.D., Lee, V.W.: A metric for performance portability. CoRR, abs/1611.07409 (2016)
Pennycook, S.J., Sewall, J.D., Lee, V.W.: Implications of a metric for performance portability. Future Gener. Comput. Syst. (2017)
Rul, S., Vandierendonck, H., D’Haene, J., De Bosschere, K.: An experimental study on performance portability of OpenCL kernels. In: 2010 Symposium on Application Accelerators in High Performance Computing (SAAHPC 2010) (2010)
Shen, J., Fang, J., Sips, H., Varbanescu, A.L.: Performance gaps between OpenMP and OpenCL for multi-core CPUs. In: Proceedings of the 2012 41st International Conference on Parallel Processing Workshops, ICPPW 2012, Washington, DC, USA, pp. 116–125. IEEE Computer Society (2012)
Stratton, J.A., Kim, H., Jablin, T.B., Hwu, W.W.: Performance portability in accelerated parallel kernels. Center for Reliable and High-Performance Computing (2013)
UK-MAC. TeaLeaf (2017). http://uk-mac.github.io/TeaLeaf/
van der Sanden, J.: Evaluating the performance portability of OpenCL. Master’s thesis, Eindhoven University of Technology, The Netherlands (2011)
Williams, S., Waterman, A., Patterson, D.: Roofline: an insightful visual performance model for multicore architectures. Commun. ACM 52(4), 65–76 (2009)
Zhang, Y., Sinclair, M., Chien, A.A.: Improving performance portability in OpenCL programs. In: Kunkel, J.M., Ludwig, T., Meuer, H.W. (eds.) ISC 2013. LNCS, vol. 7905, pp. 136–150. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-38750-0_11

Acknowledgements

We would like to thank Jason Sewall and John Pennycook for their help in designing our experiments and interpreting the results.

Author information

Authors and Affiliations

University of Amsterdam, Amsterdam, The Netherlands
Henk Dreuning, Roel Heirman & Ana Lucia Varbanescu

Corresponding author

Correspondence to Henk Dreuning.

Editor information

Editors and Affiliations

Tokyo Institute of Technology, Tokyo, Japan
Rio Yokota
University of Edinburgh, Edinburgh, UK
Michèle Weiland
Lawrence Berkeley National Laboratory, Berkeley, CA, USA
John Shalf
Swiss National Supercomputing Centre, Lugano, Switzerland
Sadaf Alam

About this paper

Cite this paper

Dreuning, H., Heirman, R., Varbanescu, A.L. (2018). A Beginner’s Guide to Estimating and Improving Performance Portability. In: Yokota, R., Weiland, M., Shalf, J., Alam, S. (eds) High Performance Computing. ISC High Performance 2018. Lecture Notes in Computer Science, vol 11203. Springer, Cham. https://doi.org/10.1007/978-3-030-02465-9_52

DOI: https://doi.org/10.1007/978-3-030-02465-9_52
Published: 25 January 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-02464-2
Online ISBN: 978-3-030-02465-9
eBook Packages: Computer Science, Computer Science (R0)
