Abstract
Heterogeneous computing environments are becoming commonplace, so it is increasingly important to understand where and how a given algorithm can be executed most efficiently. In this paper we propose a methodology that uses both static source code metrics and dynamic execution time, power, and energy measurements to build gain ratio prediction models. These models are trained on special benchmarks that have both sequential and parallel implementations and can be executed on various computing elements, e.g., CPUs, GPUs, or FPGAs. Once built, however, they can be applied to a new system using only its static source code metrics, which are much easier to compute than any dynamic measurement. We found that while estimating a continuous gain ratio is a much harder problem, we could predict the gain category (e.g., “slight improvement” or “large deterioration”) of porting to a specific configuration significantly more accurately than a random choice, using static information alone. Based on our benchmarks, we also conclude that parallelized implementations are less maintainable, which supports the need for automatic transformations.
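To make the notion of a gain category concrete, the following Python sketch maps a continuous gain ratio onto discrete labels like those named in the abstract. It is a minimal illustration under stated assumptions, not the authors' model: the gain ratio is assumed here to be the baseline measurement divided by the ported measurement (for lower-is-better quantities such as execution time or energy), and the category boundaries in `BOUNDARIES` are hypothetical values chosen only for illustration. The `random_baseline_accuracy` helper gives the accuracy of a uniform random choice (1/k for k categories), which is the kind of baseline a useful classifier has to beat.

```python
from bisect import bisect_right

# Hypothetical category boundaries on the gain ratio (NOT taken from the paper).
# A ratio equal to a boundary falls into the upper category (bisect_right).
BOUNDARIES = [0.5, 1.0, 2.0]
LABELS = [
    "large deterioration",
    "slight deterioration",
    "slight improvement",
    "large improvement",
]


def gain_ratio(baseline: float, ported: float) -> float:
    """Ratio of a lower-is-better measurement (e.g. time or energy).

    Assumed definition: baseline / ported, so values above 1 mean
    the ported implementation performs better than the baseline.
    """
    if ported <= 0:
        raise ValueError("ported measurement must be positive")
    return baseline / ported


def gain_category(ratio: float) -> str:
    """Map a continuous gain ratio to a discrete category label."""
    return LABELS[bisect_right(BOUNDARIES, ratio)]


def random_baseline_accuracy(num_categories: int) -> float:
    """Expected accuracy of picking one of k categories uniformly at random."""
    return 1.0 / num_categories


if __name__ == "__main__":
    # A port that cuts execution time from 12 s to 3 s gives a ratio of 4.0.
    r = gain_ratio(12.0, 3.0)
    print(r, gain_category(r))
    print(random_baseline_accuracy(len(LABELS)))
```

With four categories, for instance, uniform guessing is right 25% of the time, so a model that predicts categories from static metrics alone is only useful if it does clearly better than that.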
Acknowledgements
The authors would like to thank Péter Molnár and Róbert Sipka for their extensive help with dynamic measurements. This work was supported by the European Union FP7 Project “REPARA—Reengineering and Enabling Performance And poweR of Applications” (Project No. 609666), and by the EU-funded Hungarian national Grant GINOP-2.3.2-15-2016-00037 titled “Internet of Living Things.”
About this article
Cite this article
Bán, D., Ferenc, R., Siket, I. et al. Prediction models for performance, power, and energy efficiency of software executed on heterogeneous hardware. J Supercomput 75, 4001–4025 (2019). https://doi.org/10.1007/s11227-018-2252-6
DOI: https://doi.org/10.1007/s11227-018-2252-6