Abstract
Iterative and time-step methods are widely used in many areas of mathematics and physics. At the same time, modern computers combine multicore CPUs with GPUs, so exploiting all of their computing capabilities is important for efficiency. To improve the performance of this kind of numerical method, we introduce a new heterogeneous CPU + GPU parallelism scheme, which we call IHP. The scheme is self-balancing: it dynamically distributes the workload between the CPU and the GPU on the fly, according to their performance. It can also be used with several competing technologies, such as CUDA and OpenCL for GPUs or OpenMP and Intel TBB for CPUs. As a case study, we analyse an image denoising problem based on time-step diffusion methods for brightness and chromaticity. Results show significant improvements in execution time using this scheme, with minimal overhead.
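The idea of rebalancing the workload between iterations can be illustrated with a minimal sketch. It is not the IHP implementation, whose balancing rule is not given in this abstract. It assumes a simple proportional rule: after each iteration, the share of rows given to the GPU is set to the GPU's measured throughput divided by the combined throughput. The function names (`split_rows`, `rebalance`, `simulate`), the `min_share` clamp, and the device speeds are all hypothetical, and the timings are simulated rather than measured on real hardware.

```python
def split_rows(total_rows, gpu_share):
    """Split the rows of one iteration into (cpu_rows, gpu_rows)."""
    gpu_rows = round(total_rows * gpu_share)
    return total_rows - gpu_rows, gpu_rows


def rebalance(cpu_rows, gpu_rows, t_cpu, t_gpu, min_share=0.05):
    """Return the GPU share for the next iteration.

    Assumed rule (not taken from the paper): each device gets work in
    proportion to the throughput it showed in the last iteration.
    The share is clamped so that both devices always receive some rows
    and therefore always produce a timing.
    """
    cpu_rate = cpu_rows / t_cpu  # rows per time unit on the CPU
    gpu_rate = gpu_rows / t_gpu  # rows per time unit on the GPU
    share = gpu_rate / (cpu_rate + gpu_rate)
    return min(max(share, min_share), 1.0 - min_share)


def simulate(total_rows=1000, steps=5, cpu_speed=1.0, gpu_speed=3.0,
             gpu_share=0.5):
    """Simulate a time-step loop with hypothetical device speeds.

    Returns one (gpu_share, iteration_time) pair per step. The iteration
    time is the slower of the two devices, since both must finish before
    the next time step can start.
    """
    history = []
    for _ in range(steps):
        cpu_rows, gpu_rows = split_rows(total_rows, gpu_share)
        # Simulated timings; a real code would time the two kernels.
        t_cpu = cpu_rows / cpu_speed
        t_gpu = gpu_rows / gpu_speed
        history.append((gpu_share, max(t_cpu, t_gpu)))
        gpu_share = rebalance(cpu_rows, gpu_rows, t_cpu, t_gpu)
    return history


if __name__ == "__main__":
    for step, (share, elapsed) in enumerate(simulate()):
        print(f"step {step}: gpu_share={share:.3f} time={elapsed:.1f}")
```

In this idealised setting, with constant speeds and no transfer or synchronisation cost, the split settles after the first iteration. Real devices have noisy timings and communication overhead, so a practical scheme would likely smooth the measurements or limit how far the share can move per step.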
Acknowledgements
This work has received financial support from the Ministerio de Economía, Industria y Competitividad within the project TIN2016-76373-P. It was also funded by the Consellería de Cultura, Educación e Ordenación Universitaria of Xunta de Galicia (accr. 2019-2022, ED431G2019/04 and reference competitive group 2019-2021, ED431C 2018/19). We thank Rafael Asenjo and the Department of Computer Architecture of the Universidad de Málaga for providing us with the source code of LogFit and for their help.
Cite this article
Laso, R., Cabaleiro, J.C., Rivera, F.F. et al. IHP: a dynamic heterogeneous parallel scheme for iterative or time-step methods—image denoising as case study. J Supercomput 77, 95–110 (2021). https://doi.org/10.1007/s11227-020-03260-8
DOI: https://doi.org/10.1007/s11227-020-03260-8