IHP: a dynamic heterogeneous parallel scheme for iterative or time-step methods—image denoising as case study

Laso, Ruben; Cabaleiro, José C.; Rivera, Francisco F.; Muñiz, M. Carmen; Álvarez-Dios, José A.

doi:10.1007/s11227-020-03260-8

Published: 26 March 2020

Volume 77, pages 95–110, (2021)

The Journal of Supercomputing

Abstract

Iterative and time-step methods are widely used across many domains of mathematics and physics. At the same time, modern computers combine multicore CPUs with GPUs, so exploiting both kinds of device is important for efficient execution. To improve the performance of this kind of numerical method, we introduce a new heterogeneous CPU + GPU parallel scheme, which we call IHP. The scheme is self-balancing: it dynamically distributes the workload between CPU and GPU according to their performance on the fly. It can also be used with several competing technologies, such as CUDA and OpenCL for GPUs or OpenMP and Intel TBB for CPUs. As a case study, we analyse an image denoising problem based on time-step diffusion methods for brightness and chromaticity. Results show significant improvements in execution time using this scheme, with minimal overhead.
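The idea of rebalancing work between two devices at every time step can be illustrated with a minimal sketch. This is not the IHP algorithm from the article, whose details are not given in the abstract; it is a generic proportional split that assumes the domain is divided by rows and that each device's time for the previous step has been measured. The names `rebalance` and `simulate`, and the fixed device speeds used in the simulation, are hypothetical.

```python
def rebalance(rows_total, cpu_rows, cpu_time, gpu_time):
    """Return the number of rows to give the CPU in the next iteration.

    Assumption: both devices processed a non-empty share in the last step,
    and cpu_time / gpu_time are the measured times for those shares.
    """
    gpu_rows = rows_total - cpu_rows
    # Measured throughput (rows per second) of each device in the last step.
    cpu_rate = cpu_rows / cpu_time
    gpu_rate = gpu_rows / gpu_time
    # Give each device a share proportional to its throughput.
    new_cpu_rows = round(rows_total * cpu_rate / (cpu_rate + gpu_rate))
    # Keep at least one row on each device so both rates stay measurable.
    return min(max(new_cpu_rows, 1), rows_total - 1)


def simulate(rows_total, cpu_speed, gpu_speed, steps):
    """Simulate the split over several steps with fixed, made-up device speeds."""
    cpu_rows = rows_total // 2  # start from an even split
    history = [cpu_rows]
    for _ in range(steps):
        # Stand-ins for real timings of the CPU and GPU kernels.
        cpu_time = cpu_rows / cpu_speed
        gpu_time = (rows_total - cpu_rows) / gpu_speed
        cpu_rows = rebalance(rows_total, cpu_rows, cpu_time, gpu_time)
        history.append(cpu_rows)
    return history


if __name__ == "__main__":
    # A GPU assumed to be three times faster than the CPU.
    print(simulate(1000, cpu_speed=100.0, gpu_speed=300.0, steps=3))
```

In a real iterative solver the timings would come from the actual CPU and GPU kernels in each step, so the split would keep adapting if the relative performance of the devices changed during the run.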

Acknowledgements

This work has received financial support from the Ministerio de Economía, Industria y Competitividad within the project TIN2016-76373-P. It was also funded by the Consellería de Cultura, Educación e Ordenación Universitaria of Xunta de Galicia (accr. 2019-2022, ED431G2019/04 and reference competitive group 2019-2021, ED431C 2018/19). Thanks to Rafael Asenjo and the Department of Computer Architecture of Universidad de Málaga for providing us with the source code of LogFit and for their help.

Author information

Authors and Affiliations

CiTIUS, Universidade de Santiago de Compostela, Santiago de Compostela, Spain
Ruben Laso, José C. Cabaleiro & Francisco F. Rivera
Departamento Matemática Aplicada, Universidade de Santiago de Compostela, Santiago de Compostela, Spain
M. Carmen Muñiz & José A. Álvarez-Dios

Corresponding author

Correspondence to Ruben Laso.

About this article

Cite this article

Laso, R., Cabaleiro, J.C., Rivera, F.F. et al. IHP: a dynamic heterogeneous parallel scheme for iterative or time-step methods—image denoising as case study. J Supercomput 77, 95–110 (2021). https://doi.org/10.1007/s11227-020-03260-8

Issue Date: January 2021
