
IHP: a dynamic heterogeneous parallel scheme for iterative or time-step methods—image denoising as case study

The Journal of Supercomputing

Abstract

Iterative and time-step methods are widely used across many areas of mathematics and physics. Modern computers combine multicore CPUs with GPUs, so using them efficiently means exploiting the computing capabilities of both. To improve the performance of this kind of numerical method, we introduce a new heterogeneous CPU + GPU parallel scheme, which we call IHP. The scheme is self-balancing: it dynamically distributes the workload between CPU and GPU on the fly, according to their performance. It can also be used with several competing technologies, such as CUDA and OpenCL for GPUs or OpenMP and Intel TBB for CPUs. As a case study, we analyse an image denoising problem based on time-step diffusion methods for brightness and chromaticity. Results show significant improvements in execution time using this scheme, with minimal overhead.
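The idea of redistributing work between CPU and GPU according to their measured performance can be illustrated with a small sketch. This is not the IHP algorithm itself, whose details are not given in the abstract; it is a generic proportional rebalancing rule under stated assumptions. The names `rebalance` and `simulate`, the row-based partition of the data, and the fixed device rates are all hypothetical and chosen only for illustration.

```python
def rebalance(cpu_rows, total_rows, t_cpu, t_gpu):
    """Return how many rows the CPU should get in the next iteration.

    Assumption: work is split by rows, and each device's throughput
    (rows per second) in the last iteration predicts the next one.
    """
    gpu_rows = total_rows - cpu_rows
    # Without work or a positive timing there is no rate to estimate,
    # so keep the current split.
    if cpu_rows <= 0 or gpu_rows <= 0 or t_cpu <= 0 or t_gpu <= 0:
        return cpu_rows
    cpu_rate = cpu_rows / t_cpu
    gpu_rate = gpu_rows / t_gpu
    # Give each device a share proportional to its measured rate.
    new_cpu_rows = round(total_rows * cpu_rate / (cpu_rate + gpu_rate))
    # Keep at least one row on each device so both can still be timed.
    return min(max(new_cpu_rows, 1), total_rows - 1)


def simulate(total_rows, cpu_rate, gpu_rate, iterations):
    """Run the rule against two simulated devices with fixed rates.

    Returns a list of (cpu_rows, iteration_time) pairs, where the
    iteration time is that of the slower device.
    """
    cpu_rows = total_rows // 2  # start from an even split
    history = []
    for _ in range(iterations):
        t_cpu = cpu_rows / cpu_rate
        t_gpu = (total_rows - cpu_rows) / gpu_rate
        history.append((cpu_rows, max(t_cpu, t_gpu)))
        cpu_rows = rebalance(cpu_rows, total_rows, t_cpu, t_gpu)
    return history


if __name__ == "__main__":
    for rows, elapsed in simulate(1000, 100.0, 300.0, 3):
        print(rows, elapsed)
```

In a time-step method the same kernel is applied at every step, so the timings of one step are a natural input for choosing the split of the next. With real devices the measured rates fluctuate, so a practical scheme would likely smooth or bound the adjustment rather than apply it directly as this sketch does.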



Acknowledgements

This work has received financial support from the Ministerio de Economía, Industria y Competitividad within the project TIN2016-76373-P. It was also funded by the Consellería de Cultura, Educación e Ordenación Universitaria of Xunta de Galicia (accr. 2019-2022, ED431G2019/04 and reference competitive group 2019-2021, ED431C 2018/19). We thank Rafael Asenjo and the Department of Computer Architecture of the Universidad de Málaga for providing the source code of LogFit and for their help.

Author information

Corresponding author

Correspondence to Ruben Laso.

About this article

Cite this article

Laso, R., Cabaleiro, J.C., Rivera, F.F. et al. IHP: a dynamic heterogeneous parallel scheme for iterative or time-step methods—image denoising as case study. J Supercomput 77, 95–110 (2021). https://doi.org/10.1007/s11227-020-03260-8

  • DOI: https://doi.org/10.1007/s11227-020-03260-8
