Accelerating block-matching and 3D filtering method for image denoising on GPUs

Honzátko, David; Kruliš, Martin

doi:10.1007/s11554-017-0737-9

Original Research Paper
Published: 29 November 2017

Volume 16, pages 2273–2287 (2019)

Journal of Real-Time Image Processing

Abstract

Denoising photographs and video recordings is an important task in the domain of image processing. In this paper, we focus on the block-matching and 3D filtering (BM3D) algorithm, which uses the self-similarity of image blocks to improve the noise-filtering process. Even though this method has achieved impressive results in terms of denoising quality, it is not widely used. One reason is that the method is extremely computationally demanding. We present a CUDA-accelerated implementation that significantly increases the image processing speed and brings the BM3D method much closer to real applications. The GPU implementation of the BM3D algorithm is not as straightforward as the implementation of simpler image processing methods, and we believe that some parts (especially the block-matching) can be used separately or can provide guidelines for similar algorithms.

Notes

http://opencv.org/.
The two main phases were originally called steps [9]. We changed the original terminology to avoid ambiguity, since the term ‘step’ would otherwise be overused in the detailed description.
The patch size is typically relatively small. We use $k^{\mathbf{hard}} = 8$ in our experiments.
We use $n^{\mathbf{hard}} = 39$ in our experiments.
We use $\tau^{\mathbf{hard}} = 2500$ in our experiments.
We use $p^{\mathbf{hard}} = 3$ in our experiments.
$\tau_{3D}$ usually comprises a 2D transform applied to each patch and a 1D transform applied along the third dimension of the 3D group (across the patches); different transforms can be combined. We use a 2D cosine transform and a 1D Walsh–Hadamard transform in our work.
We use $\lambda_{3D}^{\mathbf{hard}} = 2.7$ in our experiments.
We use $N^{\mathbf{wien}} = 32$ in our experiments.
In our experiments, we compose $\tau_{3D}^{\mathbf{wien}}$ from the same transforms as in the first phase (i.e., a 2D cosine transform and a 1D Walsh–Hadamard transform).
The batch area was empirically selected as $256\times 128$ pixels.
Using the default parameters, the selected number of threads on the presented GPU architectures is 640 in the first phase and 320 in the second phase.
Recall that $1 \le p \le k$ holds.
In the implementation, we decided to use 16 bits for the distance and $2\times 8$ bits for the offsets, thus reducing the precision of the distance. One reason is that, in the future, the distance could be stored as a half-precision floating-point number instead of a 16-bit integer.
Unlike the image size, the choice of $\sigma$ has very little influence on the execution time.
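
As an illustration of the packing mentioned in the notes above (16 bits for the distance and 2×8 bits for the offsets), the following Python sketch stores one match in a single 32-bit word. The function names `pack_match` and `unpack_match`, the bit layout (distance in the upper half), and the saturation of large distances are assumptions made for this example; they are not details taken from the implementation.

```python
# Hypothetical layout (an assumption for illustration only):
#   bits 31..16  distance, unsigned, saturated to 16 bits
#   bits 15..8   horizontal offset dx, signed 8-bit (two's complement)
#   bits  7..0   vertical offset dy, signed 8-bit (two's complement)

DIST_MAX = (1 << 16) - 1  # largest distance representable in 16 bits


def pack_match(distance, dx, dy):
    """Pack a patch distance and a 2D offset into one 32-bit integer."""
    if not (-128 <= dx <= 127 and -128 <= dy <= 127):
        raise ValueError("offsets must fit in a signed 8-bit integer")
    # Saturate the distance; this is where precision is lost.
    d = min(max(int(distance), 0), DIST_MAX)
    return (d << 16) | ((dx & 0xFF) << 8) | (dy & 0xFF)


def _sign_extend_8(value):
    """Interpret an unsigned byte as a signed 8-bit integer."""
    return value - 256 if value >= 128 else value


def unpack_match(word):
    """Recover (distance, dx, dy) from a word built by pack_match."""
    d = (word >> 16) & 0xFFFF
    dx = _sign_extend_8((word >> 8) & 0xFF)
    dy = _sign_extend_8(word & 0xFF)
    return d, dx, dy
```

With the distance in the upper bits, comparing two packed words as unsigned integers orders them by distance first, so a list of candidate matches could be sorted or reduced without unpacking. A search window of 39 pixels gives offsets of at most 19 in either direction, which fits comfortably in 8 signed bits.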

References

1. Buades, A., Coll, B., Morel, J.-M.: A non-local algorithm for image denoising. In: IEEE Conference on Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, vol. 2, pp. 60–65. IEEE (2005)
2. Chen, Y., Pock, T., Ranftl, R., Bischof, H.: Revisiting loss-specific training of filter-based MRFs for image restoration. In: German Conference on Pattern Recognition, pp. 271–281. Springer (2013)
3. Chen, Y., Yu, W., Pock, T.: On learning optimized reaction diffusion processes for effective image restoration. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5261–5269 (2015)
4. Dabov, K., Foi, A., Katkovnik, V., Egiazarian, K.: Image denoising with block-matching and 3D filtering. In: Proceedings of SPIE, Image Processing: Algorithms and Systems, Neural Networks, and Machine Learning Electronic Imaging, p. 606414. International Society for Optics and Photonics (2006)
5. Dabov, K., Foi, A., Katkovnik, V., Egiazarian, K.: Color image denoising via sparse 3D collaborative filtering with grouping constraint in luminance–chrominance space. Image Process. 1, I-313 (2007)
6. Facciolo, G., Limare, N., Meinhardt-Llopis, E.: Integral images for block matching. Image Process. Line 4, 344–369 (2014). https://doi.org/10.5201/ipol.2014.57
7. Gu, S., Zhang, L., Zuo, W., Feng, X.: Weighted nuclear norm minimization with application to image denoising. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2862–2869 (2014)
8. Huang, K., Zhang, D., Wang, K.: Non-local means denoising algorithm accelerated by GPU. In: Sixth International Symposium on Multispectral Image Processing and Pattern Recognition, p. 749711. International Society for Optics and Photonics (2009)
9. Lebrun, M.: An analysis and implementation of the BM3D image denoising method. Image Process. Line 2, 175 (2012)
10. Mairal, J., Bach, F., Ponce, J., Sapiro, G., Zisserman, A.: Non-local sparse models for image restoration. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2272–2279. IEEE (2009)
11. Márques, A., Pardo, A.: Implementation of non local means filter in GPUs. In: Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications, pp. 407–414. Springer (2013)
12. NVIDIA: Kepler GPU Architecture. http://www.nvidia.com/object/nvidia-kepler.html (2017). Accessed 10 Nov 2017
13. NVIDIA: Maxwell GPU Architecture. http://developer.nvidia.com/maxwell-compute-architecture (2017). Accessed 10 Nov 2017
14. NVIDIA: Pascal GPU Architecture. https://developer.nvidia.com/pascal (2017). Accessed 10 Nov 2017
15. NVIDIA: CUDA C Best Practices Guide. http://docs.nvidia.com/cuda/cuda-c-best-practices-guide/ (2017). Accessed 10 Nov 2017
16. NVIDIA: cuFFT library. https://developer.nvidia.com/cufft (2010). Accessed 27 Nov 2017
17. Sarjanoja, S.: OpenCL implementation of BM3D image denoising algorithm. https://github.com/Sampas/bm3dcl (2015). Accessed 10 Nov 2017
18. Sarjanoja, S., Boutellier, J., Hannuksela, J.: BM3D image denoising using heterogeneous computing platforms. In: 2015 Conference on Design and Architectures for Signal and Image Processing (DASIP), pp. 1–8. IEEE (2015)
19. Schmidt, U., Roth, S.: Shrinkage fields for effective image restoration. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2774–2781 (2014)
20. Zheng, Z., Xu, W., Mueller, K.: Performance tuning for CUDA-accelerated neighborhood denoising filters. In: Workshop on High Performance Image Reconstruction (HPIR) (2011)
21. Zoran, D., Weiss, Y.: From learning models of natural image patches to whole image restoration. In: 2011 International Conference on Computer Vision, pp. 479–486. IEEE (2011)

Acknowledgements

This paper was supported by the Czech Science Foundation (GAČR), Project Number P103-14-14292P, and by Specific Research Project SVV-2017-260451.

Author information

Authors and Affiliations

Parallel Architectures/Algorithms/Applications Research Group, Faculty of Mathematics and Physics, Charles University in Prague, Malostranské nám. 25, Prague, Czech Republic
David Honzátko & Martin Kruliš

Corresponding author

Correspondence to Martin Kruliš.

About this article

Cite this article

Honzátko, D., Kruliš, M. Accelerating block-matching and 3D filtering method for image denoising on GPUs. J Real-Time Image Proc 16, 2273–2287 (2019). https://doi.org/10.1007/s11554-017-0737-9

Received: 10 April 2017
Accepted: 19 November 2017
Published: 29 November 2017
Issue Date: December 2019
DOI: https://doi.org/10.1007/s11554-017-0737-9
