Abstract
Denoising photographs and video recordings is an important task in image processing. In this paper, we focus on the block-matching and 3D filtering (BM3D) algorithm, which exploits the self-similarity of image blocks to improve noise filtering. Although the method achieves impressive denoising quality, it is not widely used, partly because it is extremely computationally demanding. We present a CUDA-accelerated implementation that significantly increases the processing speed and brings the BM3D method much closer to real applications. Implementing BM3D on a GPU is not as straightforward as implementing simpler image processing methods, and we believe that some parts (especially the block-matching) can be used separately or can provide guidelines for similar algorithms.












Notes
The two main phases were originally called steps [9]. We changed the original terminology to avoid ambiguity, since the term ‘step’ would otherwise become overused in the detailed description.
The patch size is typically relatively small. We use \(k^\mathbf{hard } = 8\) in our experiments.
We use \(n^\mathbf{hard }=39\) in our experiments.
We use \(\tau ^\mathbf{hard } = 2500\) in our experiments.
We use \(p^\mathbf{hard }=3\) in our experiments.
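To show how the four block-matching parameters above interact, the following is a minimal CPU sketch in Python rather than the CUDA implementation. It assumes that the distance is the squared difference normalized by \(k^2\), that the \(n \times n\) search window is centered on the reference patch, and that no pre-filtering is applied before the comparison. The function names and the `max_matches` limit are hypothetical and serve only as an illustration.

```python
def patch_distance(img, ax, ay, bx, by, k):
    """Mean squared difference of two k x k patches given by top-left corners."""
    total = 0
    for dy in range(k):
        for dx in range(k):
            diff = img[ay + dy][ax + dx] - img[by + dy][bx + dx]
            total += diff * diff
    return total / (k * k)  # normalization by k*k is an assumption


def match_blocks(img, rx, ry, k=8, n=39, tau=2500, max_matches=16):
    """Return up to max_matches tuples (distance, x, y) similar to the reference."""
    height, width = len(img), len(img[0])
    half = n // 2
    # Clamp the search window so every candidate patch stays inside the image.
    x_lo, x_hi = max(0, rx - half), min(width - k, rx + half)
    y_lo, y_hi = max(0, ry - half), min(height - k, ry + half)
    matches = []
    for y in range(y_lo, y_hi + 1):
        for x in range(x_lo, x_hi + 1):
            d = patch_distance(img, rx, ry, x, y, k)
            if d < tau:  # keep only patches below the distance threshold
                matches.append((d, x, y))
    matches.sort()  # closest patches first
    return matches[:max_matches]


def reference_positions(width, height, k=8, p=3):
    """Top-left corners of reference patches, visited with stride p."""
    xs = range(0, width - k + 1, p)
    ys = range(0, height - k + 1, p)
    return [(x, y) for y in ys for x in xs]
```

A larger stride \(p\) reduces the number of reference patches roughly by a factor of \(p^2\), which is why it has a strong effect on the running time.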
\(\tau _{3D}\) usually comprises a 2D transform applied to each patch and a 1D transform applied along the third dimension of the 3D group (across the patches); different transforms can be combined. We use the 2D cosine transform and the 1D Walsh–Hadamard transform in our work.
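The separable structure of \(\tau _{3D}\) can be sketched in plain Python as follows. This is an illustration under stated assumptions and not the GPU kernel: both transforms are taken in their orthonormal form, the cosine transform is a direct (not fast) DCT-II, and the number of patches in a group is assumed to be a power of two. All function names are hypothetical.

```python
import math


def dct_1d(v):
    """Orthonormal DCT-II of a sequence (direct O(n^2) evaluation)."""
    n = len(v)
    out = []
    for u in range(n):
        s = sum(v[i] * math.cos(math.pi * (2 * i + 1) * u / (2 * n))
                for i in range(n))
        scale = math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def dct_2d(patch):
    """Separable 2D DCT: transform the rows, then the columns."""
    rows = [dct_1d(r) for r in patch]
    cols = [dct_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def walsh_hadamard(v):
    """Orthonormal fast Walsh-Hadamard transform; len(v) must be a power of two."""
    out = list(v)
    n = len(out)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b  # butterfly
        h *= 2
    norm = math.sqrt(n)
    return [x / norm for x in out]


def transform_3d(group):
    """group[m][y][x] is a stack of equally sized patches; returns its 3D spectrum."""
    spectra = [dct_2d(p) for p in group]  # 2D transform of each patch
    for y in range(len(spectra[0])):
        for x in range(len(spectra[0][0])):
            # 1D transform across the patches for a fixed 2D frequency
            column = walsh_hadamard([s[y][x] for s in spectra])
            for m, value in enumerate(column):
                spectra[m][y][x] = value
    return spectra
```

Because both transforms are orthonormal in this sketch, the energy of the group is preserved, and a group of identical constant patches collapses into a single nonzero coefficient.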
We use \(\lambda _{3D}^\mathbf{hard } = 2.7\) in our experiments.
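The role of \(\lambda _{3D}^\mathbf{hard }\) can be illustrated by a short Python sketch of hard thresholding. It assumes an orthonormal \(\tau _{3D}\), so that the noise standard deviation of each coefficient equals \(\sigma\), and it treats every coefficient alike, including the DC term. The function name and the nested-list layout are hypothetical.

```python
def hard_threshold(spectrum, sigma, lam=2.7):
    """Zero every coefficient of a 3D spectrum whose magnitude is below lam * sigma.

    Returns the thresholded spectrum and the number of retained coefficients.
    """
    limit = lam * sigma
    retained = 0
    result = []
    for patch in spectrum:
        new_patch = []
        for row in patch:
            new_patch.append([c if abs(c) >= limit else 0.0 for c in row])
            retained += sum(1 for c in row if abs(c) >= limit)
        result.append(new_patch)
    return result, retained
```

The count of retained coefficients is returned because BM3D-style methods commonly derive the aggregation weight of a group from it.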
We use \(N^\mathbf{wien }=32\) in our experiments.
In our experiments, we compose \(\tau _{3D}^\mathbf{wien }\) from the same transformations as in the first phase (i.e., 2D Cosine transform and 1D Walsh–Hadamard transform).
Batch area was empirically selected as \(256\times 128\) pixels.
Using default parameters, the selected number of threads on the presented GPU architectures is 640 in the first phase and 320 in the second phase.
Recall that \(1 \le p \le k\) holds.
In the implementation, we have decided to use 16 bits for the distance and \(2\times 8\) bits for the offsets, thus reducing the precision of the distance. One of the reasons is that in the future the distance could be stored as a half-precision floating-point value instead of a 16-bit integer.
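One way to lay out such a 32-bit record is sketched below in Python. The exact bit order and the way the precision is reduced are not specified above, so the sketch assumes that the distance occupies the upper 16 bits and is saturated at 65535, and that the offsets are signed 8-bit values. The function names are hypothetical.

```python
DIST_MAX = 0xFFFF  # largest distance representable in 16 bits


def pack_match(distance, dx, dy):
    """Pack a distance and two signed 8-bit offsets into one 32-bit word."""
    if not (-128 <= dx <= 127 and -128 <= dy <= 127):
        raise ValueError("offsets must fit into signed 8 bits")
    # Saturate the distance (assumption): values above 65535 lose precision.
    d16 = min(max(int(distance), 0), DIST_MAX)
    return (d16 << 16) | ((dx & 0xFF) << 8) | (dy & 0xFF)


def unpack_match(word):
    """Inverse of pack_match; returns (distance, dx, dy)."""
    d16 = (word >> 16) & 0xFFFF
    dx = (word >> 8) & 0xFF
    dy = word & 0xFF
    # Sign-extend the two 8-bit offsets.
    if dx >= 128:
        dx -= 256
    if dy >= 128:
        dy -= 256
    return d16, dx, dy
```

With the distance in the upper bits, comparing two packed words as unsigned integers orders them by distance first, so a list of matches can be sorted without unpacking.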
Unlike the image size, the choice of \(\sigma\) has very little influence on the execution time.
References
Buades, A., Coll, B., Morel, J.-M.: A non-local algorithm for image denoising. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), vol. 2, pp. 60–65. IEEE (2005)
Chen, Y., Pock, T., Ranftl, R., Bischof, H.: Revisiting loss-specific training of filter-based MRFs for image restoration. In: German Conference on Pattern Recognition, pp. 271–281. Springer (2013)
Chen, Y., Yu, W., Pock, T.: On learning optimized reaction diffusion processes for effective image restoration. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5261–5269 (2015)
Dabov, K., Foi, A., Katkovnik, V., Egiazarian, K.: Image denoising with block-matching and 3D filtering. In: Proceedings of SPIE, Image Processing: Algorithms and Systems, Neural Networks, and Machine Learning Electronic Imaging, p. 606414. International Society for Optics and Photonics (2006)
Dabov, K., Foi, A., Katkovnik, V., Egiazarian, K.: Color image denoising via sparse 3D collaborative filtering with grouping constraint in luminance–chrominance space. Image Process. 1, I-313 (2007)
Facciolo, G., Limare, N., Meinhardt-Llopis, E.: Integral images for block matching. Image Process. Line 4, 344–369 (2014). https://doi.org/10.5201/ipol.2014.57
Gu, S., Zhang, L., Zuo, W., Feng, X.: Weighted nuclear norm minimization with application to image denoising. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2862–2869 (2014)
Huang, K., Zhang, D., Wang, K.: Non-local means denoising algorithm accelerated by GPU. In: Sixth International Symposium on Multispectral Image Processing and Pattern Recognition, p. 749711. International Society for Optics and Photonics (2009)
Lebrun, M.: An analysis and implementation of the BM3D image denoising method. Image Process. Line 2, 175 (2012)
Mairal, J., Bach, F., Ponce, J., Sapiro, G., Zisserman, A.: Non-local sparse models for image restoration. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2272–2279. IEEE (2009)
Márques, A., Pardo, A.: Implementation of non local means filter in GPUs. In: Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications, pp. 407–414. Springer (2013)
NVIDIA: Kepler GPU Architecture. http://www.nvidia.com/object/nvidia-kepler.html (2017). Accessed 10 Nov 2017
NVIDIA: Maxwell GPU Architecture. http://developer.nvidia.com/maxwell-compute-architecture (2017). Accessed 10 Nov 2017
NVIDIA: Pascal GPU Architecture. https://developer.nvidia.com/pascal (2017). Accessed 10 Nov 2017
NVIDIA: CUDA C Best Practices Guide. http://docs.nvidia.com/cuda/cuda-c-best-practices-guide/ (2017). Accessed 10 Nov 2017
NVIDIA: cuFFT library. https://developer.nvidia.com/cufft (2010). Accessed 27 Nov 2017
Sarjanoja, S.: OpenCL implementation of BM3D image denoising algorithm. https://github.com/Sampas/bm3dcl (2015). Accessed 10 Nov 2017
Sarjanoja, S., Boutellier, J., Hannuksela, J.: BM3D image denoising using heterogeneous computing platforms. In: 2015 Conference on Design and Architectures for Signal and Image Processing (DASIP), pp. 1–8. IEEE (2015)
Schmidt, U., Roth, S.: Shrinkage fields for effective image restoration. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2774–2781 (2014)
Zheng, Z., Xu, W., Mueller, K.: Performance tuning for CUDA-accelerated neighborhood denoising filters. In: Workshop on High Performance Image Reconstruction (HPIR) (2011)
Zoran, D., Weiss, Y.: From learning models of natural image patches to whole image restoration. In: 2011 International Conference on Computer Vision, pp. 479–486. IEEE (2011)
Acknowledgements
This paper was supported by the Czech Science Foundation (GAČR), Project Number P103-14-14292P, and by Specific Research Project SVV-2017-260451.
Cite this article
Honzátko, D., Kruliš, M. Accelerating block-matching and 3D filtering method for image denoising on GPUs. J Real-Time Image Proc 16, 2273–2287 (2019). https://doi.org/10.1007/s11554-017-0737-9
DOI: https://doi.org/10.1007/s11554-017-0737-9