Accelerating block-matching and 3D filtering method for image denoising on GPUs

  • Original Research Paper
Journal of Real-Time Image Processing

Abstract

Denoising photographs and video recordings is an important task in image processing. In this paper, we focus on the block-matching and 3D filtering (BM3D) algorithm, which exploits the self-similarity of image blocks to improve noise filtering. Although this method achieves impressive denoising quality, it is not widely used, partly because it is extremely computationally demanding. We present a CUDA-accelerated implementation that significantly increases the image processing speed and brings the BM3D method much closer to real applications. A GPU implementation of the BM3D algorithm is not as straightforward as that of simpler image processing methods, and we believe that some parts (especially the block-matching) can be used separately or can provide guidelines for similar algorithms.

Notes

  1. http://opencv.org/.

  2. The two main phases were originally called steps [9]. We changed the original terminology to avoid ambiguity, since the term ‘step’ would otherwise be overused in the detailed description.

  3. The patch size is typically relatively small. We use \(k^\mathbf{hard } = 8\) in our experiments.

  4. We use \(n^\mathbf{hard }=39\) in our experiments.

  5. We use \(\tau ^\mathbf{hard } = 2500\) in our experiments.

  6. We use \(p^\mathbf{hard }=3\) in our experiments.

  7. \(\tau _{3D}\) usually comprises a 2D transform applied to each patch and 1D transform applied to the 3rd dimension of the 3D group (across the patches), while different transforms can be combined. 2D Cosine transform and 1D Walsh–Hadamard transform are used in our work.

  8. We use \(\lambda _{3D}^\mathbf{hard } = 2.7\) in our experiments.

  9. We use \(N^\mathbf{wien }=32\) in our experiments.

  10. In our experiments, we compose \(\tau _{3D}^\mathbf{wien}\) from the same transforms as in the first phase (i.e., the 2D Cosine transform and the 1D Walsh–Hadamard transform).

  11. The batch area was empirically selected as \(256\times 128\) pixels.

  12. Using default parameters, the selected number of threads on presented GPU architectures is 640 in the first phase and 320 in the second phase.

  13. Recall that \(1 \le p \le k\) holds.

  14. In the implementation, we decided to use 16 bits for the distance and \(2\times 8\) bits for the offsets, thus reducing the precision of the distance. One of the reasons is that, in the future, the distance could be stored as a half-precision floating-point value instead of a 16-bit integer.

  15. Unlike the image size, the choice of \(\sigma\) has very little influence on the execution time.
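
Note 14 describes storing each match as a 16-bit distance plus two 8-bit offsets. The following is a minimal Python sketch of how such a packed record could look. The function names, the bit order (distance in the high bits), and the saturation of large distances to the 16-bit maximum are assumptions made for illustration; the note does not specify them.

```python
def pack_match(distance, dx, dy):
    """Pack a patch distance and a (dx, dy) offset into one 32-bit word.

    Assumed layout (hypothetical): bits 31..16 distance, 15..8 dx, 7..0 dy.
    """
    # Saturate the distance to 16 bits; this is where precision is lost.
    d16 = min(int(distance), 0xFFFF)
    # Offsets are stored as 8-bit two's-complement values.
    return (d16 << 16) | ((dx & 0xFF) << 8) | (dy & 0xFF)


def unpack_match(word):
    """Recover (distance, dx, dy) from a word produced by pack_match."""
    d16 = word >> 16
    dx = (word >> 8) & 0xFF
    dy = word & 0xFF
    # Sign-extend the 8-bit offsets back to signed integers.
    dx = dx - 256 if dx >= 128 else dx
    dy = dy - 256 if dy >= 128 else dy
    return d16, dx, dy


# Three illustrative candidates (made-up values, not measured data).
matches = [
    pack_match(1200, -3, 7),
    pack_match(40, 19, -19),
    pack_match(70000, 0, 1),   # exceeds 16 bits, so it saturates
]

# With the distance in the high bits, integer comparison orders the
# packed words by distance first, so min() selects the closest patch.
best = unpack_match(min(matches))
```

With this assumed layout, candidates can be compared and sorted as plain integers without unpacking. If the search window of note 4 is centred on the reference patch, offsets stay within ±19 and fit comfortably in a signed 8-bit field.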

References

  1. Buades, A., Coll, B., Morel, J.-M.: A non-local algorithm for image denoising. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), vol. 2, pp. 60–65. IEEE (2005)

  2. Chen, Y., Pock, T., Ranftl, R., Bischof, H.: Revisiting loss-specific training of filter-based MRFs for image restoration. In: German Conference on Pattern Recognition, pp. 271–281. Springer (2013)

  3. Chen, Y., Yu, W., Pock, T.: On learning optimized reaction diffusion processes for effective image restoration. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5261–5269 (2015)

  4. Dabov, K., Foi, A., Katkovnik, V., Egiazarian, K.: Image denoising with block-matching and 3D filtering. In: Proceedings of SPIE, Image Processing: Algorithms and Systems, Neural Networks, and Machine Learning, Electronic Imaging, p. 606414. International Society for Optics and Photonics (2006)

  5. Dabov, K., Foi, A., Katkovnik, V., Egiazarian, K.: Color image denoising via sparse 3D collaborative filtering with grouping constraint in luminance–chrominance space. Image Process. 1, I-313 (2007)

  6. Facciolo, G., Limare, N., Meinhardt-Llopis, E.: Integral images for block matching. Image Process. Line 4, 344–369 (2014). https://doi.org/10.5201/ipol.2014.57

  7. Gu, S., Zhang, L., Zuo, W., Feng, X.: Weighted nuclear norm minimization with application to image denoising. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2862–2869 (2014)

  8. Huang, K., Zhang, D., Wang, K.: Non-local means denoising algorithm accelerated by GPU. In: Sixth International Symposium on Multispectral Image Processing and Pattern Recognition, p. 749711. International Society for Optics and Photonics (2009)

  9. Lebrun, M.: An analysis and implementation of the BM3D image denoising method. Image Process. Line 2, 175 (2012)

  10. Mairal, J., Bach, F., Ponce, J., Sapiro, G., Zisserman, A.: Non-local sparse models for image restoration. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2272–2279. IEEE (2009)

  11. Márques, A., Pardo, A.: Implementation of non local means filter in GPUs. In: Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications, pp. 407–414. Springer (2013)

  12. NVIDIA.: Kepler GPU Architecture. http://www.nvidia.com/object/nvidia-kepler.html (2017). Accessed 10 Nov 2017

  13. NVIDIA.: Maxwell GPU Architecture. http://developer.nvidia.com/maxwell-compute-architecture (2017). Accessed 10 Nov 2017

  14. NVIDIA.: Pascal GPU Architecture. https://developer.nvidia.com/pascal (2017). Accessed 10 Nov 2017

  15. NVIDIA.: CUDA C Best Practices Guide. http://docs.nvidia.com/cuda/cuda-c-best-practices-guide/ (2017). Accessed 10 Nov 2017

  16. NVIDIA.: cuFFT library. https://developer.nvidia.com/cufft (2010). Accessed 27 Nov 2017

  17. Sarjanoja, S.: OpenCL implementation of BM3D image denoising algorithm. https://github.com/Sampas/bm3dcl (2015). Accessed 10 Nov 2017

  18. Sarjanoja, S., Boutellier, J., Hannuksela, J.: BM3D image denoising using heterogeneous computing platforms. In: 2015 Conference on Design and Architectures for Signal and Image Processing (DASIP), pp. 1–8. IEEE (2015)

  19. Schmidt, U., Roth, S.: Shrinkage fields for effective image restoration. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2774–2781 (2014)

  20. Zheng, Z., Xu, W., Mueller, K.: Performance tuning for CUDA-accelerated neighborhood denoising filters. In: Workshop on High Performance Image Reconstruction (HPIR) (2011)

  21. Zoran, D., Weiss, Y.: From learning models of natural image patches to whole image restoration. In: 2011 International Conference on Computer Vision, pp. 479–486. IEEE (2011)

Acknowledgements

This paper was supported by the Czech Science Foundation (GAČR), Project Number P103-14-14292P, and by Specific Research Project SVV-2017-260451.

Author information

Corresponding author

Correspondence to Martin Kruliš.

About this article

Cite this article

Honzátko, D., Kruliš, M. Accelerating block-matching and 3D filtering method for image denoising on GPUs. J Real-Time Image Proc 16, 2273–2287 (2019). https://doi.org/10.1007/s11554-017-0737-9

  • DOI: https://doi.org/10.1007/s11554-017-0737-9