Video denoising using shape-adaptive sparse representation over similar spatio-temporal patches

https://doi.org/10.1016/j.image.2011.04.005

Abstract

We present an effective patch-based video denoising algorithm that exploits both local and nonlocal correlations. The method groups 3D shape-adaptive patches whose surrounding cubic neighborhoods along the spatial and temporal dimensions have been found similar by patch clustering. Such grouping results in 4D data structures with arbitrary shapes. Since the obtained 4D groups are highly correlated along all dimensions, they can be represented very sparsely with a 4D shape-adaptive DCT, and the noise can be effectively attenuated by transform shrinkage. Experimental results on a wide range of videos show that this algorithm provides significant improvement over state-of-the-art denoising algorithms in terms of both objective metrics and subjective visual quality.

Highlights

► Effective patch-based video denoising algorithm exploits local and nonlocal correlations.
► Adaptive spatio-temporal neighborhood structure is searched according to local video content.
► Similar structures are stacked together for higher nonlocal correlations.
► Patch array is transformed by SA-DCT and has sparse representation in transform domain.
► Noise is attenuated by collaborative spectrum shrinkage with iterative Wiener filtering.

Introduction

In digital imaging, noise is inevitably introduced into captured images and videos (image sequences). Many denoising algorithms have been proposed to reduce its influence. Among them, wavelet-based methods [1], [2], [3], [4], [5], [6], [7] and patch-based nonlocal methods [8], [9], [10], [11], [12], [13] have been extensively developed and achieve state-of-the-art denoising performance.

In wavelet-based methods, the wavelet transform is applied to the noisy input signal. The transform coefficients are modified to remove noise and then transformed back to the spatial domain. Under the multiscale analysis framework, numerous powerful directional transforms have been introduced into denoising algorithms, such as the dual-tree discrete wavelet transform (DDWT) [2], curvelets [14], contourlets [15], and ridgelets [16]. Moreover, many advanced coefficient modification strategies [3], [4] have been designed to exploit coefficient dependencies, and they have improved denoising performance. Recently, an image and video denoising method using adaptive dual-tree discrete wavelet packets (ADDWP) [7] was proposed to combine a multiscale directional transform with advanced coefficient modification, and it has been shown to be effective. In ADDWP, the optimal wavelet packet basis is selected adaptively according to the local image characteristics. This transform represents the directional features of images more efficiently, but it does not sufficiently exploit the nonlocal correlations in images and videos.
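The transform, modify, and invert cycle described above can be illustrated with a minimal sketch. It assumes a one-level orthonormal Haar transform on a 1D signal and soft thresholding of the detail coefficients; these are deliberately simple stand-ins, not the directional transforms or coefficient models of [2], [3], [4], [7]. All function names and the threshold value are hypothetical.

```python
import math


def haar_forward(signal):
    """One-level orthonormal Haar transform of an even-length sequence."""
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Invert haar_forward (the transform is orthonormal)."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / s)
        out.append((a - d) / s)
    return out


def soft_threshold(coeffs, t):
    """Shrink each coefficient toward zero by t; small ones become zero."""
    return [math.copysign(max(abs(c) - t, 0.0), c) for c in coeffs]


def haar_denoise(signal, threshold):
    """Transform, shrink the detail coefficients, and transform back."""
    approx, detail = haar_forward(signal)
    return haar_inverse(approx, soft_threshold(detail, threshold))


# Small fluctuations inside each pair are treated as noise and removed,
# while the large step between the two pairs is preserved.
noisy = [4.0, 4.2, 8.0, 7.8]
denoised = haar_denoise(noisy, threshold=1.0)  # approximately [4.1, 4.1, 7.9, 7.9]
```

Only the detail band is shrunk here because, for a piecewise-smooth signal, most of the signal energy sits in the approximation band while the noise spreads evenly over all coefficients.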

In recent years, patch-based models have attracted increasing attention. Successful examples for denoising include nonlocal means [8], [9], space-time adaptive filtering [10], BM3D [11], [12], and patch-based video processing [13]. In these approaches, motion-related temporal dependency is characterized implicitly by clustering similar fragments across multiple video frames.

The VBM3D method [12] is a very efficient and powerful video denoiser based on a grouping and collaborative filtering procedure. Mutually similar 2D image blocks in a number of adjacent frames are stacked together into 3D arrays. Collaborative filtering then produces individual estimates of all grouped blocks by filtering them jointly. For video denoising, the 2D blocks can be extended to 3D patches so that motion-related temporal dependency is exploited effectively. The benefits of this extension have been demonstrated in previous work [13], [17]. Space-time adaptive filtering [10] uses a simple hyper-cube space-time volume as the neighborhood shape. To determine the filtering window at each pixel, the spatial and temporal extents are increased alternately until a stopping rule is satisfied. The estimate is then obtained as a weighted average of the data in the adaptive neighborhood. Although the size of the neighborhood is chosen adaptively, processing based on rectangular patches limits its ability to reconstruct small details, sharp edges, and textures. These subtle characteristics can heavily affect the subjective visual quality and any further video processing. In [18], an adaptive spatio-temporal structure is designed for video denoising based on image content, and the denoising is performed by a pointwise non-parametric regression approach. The pointwise SA-DCT filter [19] applies the shape-adaptive discrete cosine transform (SA-DCT) to arbitrarily shaped neighborhoods. This transform achieves remarkable preservation of edges and singularities. The efficiency of incorporating shape adaptation into the patch-based model has been demonstrated in image denoising [20]. How to adaptively choose the size and shape of 3D patches for collaborative filtering is still an open issue in video denoising.
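The grouping step of block-matching methods, in which similar blocks are collected across frames, can be sketched as follows. This is a minimal illustration under stated assumptions, not the VBM3D implementation: the video is a list of frames stored as nested lists, similarity is the sum of squared differences (SSD), and the search is exhaustive over every frame, whereas practical methods restrict it to a local window. The function names are hypothetical.

```python
def block(video, t, y, x, size):
    """Extract the size x size block with top-left corner (y, x) from frame t."""
    return [row[x:x + size] for row in video[t][y:y + size]]


def ssd(a, b):
    """Sum of squared differences between two equally sized blocks."""
    return sum((p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def find_similar_blocks(video, ref_pos, size, k):
    """Return the (t, y, x) positions of the k blocks closest to the reference.

    Assumption: exhaustive search over all frames and positions.
    """
    t0, y0, x0 = ref_pos
    ref = block(video, t0, y0, x0, size)
    candidates = []
    for t, frame in enumerate(video):
        for y in range(len(frame) - size + 1):
            for x in range(len(frame[0]) - size + 1):
                candidates.append((ssd(ref, block(video, t, y, x, size)), (t, y, x)))
    candidates.sort()  # smallest distance first
    return [pos for _, pos in candidates[:k]]


# A bright 2x2 square that moves one pixel to the right between two frames.
frame0 = [[0, 0, 0, 0], [0, 9, 9, 0], [0, 9, 9, 0], [0, 0, 0, 0]]
frame1 = [[0, 0, 0, 0], [0, 0, 9, 9], [0, 0, 9, 9], [0, 0, 0, 0]]
group = find_similar_blocks([frame0, frame1], ref_pos=(0, 1, 1), size=2, k=2)
# group == [(0, 1, 1), (1, 1, 2)]: the match in frame 1 follows the motion.
```

The example shows why no explicit motion estimation is needed: the best match in the second frame lands on the displaced square simply because it is the most similar block.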

In this paper, we investigate shape adaptation for patch-based video denoising. The new approach combines patch clustering with the SA-DCT filter to obtain better performance. Mutually similar spatio-temporal adaptive neighborhood structures are searched for and stacked together, which results in a highly correlated 4D data array. We then apply a 4D shape-adaptive transform to obtain a sparser representation and attenuate the noise by spectrum shrinkage with iterative Wiener filtering. The estimates are returned to their original locations and aggregated with the estimates from other overlapping patches by a weighted average. The algorithm is performed iteratively to further improve the performance. Our major contribution in this paper is a video denoising algorithm with 3D shape adaptation for collaborative filtering, together with a demonstration of its superior performance on various test videos.
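The shrinkage and aggregation steps outlined above can be sketched in simplified form. The sketch makes several loud assumptions: patches are flat lists of equal length rather than shape-adaptive 3D patches, the transform is a plain orthonormal DCT-II applied only along the inter-patch dimension rather than a 4D SA-DCT, and the shrinkage is a single hard threshold rather than iterative Wiener filtering. Function names and the threshold value are hypothetical.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n)])
    return rows


def shrink_group(group, threshold):
    """Hard-threshold a group of similar patches along the inter-patch dimension.

    group: list of n patches, each a flat list of m samples.
    Returns (filtered_group, number_of_retained_coefficients).
    """
    n, m = len(group), len(group[0])
    d = dct_matrix(n)
    # Forward transform across the n stacked patches, one column at a time.
    coeffs = [[sum(d[k][i] * group[i][j] for i in range(n)) for j in range(m)]
              for k in range(n)]
    kept = 0
    for k in range(n):
        for j in range(m):
            if abs(coeffs[k][j]) < threshold:
                coeffs[k][j] = 0.0  # treated as noise
            else:
                kept += 1
    # Inverse transform: the transpose, because the matrix is orthonormal.
    filtered = [[sum(d[k][i] * coeffs[k][j] for k in range(n)) for j in range(m)]
                for i in range(n)]
    return filtered, kept


def aggregate(length, patches, positions, weights):
    """Weighted average of overlapping 1D patch estimates at their positions."""
    num = [0.0] * length
    den = [0.0] * length
    for patch, pos, w in zip(patches, positions, weights):
        for offset, value in enumerate(patch):
            num[pos + offset] += w * value
            den[pos + offset] += w
    return [a / b if b > 0 else 0.0 for a, b in zip(num, den)]


# Four noisy copies of the same two-sample patch: only the DC coefficient of
# each column survives, so every filtered patch is close to [10.0, 20.0].
group = [[10.0, 20.0], [10.4, 19.6], [9.6, 20.4], [10.0, 20.0]]
filtered, kept = shrink_group(group, threshold=1.0)
```

Because the stacked patches are similar, the transform concentrates their common content in very few coefficients, which is what makes thresholding effective. In BM3D-family methods the aggregation weight is commonly chosen to decrease with the number of retained coefficients; `aggregate` accepts any such weights.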

The rest of the paper is organized as follows. The main parts of the proposed shape-adaptive video denoising algorithm are introduced in Section 2. In Section 3, we discuss parameter optimization and present the experimental results. Finally, we conclude the paper in Section 4.

Section snippets

Shape-adaptive patch-based denoising algorithm

We assume that natural videos have high intra-patch and inter-patch correlations: mutually similar patches are abundant in a video, and the content of a small patch is locally highly correlated. Based on these assumptions, grouping similar patches exploits the redundancy that exists widely in natural videos, while shape adaptation makes the texture pattern in each patch nearly constant or very homogeneous. As a result, each grouped 4D data array is characterized by high

Effects of patch size

In the first experiment, we investigate the effects of patch size on the denoising performance. Intuitively, the choice of patch size is related to the image pattern. For textured regions, the patch should be small to preserve subtle details. If the pattern is smooth, the patch should be larger. In our algorithm, fixed-size patches are used to cluster similar groups, and the adaptive shape is selected within them. A patch that is too small may fail to capture the

Conclusion

In this paper, an effective video denoising algorithm is proposed based on nonlocal and shape-adaptive patch-based video modeling. Compared with VBM3D [12], it offers two extensions: the units used are 3D spatio-temporal patches, and the grouped similar patches are further refined so that their shapes adapt to the video content. A 4D shape-adaptive transform is then applied to each group to produce a very sparse representation that is amenable to denoising. These extensions lead to

Acknowledgments

This research is supported by the National Basic Research Program of China (973 Program, No. 2010CB731800), the Chinese National Science Foundation for Outstanding Scholarship (Grant No: 60625102), and the Key Project of National Natural Sciences Foundation of China (Grant No: 60532030).

References (26)

  • J. Boulanger et al., Space-time adaptation for patch-based image sequence restoration, IEEE Transactions on Pattern Analysis and Machine Intelligence (2007)
  • K. Dabov et al., Image denoising by sparse 3D transform-domain collaborative filtering, IEEE Transactions on Image Processing (2007)
  • K. Dabov, A. Foi, K. Egiazarian, Video denoising by sparse 3D transform-domain collaborative filtering, in: Fifteenth...