Signal-dependent noise removal for color videos using temporal and cross-channel priors

https://doi.org/10.1016/j.jvcir.2016.01.009

Highlights

  • We report a novel color video denoising method that outperforms the state of the art.

  • We propose a new cross-channel prior to suppress color fringing artifacts.

  • We use the temporal prior to separate thin structures from large noise.

  • We model video noise as signal dependent with a Poisson–Gaussian noise model.

  • We incorporate the two priors and noise model into a joint optimization framework.

Abstract

Noise widely exists in video acquisition, and is especially severe under low illumination. Existing video denoising methods risk losing perceptually crucial scene details and introducing unpleasant artifacts. Inspired by the high sensitivity of the human visual system to thin structures and color aberration in natural images, we incorporate two video priors into a joint optimization framework, in addition to the constraint from the adopted Poisson–Gaussian noise model: (i) we force the motion-compensated frames to form a low-rank matrix, separating thin structures from large noise; (ii) we utilize the consistency of image pixel gradients across color channels as a cross-channel prior to eliminate color fringing artifacts. To solve this non-convex optimization model, we derive a numerical algorithm via the augmented Lagrangian multiplier method. The effectiveness of our approach is validated by a series of experiments with both objective and subjective evaluations.

Introduction

Because of insufficient photons in low illumination imaging, a captured video V̂ is often contaminated by large sensor noise (e.g., photon shot noise). The degradation can be described by a general additive model V̂=V+N, where V is the latent video and N denotes the noise. Owing to the ubiquity of noise and the difficulty of removing it, video denoising has been extensively studied in signal processing and computer vision. Mathematically, noise removal is an ill-posed task of separating the noise N from the latent signal V by introducing proper priors.

Compared with single image denoising, video denoising benefits from the local similarity along the temporal dimension, i.e., intensity changes within a temporal neighborhood are small. Protter et al. [1] exploit spatio-temporal smoothness by representing videos with an overcomplete dictionary and the corresponding coefficients, whose sparsity serves as an important prior for the latent noise-free videos. Rosales-Silva et al. [2] reduce impulsive noise in color videos by first extracting scene motion and the noise level from the last denoised frame, and then using this information and a fuzzy directional filter to denoise the next frame. Rahman et al. [3], Varghese et al. [4] and Yang et al. [5] adopt sparse 3D wavelet coefficients to model the spatio-temporal redundancy, and perform denoising in the wavelet domain. Other approaches [6], [7], [8], [9], [10] use motion compensation to register neighboring video frames, and apply specific filters in either the wavelet domain [7], [8], [9] or the spatial domain [6], [10] to suppress video noise. Dai et al. [11], [12] show that motion compensation using inter-channel correlation outperforms methods relying only on intra-channel cues.

Besides local similarity, nonlocal similarity is also an important prior for video denoising. Buades et al. [13] present a unified denoising theory using nonlocal similarities and prove its advantages statistically. In a similar spirit but with different methodologies, many approaches exploiting nonlocal similarities have been proposed, such as [14], [15], [16]. Among them, VBM3D [17] stands out for its efficiency and effectiveness: it regularizes 3D groups of mutually similar nonlocal patches in the spectral domain. Later, Maggioni et al. propose BM4D [18], which extends a similar paradigm from 2D patches to 3D volumetric data and is applicable to video denoising by treating videos as volumetric data. By replacing the voxels in BM4D with sequences of video blocks following motion trajectories, BM4D is extended to VBM4D [19], which further raises performance. To avoid color artifacts caused by processing color channels independently, CVBM3D [20] exploits the shared structures across channels by transforming the noisy RGB video to a luminance-chrominance color space, and reuses the motion estimation and grouping computed on the luminance to denoise both the luminance and the two chrominance channels, in a manner similar to VBM3D. Despite their good performance and high efficiency [21], [22], VBM3D and its extensions all assume Gaussian white noise, rather than the signal-dependent noise found in real captured videos [23]. They therefore suffer performance degradation in real applications.

Whether using local or nonlocal similarities, filtering-based approaches run the risk of losing thin structures during noise suppression. Instead, some researchers incorporate a low-rank prior into an optimization framework to regularize the objective function. As studied in [24], the low-rank prior is effective in preserving crucial image details that filtering tends to smooth out. For example, Ji et al. [25], [26] stack similar image patches within a spatiotemporal neighborhood and force the latent noise-free component of the stack to be a low-rank matrix. Later, Barzigar et al. [27] adopt a similar strategy but use a low-rank matrix completion algorithm based on matrix decomposition to address more complex noise. In spite of this progress, these early studies also cannot handle signal-dependent noise. Moreover, relying only on rank minimization over registered patches limits the final denoising performance, so introducing additional priors (e.g., intra-frame redundancy, global structure) is a promising research direction.

As for the non-Gaussian CCD noise of raw images/videos, the most widely used model is the Poisson–Gaussian model described in [23], based on which Foi et al. [23], [28] perform effective signal-dependent noise prediction. To remove such noise, Danielyan et al. [29] and Boracchi et al. [30] transform the image patch stack to the frequency domain and enhance the sparseness of the coefficients. Later, Foi et al. [31] extend the denoising method to clipped images, where signal magnitudes exceeding the imaging system's acquisition range are truncated. Several other approaches separate the latent signal and noise based on their specific statistical properties. Zhang et al. [32] apply PCA and tensor analysis to remove noise from multi-view noisy images. Keigo et al. [33] model a noise-free image as linear combinations of similar noisy patches and propose a probabilistic denoising method. In sum, real CCD noise cannot be accurately represented by Gaussian white noise, which motivates us to concentrate on signal-dependent noise removal for better denoising performance.

Adopting an optimization methodology, this paper focuses on signal-dependent noise removal from color videos. A series of subjective experiments [34], [35] show that humans are sensitive to the degradation of thin structures (e.g., blur, ringing, distortion) and to color aberrations (e.g., color fringing, smearing); the widely used Structural Similarity (SSIM) metric [36] also validates this point. Unfortunately, these two kinds of degradation are not well addressed by current denoising approaches: (i) thin structures are removed together with high frequency noise, which largely degrades the perceived quality of the results; (ii) color fringing artifacts arise because the three color channels are processed separately and cross-channel consistency is neglected. Two representative examples are shown in Fig. 1(a) and (b), from which we can clearly see that the visual quality suffers in both cases. To address these disadvantages, this paper proposes the following two strategies.

Video frames are typically redundant along the temporal dimension. In this paper, we model this redundancy by first searching for correspondences between neighboring frames, and then aligning and stacking them into an intrinsically low-rank matrix. The development of registration techniques [37], [38], [39], [40], [41] greatly facilitates building inter-frame correspondences. In our implementation, the low-rank constraint is expressed as a convex nuclear-norm regularization [42], and pixels in occluded regions are treated as missing entries to be recovered by appropriate optimization.
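The low-rank recovery with missing (occluded) entries described above can be illustrated with a minimal singular-value-thresholding sketch. This is not the paper's actual algorithm (which is derived in Section 2 via the augmented Lagrangian method); the function name, threshold, and iteration count are all illustrative assumptions.

```python
import numpy as np

def svt_complete(M, mask, tau=0.5, n_iter=200):
    """Fill missing entries of M (where mask is False) with a low-rank
    estimate via iterative singular value thresholding (illustrative sketch)."""
    X = np.where(mask, M, 0.0)                     # zero-fill missing entries
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        s = np.maximum(s - tau, 0.0)               # soft-threshold singular values
        X_low = (U * s) @ Vt                       # low-rank surrogate
        X = np.where(mask, M, X_low)               # keep observed entries fixed
    return X
```

In the video setting, each column of M would hold one vectorized, motion-compensated frame, and the masked-out entries would correspond to occluded pixels.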

Although our approach also uses the low-rank prior to suppress video noise, it differs substantially from the aforementioned denoising methods in that we register and stack whole frames rather than small image patches to form the low-rank matrix. This offers two advantages: the registration is more robust to noise thanks to the constraints of the global optical flow field, and no additional effort is needed to address blocking artifacts. Moreover, we also utilize intra-frame priors, including a cross-channel prior and a total variation prior, so far fewer frames are needed to produce satisfying results than in other low-rank based techniques.

The studies in [43] show that the structures of different color channels in a natural image are highly consistent. Cho et al. [44], [45] propose a deconvolution method utilizing locally learned gradient statistics, whose promising results provide insight into using gradient information across color channels for high quality video denoising. So far, several attempts [43], [46], [47] have defined and applied various cross-channel priors. Joshi et al. [46] propose a color gradient defined as a linear blend of two base color channels, and use it to reduce color fringing in deblurring and denoising. To deal with larger-area fringing, Heide et al. [43] directly match the one-step pixel gradients, normalized by the corresponding pixel intensities, among different channels; this provides better localization and therefore suppresses color fringing better than methods based on local statistics. Similarly, Guichard et al. [47] force the three color channels to share similar sharpness. Despite these efforts to capture the common structures of different color channels, the descriptors remain sensitive to intensity differences between channels. This paper therefore proposes a novel gradient definition that describes structure consistency across color channels independently of pixel intensities, and thus exhibits much better cross-channel consistency than previous work.
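To see why intensity normalization matters for cross-channel gradient matching, consider a minimal sketch in the spirit of the Heide et al. [43] prior. The paper's own intensity-independent definition is given in Section 2; the function names and the L2 penalty form here are assumptions for illustration only.

```python
import numpy as np

def normalized_gradients(channel, eps=1e-6):
    """One-step gradients divided by the local intensity, so a uniformly
    brighter channel yields (nearly) the same descriptor."""
    gx = np.diff(channel, axis=1) / (channel[:, :-1] + eps)
    gy = np.diff(channel, axis=0) / (channel[:-1, :] + eps)
    return gx, gy

def cross_channel_penalty(c1, c2):
    """L2 mismatch of normalized gradients between two color channels."""
    ax, ay = normalized_gradients(c1)
    bx, by = normalized_gradients(c2)
    return np.sum((ax - bx) ** 2) + np.sum((ay - by) ** 2)
```

A channel and an intensity-scaled copy of it produce a near-zero penalty, whereas channels with genuinely different edge structures do not; this is exactly the behavior a cross-channel prior should have.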

Mathematically, the proposed denoising technique is modeled as a unified optimization framework incorporating the above two priors as well as a signal-dependent noise constraint derived from the Poisson–Gaussian model [23]. The noise parameters are CCD specific and can easily be calibrated with off-the-shelf methods [48]. Although the tight coupling between the latent noise-free video and the signal-dependent noise largely complicates the optimization, this paper provides an effective numerical solution via convex optimization. In summary, the proposed approach contributes mainly in the following aspects:

  • Dealing with signal dependent video noise, which is more consistent with real imaging than commonly assumed Gaussian white noise.

  • Introducing a global low-rank constraint on the temporally aligned frame stack to preserve thin structures that are prone to being lost during noise suppression.

  • Proposing a new gradient definition as the cross-channel prior, which is more effective for regularizing structure consistency across color channels.

  • Designing an optimization formulation and a convex algorithm to simultaneously handle the spatiotemporal redundancy of the latent images and the nonlinearity of the noise.
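For orientation, the augmented Lagrangian multiplier scheme named in the abstract has the following generic form; the paper's concrete objective, constraints, and variable splitting appear in Section 2, and the symbols below are generic placeholders, not the paper's notation.

```latex
% Generic constrained problem and its augmented Lagrangian
\min_{X} \; f(X) \quad \text{s.t.} \quad \mathcal{A}(X) = b,
\qquad
\mathcal{L}_{\mu}(X, Y) \;=\; f(X)
\;+\; \langle Y,\, \mathcal{A}(X) - b \rangle
\;+\; \frac{\mu}{2}\, \bigl\lVert \mathcal{A}(X) - b \bigr\rVert_{F}^{2}
```

One then alternates minimization of $\mathcal{L}_{\mu}$ over $X$ with the dual update $Y \leftarrow Y + \mu\,(\mathcal{A}(X) - b)$.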

The remainder of this paper is organized as follows: modeling and derivation of the optimization algorithm are explained in Section 2. Then, we conduct a series of experiments to validate the proposed approach in Section 3. Finally, we conclude this paper with some discussions in Section 4.

Section snippets

Modeling

In this section, we first explain the adopted noise model, the flexible handling of outlier regions, and the proposed cross-channel prior. Then we define a unified optimization model incorporating these three constraints to remove signal-dependent noise from color videos.

Experiments

In this section, we apply our algorithm to synthetic noisy videos from three public video clips, including two slow-motion sequences ('Bus' and 'Mobile') and a fast-motion video ('Football'), as shown in Fig. 8. Noisy sequences are generated by adding signal-dependent Poisson–Gaussian noise to the noise-free sequences with α=0.22, β=6.4×10⁻⁵ (except for the experiment studying the influence of noise level in Section 3.1). The number of frames for alignment is chosen according to specific video
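The synthetic degradation described above can be sketched with the common heteroskedastic Gaussian approximation of the Poisson–Gaussian model [23], in which the per-pixel noise variance is affine in the signal. The exact parameterization the authors use is not stated here, so treat this mapping of α and β to the variance as an assumption.

```python
import numpy as np

def add_poisson_gaussian_noise(frame, alpha=0.22, beta=6.4e-5, rng=None):
    """Add signal-dependent noise with per-pixel variance alpha*y + beta
    (heteroskedastic Gaussian approximation; frame intensities in [0, 1])."""
    rng = np.random.default_rng() if rng is None else rng
    sigma = np.sqrt(np.clip(alpha * frame + beta, 0.0, None))
    return frame + sigma * rng.standard_normal(frame.shape)
```

Note that with these parameters the noise is dominated by the signal-dependent term alpha*y in bright regions, while the constant floor beta governs dark regions, which is the regime low-light denoising targets.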

Conclusions and discussions

This paper draws inspiration from the observation that thin structures and cross-channel consistency are crucial to the perceived quality of videos, and proposes an optimization framework that simultaneously incorporates temporal redundancy, cross-channel consistency, and a signal-dependent noise constraint.

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Nos. 61120106003, 61305026 and 61327902).

References (57)

  • A.J. Rosales-Silva et al.

    Fuzzy directional (FD) filter for impulsive noise reduction in colour video sequences

    J. Visual Commun. Image Represent.

    (2012)
  • M. Ghoniem et al.

    Nonlocal video denoising, simplification and inpainting using discrete regularization on graphs

    Signal Process.

    (2010)
  • A. Foi

    Clipped noisy images: heteroskedastic modeling and practical denoising

    Signal Process.

    (2009)
  • M. Protter et al.

    Image sequence denoising via sparse and redundant representations

    IEEE Trans. Image Process.

    (2009)
  • S. Rahman et al.

    Video denoising based on inter-frame statistical modeling of wavelet coefficients

    IEEE Trans. Circ. Syst. Video Technol.

    (2007)
  • G. Varghese et al.

    Video denoising based on a spatiotemporal Gaussian scale mixture model

    IEEE Trans. Circ. Syst. Video Technol.

    (2010)
  • J. Yang et al.

    Image and video denoising using adaptive dual-tree discrete wavelet packets

    IEEE Trans. Circ. Syst. Video Technol.

    (2009)
  • L. Guo et al.

    Temporal video denoising based on multihypothesis motion compensation

    IEEE Trans. Circ. Syst. Video Technol.

    (2007)
  • L. Jovanov et al.

    Combined wavelet-domain and motion-compensated video denoising based on video codec motion estimation methods

    IEEE Trans. Circ. Syst. Video Technol.

    (2009)
  • F. Luisier et al.

    SURE-LET for orthonormal wavelet-domain video denoising

    IEEE Trans. Circ. Syst. Video Technol.

    (2010)
  • S. Yu et al.

    Video denoising using motion compensated 3-D wavelet transform with integrated recursive temporal filtering

    IEEE Trans. Circ. Syst. Video Technol.

    (2010)
  • L. Guo et al.

    Integration of recursive temporal LMMSE denoising filter into video codec

    IEEE Trans. Circ. Syst. Video Technol.

    (2010)
  • J. Dai et al.

    Color video denoising based on combined interframe and intercolor prediction

    IEEE Trans. Circ. Syst. Video Technol.

    (2013)
  • J. Dai, O. Au, W. Yang, C. Pang, F. Zou, X. Wen, Color video denoising based on adaptive color space conversion, in:...
  • A. Buades et al.

    Nonlocal image and movie denoising

    Int. J. Comput. Vision

    (2008)
  • J.S.D. Bonet, Noise reduction through detection of signal redundancy, Tech. rep., Rethinking Artificial Intelligence,...
  • H. Zhang et al.

    Image and video restorations via nonlocal kernel regression

    IEEE Trans. Cybernet.

    (2013)
  • K. Dabov, A. Foi, K. Egiazarian, Video denoising by sparse 3D transform-domain collaborative filtering, in: EUSIPCO,...
  • M. Maggioni et al.

    Nonlocal transform-domain filter for volumetric data denoising and reconstruction

    IEEE Trans. Image Process.

    (2013)
  • M. Maggioni et al.

    Video denoising, deblocking, and enhancement through separable 4-D nonlocal spatiotemporal transforms

    IEEE Trans. Image Process.

    (2012)
  • M. Maggioni, A. Danielyan, K. Dabov, A. Foi, V. Katkovnik, K. Egiazarian, Image and video denoising by sparse 3d...
  • M.T. Maggioni, Video filtering using separable four-dimensional nonlocal spatiotemporal transforms, Master’s thesis,...
  • M.T. Maggioni, Adaptive nonlocal signal restoration and enhancement techniques for high-dimensional data, Ph.D. thesis,...
  • A. Foi et al.

    Practical Poissonian–Gaussian noise modeling and fitting for single-image raw-data

    IEEE Trans. Image Process.

    (2008)
  • J. Suo et al.

    Joint non-Gaussian denoising and superresolving of raw high frame rate videos

    IEEE Trans. Image Process.

    (2014)
  • H. Ji, C. Liu, Z. Shen, Y. Xu, Robust video denoising using low rank matrix completion, in: CVPR, 2010, pp....
  • H. Ji et al.

    Robust video restoration by joint sparse and low rank matrix approximation

    SIAM J. Imaging Sci.

    (2011)
  • N. Barzigar, A. Roozgard, S. Cheng, P. Verma, An efficient video denoising method using decomposition approach for...

    This paper has been recommended for acceptance by Yehoshua Zeevi.
