Signal-dependent noise removal for color videos using temporal and cross-channel priors☆
Introduction
Because of insufficient photons in low illumination imaging, a captured video is often contaminated by large sensor noise (or photon shot noise). The degradation can be described by a general additive model $f = u + n$, where $f$ is the observation, $u$ is the latent video and $n$ denotes the noise. Due to the ubiquity of noise and the great challenge of removing it, video denoising has been extensively studied in the fields of signal processing and computer vision. Mathematically, noise removal can be summarized as an ill-posed task that separates the noise ($n$) from the latent signal ($u$) by introducing proper priors.
Compared to single image denoising, video denoising benefits from the local similarity along the temporal dimension of videos, i.e., there exist only small intensity changes within a temporal neighborhood. Protter et al. [1] exploit spatio-temporal smoothness by representing videos with an overcomplete dictionary and corresponding coefficients, whose sparsity serves as an important prior for the latent noise-free videos. Rosales-Silva et al. [2] reduce impulsive noise in color videos by first extracting scene motion and noise level from the last denoised frame, and then using this information together with a fuzzy directional filter to denoise the next frame. Rahman et al. [3], Varghese et al. [4] and Yang et al. [5] adopt sparse 3D wavelet coefficients to model the spatio-temporal redundancy, and perform denoising in the wavelet domain. Other approaches [6], [7], [8], [9], [10] use motion compensation to register neighboring video frames, and apply specific filters either in the wavelet domain [7], [8], [9] or the spatial domain [6], [10] to suppress video noise. Dai et al. [11], [12] show that motion compensation using inter-channel correlation obtains superior performance to motion compensation using only intra-channel cues.
Besides local similarity, nonlocal similarity is also an important prior for video denoising. Buades et al. [13] present a unified denoising theory using nonlocal similarities and prove its advantages statistically. With a similar spirit but different methodologies, many approaches exploiting nonlocal similarities have been proposed, such as [14], [15], [16]. Among them, VBM3D [17] stands out for its efficiency and effectiveness. It regularizes 3D groups of mutually similar nonlocal patches in the energy spectrum domain for denoising. Later, Maggioni et al. propose BM4D [18], which extends a similar paradigm from 2D patches to 3D volumetric data, and is also applicable to video denoising by regarding videos as volumetric data. By replacing the voxels in BM4D with sequences of video blocks following motion trajectories, BM4D is extended to VBM4D [19] to raise the performance further. To avoid color artifacts caused by processing color channels independently, CVBM3D [20] utilizes the identical structures across color channels by transforming the input noisy RGB video to a luminance-chrominance color space, and reuses the motion estimation and grouping from the luminance channel to denoise both the luminance and the two chrominance channels, in a similar way to VBM3D. Despite their good performance and high efficiency [21], [22], VBM3D and its extensions all assume Gaussian white noise in videos, rather than the signal dependent noise found in real captured videos [23]. Thus they suffer from performance degradation in real applications.
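As a concrete illustration of the grouping-and-shrinkage paradigm behind VBM3D-style methods, the following is a minimal, hypothetical Python sketch: it gathers mutually similar patches by exhaustive block matching and hard-thresholds the group's 3D transform coefficients. A plain FFT stands in for the separable transforms, Wiener step and aggregation used by the actual algorithms; all function names are illustrative.

```python
import numpy as np

def group_similar_patches(frame, ref_xy, patch=8, search=16, k=8):
    """Collect the k patches most similar to a reference patch
    within a local search window (toy block matching)."""
    y0, x0 = ref_xy
    ref = frame[y0:y0 + patch, x0:x0 + patch]
    candidates = []
    for y in range(max(0, y0 - search), min(frame.shape[0] - patch, y0 + search)):
        for x in range(max(0, x0 - search), min(frame.shape[1] - patch, x0 + search)):
            p = frame[y:y + patch, x:x + patch]
            candidates.append((np.sum((p - ref) ** 2), p))
    candidates.sort(key=lambda t: t[0])          # most similar first
    return np.stack([p for _, p in candidates[:k]])   # shape (k, patch, patch)

def collaborative_filter(group, thresh):
    """Hard-threshold the group's 3D transform coefficients and invert,
    the core shrinkage step of VBM3D-style collaborative filtering."""
    coeffs = np.fft.fftn(group)
    coeffs[np.abs(coeffs) < thresh] = 0.0
    return np.real(np.fft.ifftn(coeffs))
```

In the real algorithms the filtered group is aggregated back into the frame with weighted averaging; that bookkeeping is omitted here.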
Whether using local or nonlocal similarities, filtering based approaches run the risk of losing thin structures during noise suppression. In contrast, some researchers incorporate a low rank prior into an optimization framework to regularize the objective function. As studied in [24], the low rank prior is effective in preserving crucial image details that filtering tends to smooth out. For example, Ji et al. [25], [26] propose to stack similar image patches within a spatiotemporal neighborhood and force the latent noise-free component of the stack to be a low rank matrix. Later, Barzigar et al. [27] adopt a similar strategy but use a low-rank matrix completion algorithm based on matrix decomposition to address more complex noise. Despite this progress, these early studies also cannot handle signal dependent noise. Besides, using only rank minimization on the registered patches limits the final denoising performance, so introducing other priors (e.g., intra-frame redundancy, global structure) is a promising research direction.
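The patch-stack low rank prior of [25], [26] can be sketched in a few lines. The toy version below simply projects the matrix of vectorized similar patches onto its best low-rank approximation via a truncated SVD; the cited methods instead solve regularized optimization problems, so this is an illustration of the prior, not their algorithm.

```python
import numpy as np

def low_rank_denoise_stack(patch_stack, rank):
    """Denoise a stack of mutually similar patches by forcing the matrix
    of vectorized patches to its best rank-`rank` approximation."""
    k, h, w = patch_stack.shape
    M = patch_stack.reshape(k, h * w)        # each row: one vectorized patch
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s[rank:] = 0.0                           # keep only the dominant structure
    return ((U * s) @ Vt).reshape(k, h, w)
```

Because similar patches are near-repetitions of the same content, the clean stack is close to low rank, and the discarded small singular values mostly carry noise.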
As for the non-Gaussian CCD noise of raw images/videos, the most widely used model is the Poisson–Gaussian model described in [23], based on which Foi et al. [23], [28] perform effective signal dependent noise prediction. To remove such signal dependent noise, Danielyan et al. [29] and Boracchi et al. [30] transform the image patch stack to the frequency domain and enhance the sparseness of the coefficients. Later, Foi et al. [31] extend the denoising method to deal with clipped images, where signal magnitudes exceeding the imaging system's acquisition range are truncated. There are also several approaches separating latent signal and noise based on their specific statistical properties. Zhang et al. [32] apply PCA and tensor analysis to remove noise from multi-view noisy images. Keigo et al. [33] model a noise-free image as linear combinations of similar noisy patches, and propose a probabilistic model based denoising method. In all, real CCD noise cannot be accurately represented by Gaussian white noise, and this motivates us to concentrate on signal dependent noise removal to achieve better denoising performance.
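For reference, the Poisson–Gaussian model of [23] is easy to simulate. In the sketch below, parameters a and b control the signal-dependent (shot) and signal-independent (read) components, so that the noise variance at intensity u is a*u + b; the function name and interface are illustrative, not from the paper.

```python
import numpy as np

def add_poisson_gaussian_noise(clean, a, b, rng=None):
    """Simulate Poisson--Gaussian raw-sensor noise: a scaled Poisson count
    (signal-dependent shot noise) plus additive Gaussian read noise, giving
    variance a*u + b at clean intensity u."""
    rng = np.random.default_rng(rng)
    shot = a * rng.poisson(np.clip(clean, 0, None) / a)   # signal-dependent part
    read = rng.normal(0.0, np.sqrt(b), size=clean.shape)  # signal-independent part
    return shot + read
```

Because the variance grows with intensity, a denoiser tuned for a single Gaussian noise level will over-smooth dark regions or under-smooth bright ones, which is the failure mode discussed above.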
Adopting an optimization methodology, this paper focuses on removing signal dependent noise from color videos. A series of subjective experiments [34], [35] show that humans are sensitive to the degradation of thin structures (e.g., blur, ringing, distortion) and to color aberrations (such as color fringing and smearing), and the widely used evaluation metric Structural Similarity (SSIM) [36] also validates this point. Unfortunately, these two kinds of degradation are not well addressed by current denoising approaches: (i) thin structures are removed together with high frequency noise, which largely degrades the perceived quality of the denoising results; (ii) color fringing artifacts arise because the three color channels are enhanced separately, neglecting cross channel consistency. Two representative examples of these degradations are shown in Fig. 1(a) and (b), from which we can clearly see that the visual quality suffers in both cases. To address these disadvantages, this paper proposes the following two strategies.
As is well known, video frames are usually temporally redundant. In this paper, we model this redundancy by first searching for correspondences between neighboring frames, and then aligning and stacking them into an intrinsically low rank matrix. The development of registration techniques [37], [38], [39], [40], [41] greatly benefits building inter-frame correspondence. In implementation, the low rank constraint is expressed as a convex nuclear norm regularization [42], and pixels in occluded regions are treated as missing entries that can be recovered by appropriate optimization.
Although it also uses the low rank prior to suppress video noise, our approach differs substantially from the above denoising methods in that we register and stack whole frames instead of small image patches to form a low rank matrix. This brings advantages in two aspects: the registration is more robust to noise due to the constraints from the global optical flow field, and no additional effort is necessary to address blocking artifacts. Besides, we also utilize intra-frame priors, including a cross-channel prior and a total variation prior, so far fewer frames are needed to produce satisfying results than in other low rank based techniques.
The studies in [43] show that the structures of different color channels in a natural image are largely consistent. Cho et al. [44], [45] propose a novel deconvolution method utilizing locally learned gradient statistics. The promising results provide insight into using gradient information across color channels for high quality video denoising. So far, several attempts [43], [46], [47] have defined and applied various cross channel priors. Joshi et al. [46] propose a color gradient defined as a linear blend of two base color channels, and use it to reduce color fringing in deblurring and denoising. To deal with more complex large-area fringing, Heide et al. [43] directly match one-step pixel gradients, normalized by the corresponding pixel intensities, among different channels. This provides better localization and therefore suppresses color fringing better than approaches based on local statistics. Similarly, Guichard et al. [47] force the three color channels to share similar sharpness. Despite these efforts to capture the common structures of different color channels, the descriptors remain sensitive to intensity differences among channels. Therefore, this paper proposes a novel gradient definition that describes the structure consistency across color channels independently of pixel intensities, and thus offers much better cross channel consistency than previous work.
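To make the idea of a cross channel gradient prior concrete, here is a toy Python cost in the spirit of the intensity-normalized gradient matching of [43]. All names are illustrative; as noted above, such normalization only reduces, not removes, the dependence on pixel intensities, which is the limitation the proposed intensity-independent gradient definition (Section 2) addresses.

```python
import numpy as np

def channel_gradients(img):
    """Forward-difference gradients per color channel; img is H x W x 3."""
    gx = np.diff(img, axis=1, append=img[:, -1:, :])
    gy = np.diff(img, axis=0, append=img[-1:, :, :])
    return gx, gy

def cross_channel_penalty(img, eps=1e-3):
    """Toy cross channel consistency cost: compare intensity-normalized
    gradients between every channel pair; structurally consistent
    channels yield a small value."""
    gx, gy = channel_gradients(img)
    nx = gx / (img + eps)            # normalization reduces, but does not
    ny = gy / (img + eps)            # remove, the intensity dependence
    cost = 0.0
    for c1 in range(3):
        for c2 in range(c1 + 1, 3):
            cost += np.sum((nx[..., c1] - nx[..., c2]) ** 2)
            cost += np.sum((ny[..., c1] - ny[..., c2]) ** 2)
    return cost
```

Minimizing such a penalty jointly over the channels discourages edges that appear in one channel but not the others, which is exactly the mechanism that suppresses color fringing.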
Mathematically, the proposed denoising technique is modeled as a unified optimization framework incorporating both of the above priors as well as a signal dependent noise constraint derived from the Poisson–Gaussian model [23]. The noise parameters are CCD specific and can easily be calibrated by off-the-shelf methods [48]. Although the tight coupling between the latent noise-free video and the signal dependent noise largely complicates the optimization, this paper provides an effective numeric solution via convex optimization. In summary, the proposed approach contributes mainly in the following aspects:
- Dealing with signal dependent video noise, which is more consistent with real imaging than the commonly assumed Gaussian white noise.
- Introducing a global low rank constraint on the temporally aligned frame stack to preserve thin structures that are prone to being lost during noise suppression.
- Proposing a new gradient definition as the cross channel prior, which regulates structure consistency across color channels more effectively.
- Designing an optimization formulation and a convex algorithm to simultaneously handle the latent video's spatiotemporal redundancy and the noise's nonlinearity.
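As a side note on calibrating the CCD-specific noise parameters mentioned above, a common simplified approach (a stand-in for full methods such as [48], not the paper's procedure) fits a line to sample variance versus sample mean over patches of nearly constant intensity, recovering (a, b) in the Poisson–Gaussian variance function var(z) = a·E[z] + b. The function below is an illustrative sketch under that assumption.

```python
import numpy as np

def calibrate_noise_params(noisy_flat_patches):
    """Estimate Poisson--Gaussian parameters (a, b) in var(z) = a*E[z] + b
    by least-squares fitting sample variance against sample mean over
    patches of (nearly) constant intensity."""
    means = np.array([p.mean() for p in noisy_flat_patches])
    variances = np.array([p.var(ddof=1) for p in noisy_flat_patches])
    A = np.stack([means, np.ones_like(means)], axis=1)   # [mean, 1] design matrix
    (a, b), *_ = np.linalg.lstsq(A, variances, rcond=None)
    return a, b
```

In practice the flat patches are harvested automatically from the video (or from a calibration target), and robust fitting replaces plain least squares.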
The remainder of this paper is organized as follows: modeling and derivation of the optimization algorithm are explained in Section 2. Then, we conduct a series of experiments to validate the proposed approach in Section 3. Finally, we conclude this paper with some discussions in Section 4.
Modeling
In this section, we first explain the adopted noise model, the flexible handling of outlier regions, and the proposed cross channel prior. Then we define a unified optimization model incorporating these three constraints to remove signal dependent noise from color videos.
Experiments
In this section, we apply our algorithm to videos with synthetic noise generated from three public video clips: two slow-motion sequences, 'Bus' and 'Mobile', and a fast-motion video, 'Football', as shown in Fig. 8. Noisy sequences are generated by adding signal dependent Poisson–Gaussian noise to the noise-free sequences with fixed noise parameters (except for the experiment studying the influence of noise levels in Section 3.1). The number of frames for alignment is chosen according to specific video
Conclusions and discussions
This paper draws inspiration from the observation that thin structures and cross channel consistency are crucial to the perceived quality of videos, and proposes an optimization framework that simultaneously incorporates temporal redundancy, cross channel consistency, and a signal dependent noise constraint.
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Nos. 61120106003, 61305026 and 61327902).
References (57)
- et al., Fuzzy directional (fd) filter for impulsive noise reduction in colour video sequences, J. Visual Commun. Image Represent. (2012)
- et al., simplification and inpainting using discrete regularization on graphs, Signal Process. (2010)
- Clipped noisy images: heteroskedastic modeling and practical denoising, Signal Process. (2009)
- et al., Image sequence denoising via sparse and redundant representations, IEEE Trans. Image Process. (2009)
- et al., Video denoising based on inter-frame statistical modeling of wavelet coefficients, IEEE Trans. Circ. Syst. Video Technol. (2007)
- et al., Video denoising based on a spatiotemporal gaussian scale mixture model, IEEE Trans. Circ. Syst. Video Technol. (2010)
- et al., Image and video denoising using adaptive dual-tree discrete wavelet packets, IEEE Trans. Circ. Syst. Video Technol. (2009)
- et al., Temporal video denoising based on multihypothesis motion compensation, IEEE Trans. Circ. Syst. Video Technol. (2007)
- et al., Combined wavelet-domain and motion-compensated video denoising based on video codec motion estimation methods, IEEE Trans. Circ. Syst. Video Technol. (2009)
- et al., Sure-let for orthonormal wavelet-domain video denoising, IEEE Trans. Circ. Syst. Video Technol. (2010)
- Video denoising using motion compensated 3-d wavelet transform with integrated recursive temporal filtering, IEEE Trans. Circ. Syst. Video Technol.
- Integration of recursive temporal lmmse denoising filter into video codec, IEEE Trans. Circ. Syst. Video Technol.
- Color video denoising based on combined interframe and intercolor prediction, IEEE Trans. Circ. Syst. Video Technol.
- Nonlocal image and movie denoising, Int. J. Comput. Vision
- Image and video restorations via nonlocal kernel regression, IEEE Trans. Cybernet.
- Nonlocal transform-domain filter for volumetric data denoising and reconstruction, IEEE Trans. Image Process.
- Video denoising, deblocking, and enhancement through separable 4-D nonlocal spatiotemporal transforms, IEEE Trans. Image Process.
- Practical Poissonian–Gaussian noise modeling and fitting for single-image raw-data, IEEE Trans. Image Process.
- Joint non-Gaussian denoising and superresolving of raw high frame rate videos, IEEE Trans. Image Process.
- Robust video restoration by joint sparse and low rank matrix approximation, SIAM J. Imaging Sci.
☆ This paper has been recommended for acceptance by Yehoshua Zeevi.