Evaluation of objective quality measures for noise reduction in TV-systems

https://doi.org/10.1016/S0923-5965(03)00074-2

Abstract

In this paper, several state-of-the-art noise reduction algorithms are ranked using both a subjective evaluation and 20 objective quality measures proposed in the literature. The goal of the study was to find the objective quality measure that best relates to the subjective assessment of noise reduction. A measurement set-up comprising a simulation of a TV-transmission channel was used to provide a realistic assessment of how algorithms that reduce Gaussian white noise perform in TV-applications. Ranking results are given for the subjective evaluation and for all objective measures. The correlation between the subjective evaluation results and the objective measures is found to be very low. Results show that even a combination of objective measures approximates the subjective assessment of the quality of noise reduction algorithms only to a limited extent.

Introduction

In every part of the video chain, from the source to the display, the video may be impaired by noise. Additive Gaussian white noise is the most important type of analogue noise; it enters the system primarily during analogue transmission, that is, between the broadcasting of the composite video signal and its reception at the user's premises. Many studies of noise reduction for still images [12], [18] and, occasionally, for video [1] can be found in the literature. In these papers, noise reduction methods are ranked using objective quality measures only, and no limitation is put on algorithm complexity. In addition, no effort is made to simulate the noise conditions that are important for consumer TV-applications. Notably, Drewery et al. [2] proposed a subjective method for evaluating noise reduction performance. They used this method to evaluate a standard noise reduction algorithm suitable for consumer TV-applications, and subsequently made a limited comparison of the subjective results with the theoretically obtained amount of noise reduction. With this method, it seems possible to estimate the amount of noise reduction achieved by this generic algorithm when applied to still images.

In our paper, a large number of state-of-the-art video noise reduction algorithms designed for consumer TV-applications (an application area that places severe demands on memory use and algorithm efficiency) are evaluated on their overall image quality, using both a subjective ranking method and a large number of objective quality measures. The noise conditions of consumer TV-applications are carefully simulated, so that the performance of these algorithms can be evaluated realistically. The goal of the study was to find the objective quality measure that best relates to the subjective assessment of noise reduction.
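The set-up described here simulates a complete TV-transmission channel, which the snippet does not detail. As a minimal sketch of only the additive Gaussian white noise step, the following adds independent zero-mean Gaussian noise to an 8-bit luminance frame. The function names, the PSNR-to-sigma conversion helper and the example noise level are illustrative assumptions, not the authors' channel simulation.

```python
import random


def sigma_for_psnr(psnr_db, peak=255.0):
    """Noise standard deviation that yields the requested PSNR, ignoring clipping."""
    # PSNR = 10 * log10(peak^2 / sigma^2)  =>  sigma = peak / 10^(PSNR / 20)
    return peak / (10.0 ** (psnr_db / 20.0))


def add_gaussian_noise(frame, sigma, seed=0, peak=255):
    """Return a noisy copy of a 2-D 8-bit luminance frame (a list of rows).

    Every sample receives an independent zero-mean Gaussian value (white noise);
    the result is rounded and clipped to the range [0, peak].
    """
    rng = random.Random(seed)  # seeded so that test sequences are reproducible
    return [
        [min(peak, max(0, round(v + rng.gauss(0.0, sigma)))) for v in row]
        for row in frame
    ]


if __name__ == "__main__":
    clean = [[128] * 64 for _ in range(64)]  # flat mid-grey test frame
    sigma = sigma_for_psnr(32.0)             # hypothetical noise level
    noisy = add_gaussian_noise(clean, sigma, seed=1)
    print(f"sigma = {sigma:.2f}")
```

Seeding the generator keeps the noise identical across runs, so every algorithm under test can be fed exactly the same impaired input.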

Section snippets

Test procedure

The following noise reduction algorithms were included in the test: the spatial weighted aperture noise reduction algorithm (SWAN), proposed by Ojo et al. [13], the temporal dynamic noise reduction algorithm (DNR) [14], the edge preserving spatial noise reduction algorithm (EPRES) combined with the perception adaptive temporal noise reduction algorithm (PERCAT), proposed by Jostschulte et al. [9], and the fuzzy combined spatial and temporal noise reduction algorithm (FCST), proposed by Mancuso

Evaluation methods for noise reduction algorithms

Different methods to evaluate noise reduction algorithms can be distinguished: subjective, objective and hybrid evaluation methods. While subjective evaluation measures try to capture the human preference for a certain noise reduction algorithm through panel tests, objective and hybrid evaluation measures do this by using a formula describing luminance or signal characteristics. The difference between objective and hybrid evaluation measures is that hybrid evaluation measures try to incorporate
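One family of hybrid measures named in this study is the weighted (P)SNR, in which the error spectrum is weighted before its power is summed. The snippet does not give the three weighting functions that were used, so the sketch below uses a hypothetical low-pass weighting on a single scan line (it is not the Fujio weighting function of [4]). The names `dft`, `lowpass_weight` and `weighted_psnr` and the `cutoff` parameter are illustrative assumptions.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform; adequate for a short scan line."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def lowpass_weight(k, n, cutoff=0.25):
    """Hypothetical visual weighting that de-emphasises high spatial frequencies.

    Bin k of an n-point DFT is mapped to |f| in cycles/sample (0 .. 0.5);
    bins above n/2 are the negative-frequency twins of those below it.
    """
    f = min(k, n - k) / n
    return 1.0 / (1.0 + (f / cutoff) ** 2)


def weighted_psnr(reference, processed, weight=lowpass_weight, peak=255.0):
    """Peak-signal to frequency-weighted-noise ratio (dB) for one scan line."""
    n = len(reference)
    if n == 0 or n != len(processed):
        raise ValueError("lines must be non-empty and of equal length")
    noise = [p - r for p, r in zip(processed, reference)]
    spectrum = dft(noise)
    # Parseval: sum_k |X_k|^2 = n * sum_t x_t^2, so dividing by n*n gives a
    # mean power; with weight == 1 this is exactly the mean squared error.
    power = sum(weight(k, n) * abs(c) ** 2 for k, c in enumerate(spectrum)) / (n * n)
    if power == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / power)
```

With a flat weight the result reduces to the ordinary PSNR; with the low-pass weight, high-frequency error counts for less, so an error pattern concentrated at high frequencies scores better than the same error energy at low frequencies.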

Subjective test set-up

The goal of the subjective tests was to evaluate both the quality performance and the noise removal capability of the noise reduction algorithms. The latter was included to determine the relation between the amount of noise removed and the resulting quality. Two subjective tests were necessary to limit the experimental time in each test to 40 min. In each subjective test, a set of algorithms (see Table 1) was evaluated by using a paired comparison method, i.e. each algorithm was compared
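The snippet breaks off before describing the pairing scheme, so the sketch below assumes a full round-robin in which every algorithm is judged against every other one. It enumerates the pairs and tallies observer decisions into a win matrix. The names "A", "B", "C" and the decisions are hypothetical placeholders, not data from the tests.

```python
from itertools import combinations


def all_pairs(algorithms):
    """Every unordered pair judged in a full paired comparison: n(n-1)/2 pairs."""
    return list(combinations(algorithms, 2))


def preference_matrix(algorithms, judgements):
    """Tally paired-comparison decisions into a win matrix.

    judgements: iterable of (preferred, other) name tuples, one per decision.
    Returns wins, where wins[i][j] counts how often algorithms[i] was
    preferred over algorithms[j].
    """
    index = {name: i for i, name in enumerate(algorithms)}
    wins = [[0] * len(algorithms) for _ in algorithms]
    for preferred, other in judgements:
        if preferred == other:
            raise ValueError("an algorithm cannot be compared with itself")
        wins[index[preferred]][index[other]] += 1
    return wins


if __name__ == "__main__":
    algs = ["A", "B", "C"]  # hypothetical placeholders
    print(all_pairs(algs))  # [('A', 'B'), ('A', 'C'), ('B', 'C')]
    decisions = [("A", "B"), ("A", "C"), ("C", "B"), ("B", "A")]  # made-up
    print(preference_matrix(algs, decisions))  # [[0, 1, 1], [1, 0, 0], [0, 1, 0]]
```

Because the number of pairs grows as n(n-1)/2, each additional algorithm lengthens a session noticeably: 4 algorithms need 6 pairs per sequence and noise level, while 8 need 28.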

Ranking using subjective tests

The paired comparison data were translated into a z-scale ranking of the algorithms using Thurstone's model [3]. Since the quality scores and the noise-removal scores were highly correlated (correlation coefficient r = 0.99), rankings are presented using the quality scores only. The overall quality results from the first test, combining the results for the two noise levels, are given in Fig. 3 for each of the three sequences separately. An analysis of variance (ANOVA) [10]
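The snippet does not state which case of Thurstone's model was applied. As a sketch assuming Case V (equal, uncorrelated discriminal dispersions), each win proportion is mapped through the inverse standard normal CDF and the resulting z-values are averaged per algorithm. The clipping constant `clip` is an assumption needed to keep unanimous pairs finite, and the win matrix in the example is hypothetical.

```python
from statistics import NormalDist


def thurstone_case_v(wins, clip=0.01):
    """Thurstone Case V scale values from a paired-comparison win matrix.

    wins[i][j] counts how often stimulus i was preferred over stimulus j.
    Returns one z-score per stimulus; only differences between scores matter.
    """
    n = len(wins)
    probit = NormalDist().inv_cdf  # inverse standard normal CDF
    scores = []
    for i in range(n):
        z_total = 0.0
        for j in range(n):
            if i == j:
                continue  # p_ii = 0.5 contributes z = 0
            total = wins[i][j] + wins[j][i]
            p = wins[i][j] / total if total else 0.5
            # Unanimous pairs (p = 0 or 1) map to infinite z, so clip them.
            p = min(max(p, clip), 1.0 - clip)
            z_total += probit(p)
        scores.append(z_total / n)  # row mean over all n columns, incl. diagonal
    return scores


if __name__ == "__main__":
    # Hypothetical win matrix for three placeholder algorithms, 10 votes per pair.
    wins = [[0, 8, 9],
            [2, 0, 6],
            [1, 4, 0]]
    print([round(s, 3) for s in thurstone_case_v(wins)])
```

Because z(p) = -z(1 - p), the scale values sum to zero, so the z-scale fixes only the relative positions of the algorithms, not an absolute quality level.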

Ranking through objective measures

Twenty objective and hybrid quality measures were used to rank the performance of the noise reduction algorithms. These quality measures were selected because they seemed the most promising for describing noise reduction performance. The objective measures used were: (P)SNR(I) [18], MD [5], [18], MSE [18], MB [5], [18], ST [5], [18], CPR [5], [18], FFNSA [12] and DPC [12]. The included hybrid measures were: weighted SNR and weighted PSNR using three different weighting
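The exact definitions from [18] are not reproduced in this snippet. Assuming the standard forms for 8-bit video (peak value 255), MSE and PSNR between a noise-free reference frame and a processed frame can be sketched as follows; the variants hidden behind the "(P)SNR(I)" notation are not covered.

```python
import math


def mse(reference, processed):
    """Mean squared error between two equally sized 2-D luminance frames."""
    if len(reference) != len(processed):
        raise ValueError("frames must have the same number of rows")
    total, count = 0.0, 0
    for ref_row, proc_row in zip(reference, processed):
        if len(ref_row) != len(proc_row):
            raise ValueError("rows must have the same length")
        for r, p in zip(ref_row, proc_row):
            total += (r - p) ** 2
            count += 1
    if count == 0:
        raise ValueError("frames must not be empty")
    return total / count


def psnr(reference, processed, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite when the frames are identical."""
    error = mse(reference, processed)
    if error == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / error)
```

Both measures need the noise-free original, so they can be computed only in a simulation set-up where the clean source is available alongside the filtered output.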

Correlation between subjective and objective ranking results

To determine which objective measure best predicts the subjective judgements of the noise reduction algorithms, the correlation between the subjective quality scores and the values of the various objective quality measures was calculated. Because the subjective quality scores were highly correlated with the subjective noise scores, the same results would have been found using the latter.
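The snippet does not say which correlation coefficient was used. The sketch below computes both the Pearson coefficient r, which compares the values themselves, and the rank-based Spearman coefficient, which compares only the orderings and therefore matches a ranking study more closely. The inputs would be, per algorithm, a subjective z-score and the value of one objective measure. The function names are illustrative.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient r between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson r computed on the ranks."""
    return pearson(ranks(x), ranks(y))
```

An objective measure whose values order the algorithms exactly as the subjective z-scores do gives a Spearman coefficient of 1 even when the relation is nonlinear, whereas its Pearson r can be lower.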

The correlation between the subjective quality and objective quality measures was

Conclusions

Several state-of-the-art noise reduction algorithms have been ranked on quality and on noise removal in a subjective evaluation and by using 20 objective quality measures. A measurement set-up comprising a simulation of a TV-transmission channel has been used to provide a realistic assessment of the performance of the noise reduction algorithms commonly applied in TV-applications for reducing Gaussian white noise. From the analysis of the objective quality measures and subjective quality

References (18)

  • J.C. Brailean, et al., Noise reduction filters for dynamic image sequences: a review, Proc. IEEE 83 (9) (September...
  • J.O. Drewery, et al., Video noise reduction, BBC Research Department Report BBC RD 1984/7, July...
  • P.G. Engeldrum, Psychometric Scaling: A Toolkit for Imaging Systems Development, Imcotek Press, Winchester, MA, 2000,...
  • T. Fujio, A universal noise weighting function and its application to high-definition television system design, NHK...
  • G. de Haan, Video Processing for multimedia systems, 2nd Edition, University Press Facilities, Eindhoven, 2001, pp....
  • G. de Haan, et al., Image data recursive noise filter with reduced temporal filtering of higher spatial frequencies,...
  • ITU, Methodology for the subjective assessment of the quality of television pictures, ITU-R Recommendation, BT.500-10,...
  • ITU, Specifications and alignment procedures for setting of brightness and contrast of displays, ITU-R Recommendation,...
  • K. Jostschulte, et al., Perception adaptive temporal TV-noise reduction using contour preserving prefilter techniques,...
There are more references available in the full text version of this article.
