A new perceptual quality metric for compressed video based on mean squared error

https://doi.org/10.1016/j.image.2010.07.002

Abstract

A full reference objective quality metric that predicts the perceived quality of compressed video is proposed. The metric estimates the mean opinion score (MOS) from spatio-temporal activity and the mean squared error (MSE) between the original and compressed video sequences. It has been tested on multimedia sequences of common intermediate format (CIF) resolution compressed at a wide range of bit rates with the H.264/AVC coding standard. The metric correlates more strongly with subjective quality ratings (MOS) than popular metrics such as peak signal-to-noise ratio (PSNR), the video structural similarity metric (VSSIM), the National Telecommunications and Information Administration's video quality metric (NTIA/ITS VQM) and PSNRplus. The algorithm is particularly suited to real-time quality estimation of multimedia sequences compressed with block-based video coding schemes: all of its parameters can be calculated automatically with a modest processing overhead.

Introduction

Digital video compression algorithms use quality metrics to make optimal compression decisions and to measure the picture quality of compressed video sequences. Since the human observer is the end user of most video compression applications, these quality metrics must correlate strongly with perceived quality. A popular perceptual measure of video quality is the mean opinion score (MOS), obtained through subjective testing [1]. MOS is considered an accurate way to determine the visual quality of video [2], [3]. However, it is expensive in time and resources and cannot easily be embedded into practical real-time video applications. Several objective video quality measures have therefore been developed to predict subjective test results. These objective measures may be full-reference, reduced-reference or no-reference, depending on how much of the original reference signal is available for evaluation. Full reference algorithms make a prediction by comparing the degraded video with the original video sequence. Mean squared error (MSE) and peak signal-to-noise ratio (PSNR) between the reference and distorted video data are simplistic but widely used full reference quality measures in video compression algorithms [4]. MSE is simple and fast to calculate, and is mathematically convenient. However, MSE does not reflect how the human visual system (HVS) perceives distortions and is therefore not an accurate measure of perceived quality for compressed video [5], [6]. Hence several perceptually based distortion measures have been proposed and analysed as alternatives to MSE and PSNR. These measures model known psycho-visual properties of the HVS. To date, attempts to use these HVS-based metrics for real-time video quality measurement have been limited by their accuracy and computational complexity.
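For reference, the two simple full reference measures can be computed as follows. This is a minimal Python/NumPy sketch, assuming 8-bit luma samples (peak value 255) held in NumPy arrays of identical shape; the function names are illustrative and not taken from any codec.

```python
import numpy as np

def mse(reference: np.ndarray, distorted: np.ndarray) -> float:
    """Mean squared error over every sample of two equally shaped arrays."""
    if reference.shape != distorted.shape:
        raise ValueError("reference and distorted data must have the same shape")
    # Cast to float first so that uint8 subtraction cannot wrap around.
    diff = reference.astype(np.float64) - distorted.astype(np.float64)
    return float(np.mean(diff ** 2))

def psnr(reference: np.ndarray, distorted: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB; peak=255 assumes 8-bit samples."""
    err = mse(reference, distorted)
    if err == 0.0:
        return float("inf")  # identical inputs
    return float(10.0 * np.log10(peak ** 2 / err))
```

Computing one MSE over the whole array weights every pixel equally; averaging per-frame PSNR values, which some tools report instead, generally gives a different number.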

In 2000 and 2003, the Video Quality Experts Group (VQEG) conducted independent tests to evaluate the performance of several objective video quality metrics, including the National Telecommunications and Information Administration's video quality metric (NTIA/ITS VQM) [7], in the context of digital television broadcasting. These evaluations comprised two large subjective tests: the phase I and phase II tests on full reference television (FR-TV) video sequences [8], [9]. Based on these studies, the International Telecommunication Union (ITU) standardised Recommendations ITU-T J.144 [10] and ITU-R BT.1683 [11] for estimating the perceptual video quality of digital television sequences when the original video sequence is available (full reference models).

The VSSIM metric [12] measures the similarity between the reference and processed video sequences. The luminance, contrast and structural components of the two sequences are measured and compared, and the results are pooled at the block, frame and sequence levels into an overall similarity measure. The metric has been shown in [12] to perform better than the metrics reported in the VQEG phase I test on FR-TV video [8].
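The block-level comparison underlying VSSIM can be illustrated with the standard SSIM index, which combines the luminance, contrast and structure terms into one expression. The sketch below assumes 8-bit samples, non-overlapping 8 × 8 blocks and the commonly used SSIM constants K1 = 0.01 and K2 = 0.03. It pools blocks and frames by plain averaging and therefore omits the weighting that VSSIM [12] applies, so it is a simplified stand-in rather than the VSSIM metric itself.

```python
import numpy as np

L = 255.0                 # dynamic range of 8-bit samples (assumption)
C1 = (0.01 * L) ** 2      # stabilising constants, K1 = 0.01 and K2 = 0.03
C2 = (0.03 * L) ** 2

def ssim_block(x: np.ndarray, y: np.ndarray) -> float:
    """SSIM of two blocks: luminance, contrast and structure terms combined."""
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    num = (2 * mu_x * mu_y + C1) * (2 * cov_xy + C2)
    den = (mu_x ** 2 + mu_y ** 2 + C1) * (var_x + var_y + C2)
    return float(num / den)

def frame_ssim(ref: np.ndarray, dist: np.ndarray, block: int = 8) -> float:
    """Unweighted mean of block SSIM over non-overlapping block x block tiles."""
    h, w = ref.shape
    scores = [
        ssim_block(ref[i:i + block, j:j + block], dist[i:i + block, j:j + block])
        for i in range(0, h - block + 1, block)
        for j in range(0, w - block + 1, block)
    ]
    return float(np.mean(scores))

def sequence_ssim(ref_frames, dist_frames, block: int = 8) -> float:
    """Unweighted mean of frame scores over the whole sequence."""
    return float(np.mean([frame_ssim(r, d, block)
                          for r, d in zip(ref_frames, dist_frames)]))
```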

The NTIA/ITS VQM [7] has been standardised in Recommendation ITU-T J.144. It uses five video quality models to extract different parameters from both the original and compressed video sequences, each model being optimised for a specific application. We use the video conferencing model for comparison with the proposed metric because its parameters are optimised using video quality and bit rate information from multimedia sequences. The metric divides both the reference and processed video sequences into spatio-temporal regions, i.e. regions of pixels that are adjacent in both space and time. Various features are extracted from these regions and then used to compute the parameters from which video quality is estimated.
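As a rough illustration of spatio-temporal partitioning, the following sketch splits a luma sequence into regions of 8 × 8 pixels × 6 frames and extracts one feature per region. The region size and the feature (standard deviation of the spatial gradient magnitude) are hypothetical choices made for illustration; they are not the feature set or parameters standardised in ITU-T J.144.

```python
import numpy as np

def st_region_features(video: np.ndarray, bh: int = 8, bw: int = 8,
                       bt: int = 6) -> np.ndarray:
    """Split a (frames, height, width) luma array into bt x bh x bw regions
    and return one illustrative feature per region: the standard deviation
    of the spatial gradient magnitude inside that region."""
    v = video.astype(np.float64)
    gy, gx = np.gradient(v, axis=(1, 2))   # spatial gradients within each frame
    mag = np.hypot(gx, gy)
    t, h, w = mag.shape
    # Crop so that every dimension holds a whole number of regions.
    t, h, w = t - t % bt, h - h % bh, w - w % bw
    mag = mag[:t, :h, :w]
    # View as (n_t, bt, n_h, bh, n_w, bw) and reduce over the in-region axes.
    regions = mag.reshape(t // bt, bt, h // bh, bh, w // bw, bw)
    return regions.std(axis=(1, 3, 5))
```

A full reference comparison would compute the same features for the reference and processed sequences and compare them region by region.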

Mean squared error (MSE) is a popular objective measure in modern block-based video compression algorithms such as H.264/AVC [13]. Rate–distortion optimised mode selection uses it as the quality measure for choosing the coding option that gives the best trade-off between picture quality and data rate [4]. Mode selection minimises the rate–distortion cost J = D + λR, where λ is the Lagrange multiplier, R is the rate and D is the MSE between the original and reconstructed video data. Although MSE is the usual criterion for choosing the best coding option, it is a purely mathematical error measure that does not account for the human visual system, and is therefore not an accurate measure of perceived quality for compressed video sequences. Replacing MSE in the mode selection process with a distortion metric that correlates more closely with subjective quality may therefore improve the subjective quality of a rate-constrained video codec.
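The Lagrangian mode decision reduces to picking the candidate with the smallest J = D + λR. In this sketch the `CandidateMode` type, its fields and the mode labels are hypothetical; D and R are assumed to have been measured already by trial-encoding each mode, and λ is passed in rather than derived from the quantiser.

```python
from dataclasses import dataclass

@dataclass
class CandidateMode:
    name: str          # hypothetical mode label
    distortion: float  # D: MSE between original and reconstructed block
    rate: float        # R: bits needed to code the block in this mode

def select_mode(candidates: list[CandidateMode], lam: float) -> CandidateMode:
    """Return the candidate that minimises J = D + lam * R."""
    return min(candidates, key=lambda m: m.distortion + lam * m.rate)

modes = [CandidateMode("intra", 10.0, 100.0), CandidateMode("skip", 40.0, 20.0)]
print(select_mode(modes, 0.1).name)  # intra: J = 20 versus 42
print(select_mode(modes, 1.0).name)  # skip: J = 110 versus 60
```

Because D enters only through the cost function, a perceptual distortion measure could be substituted for MSE without changing the selection logic, which is the kind of replacement suggested above.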

Previous work has found that although the overall correlation between MSE and MOS is poor [1], the correlation is higher for a single sequence coded at several bit rates with the same codec [14], and it decreases as more different video sequences are added to the test data set. The authors of [14] developed a method (PSNRplus) that increases the correlation between subjective and predicted video quality by estimating the parameters of the linear regression line for each video sequence. The regression parameters are determined using two additional instances of the original video. Although this method improves on previous methods in the literature, every sequence must be coded three times to obtain the two additional instances, which makes the technique unsuitable for real-time applications.
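A per-sequence regression of the kind PSNRplus relies on can be sketched as an ordinary least-squares line through the available (PSNR, MOS) points of one sequence; with exactly two points, as from two additional instances, the fit is simply the line through them. This excerpt does not say how PSNRplus obtains the MOS values for those instances, so the sketch takes the pairs as given inputs, and regressing on PSNR rather than MSE is an assumption based on the method's name.

```python
import numpy as np

def fit_sequence_line(psnr_db, mos):
    """Least-squares fit of MOS = a * PSNR + b for a single sequence."""
    a, b = np.polyfit(np.asarray(psnr_db, dtype=float),
                      np.asarray(mos, dtype=float), deg=1)
    return float(a), float(b)

def predict_mos(psnr_db, a, b):
    """Predicted MOS for further encodings of the same sequence."""
    return a * np.asarray(psnr_db, dtype=float) + b

# Two hypothetical anchor encodings of one sequence.
a, b = fit_sequence_line([30.0, 40.0], [0.4, 0.8])
print(round(a, 4), round(b, 4))                   # 0.04 -0.8
print(round(float(predict_mos(35.0, a, b)), 4))   # 0.6
```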

The aim of this work is to develop a new perceptual video quality metric that can automatically predict the subjective quality of compressed video in real time, correlates well with mean opinion score (MOS) and can easily be incorporated into standard video compression algorithms, so that coding decisions are based on visual distortion rather than on poorly correlated objective metrics. This paper is organised as follows: Section 2 describes the subjective evaluation performed to obtain the MOS and MSE values of the sequences used to develop the new metric; Section 3 describes the proposed perceptual metric; Section 4 evaluates its performance; and Section 5 presents conclusions and future work.


Video quality evaluation: MOS versus MSE

To investigate the correlation between MSE and subjective quality, we examine how MOS varies with MSE across different video content using a training data set of eight video sequences. CIF resolution sequences are used to build the proposed quality metric because CIF is a popular format in multimedia applications and is widely used in video quality evaluations [15], [16]. The sequences were 10 s in duration and coded using the H.264/AVC compression

Proposed perceptual quality metric

The aim of the proposed metric is to (a) predict perceived video quality automatically, (b) be in close agreement with MOS and (c) maintain computational simplicity, with a view to incorporating the metric into mode-selection algorithms. Based on the relationship between MOS and MSE, as shown in Fig. 1, we propose the perceptual metric as

$$\mathrm{MOS}_p = 1 - k_s \cdot \mathrm{MSE}$$
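A minimal sketch of the linear form MOSp = 1 − k_s · MSE follows. This excerpt does not state how k_s is obtained (the abstract indicates only that spatio-temporal activity enters the metric), so the sketch takes k_s as a caller-supplied value. Clamping the result to [0, 1], on the reading that the leading 1 denotes a normalised MOS scale, is an added assumption and not part of the stated formula.

```python
def mos_p(mse: float, k_s: float) -> float:
    """Predicted MOS from MOSp = 1 - k_s * MSE.

    k_s is supplied by the caller; its derivation is not reproduced here.
    Clamping to [0, 1] is an assumption, not part of the stated formula.
    """
    return min(1.0, max(0.0, 1.0 - k_s * mse))

# Hypothetical values: the same MSE gives a lower predicted MOS when k_s is larger.
print(mos_p(10.0, 0.02))
print(mos_p(10.0, 0.05))
```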

This metric produces a predicted mean opinion score (MOSp) of a compressed sequence using the mean squared error (MSE) between the original and

Results

The performance of a perceptual quality metric depends on how well it correlates with subjective test results. Following the performance evaluation methods adopted by the Video Quality Experts Group (VQEG) [8], [9], we use three evaluation metrics to quantify the performance of the proposed metric. The first is Pearson's correlation coefficient, which measures how accurately the new metric predicts the subjective results. For a set of N data pairs $(x_i, y_i)$, Pearson's
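These evaluation measures can be computed directly from paired (predicted, subjective) scores. The sketch below implements Pearson's correlation coefficient and, as an assumption since this excerpt names only the first of the three measures, Spearman's rank-order correlation, which VQEG evaluations commonly use to assess prediction monotonicity. Tied values receive averaged ranks.

```python
import numpy as np

def pearson(x, y) -> float:
    """Pearson linear correlation coefficient of paired samples."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum()))

def average_ranks(v) -> np.ndarray:
    """1-based ranks; tied values get the mean of the ranks they span."""
    v = np.asarray(v, dtype=np.float64)
    order = np.argsort(v, kind="mergesort")
    sorted_v = v[order]
    ranks = np.empty(len(v), dtype=np.float64)
    i = 0
    while i < len(v):
        j = i
        while j + 1 < len(v) and sorted_v[j + 1] == sorted_v[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks

def spearman(x, y) -> float:
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```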

Conclusion and future work

A new full reference objective quality metric (MOSp) that estimates the mean opinion score (MOS) based on spatio-temporal activity and the mean squared error (MSE) between original and compressed video sequences has been proposed. Experimental results show that the new metric correlates well with subjective scores for a variety of multimedia sequences compressed at a wide range of bit rates. It outperforms popular video quality metrics such as PSNR, VSSIM, NTIA/ITS VQM (video conferencing

References (21)

  • Z. Wang et al., Video quality assessment based on structural distortion measurement, Signal Process.: Image Commun. (February 2004).
  • S. Winkler, Digital Video Quality: Vision Models and Metrics (2005).
  • ITU-R BT.500-11, Methodology for the subjective assessment of the quality of television pictures, ITU-R Recommendations,...
  • ITU-T P.910, Subjective video quality assessment methods for multimedia applications, ITU-T Std., September...
  • A. Ortega et al., Rate–distortion methods for image and video compression, IEEE Signal Process. Mag. (1998).
  • B. Girod
  • S. Winkler, A perceptual distortion metric for digital colour video, Proc. SPIE (1999).
  • S. Wolf, M. Pinson, Video quality measurement techniques, NTIA Report 02-392, June...
  • Video Quality Experts Group, Final Report from the VQEG on the validation of Objective Models of Video Quality...
  • Video Quality Experts Group, Final Report from the VQEG on the validation of Objective Models of Video Quality...
