Automatic prediction of perceptual quality of multimedia signals—a survey

Multimedia Tools and Applications

Abstract

We survey recent developments in multimedia signal quality assessment, covering image, audio, video, and combined signals. Such an overview is timely given the recent explosion of all-digital sensory entertainment and communication devices in the consumer space. Because these signals are intended for human senses, perceptual models lie at the heart of multimedia signal quality assessment algorithms. We survey these models and recent competitive algorithms, and discuss comparison studies that others have conducted. In this context, we also describe existing signal quality assessment databases. We expect the reader to gain a firmer understanding of the broad topic of multimedia quality assessment, of the various sub-disciplines corresponding to different signal types, of how these signal types interact to produce an overall user experience, and of the research directions that remain to be pursued.

Figs. 1–6

Author information

Correspondence to Kalpana Seshadrinathan.

About this article

Cite this article

Seshadrinathan, K., Bovik, A.C. Automatic prediction of perceptual quality of multimedia signals—a survey. Multimed Tools Appl 51, 163–186 (2011). https://doi.org/10.1007/s11042-010-0625-9

  • DOI: https://doi.org/10.1007/s11042-010-0625-9
