
Mean opinion score (MOS) revisited: methods and applications, limitations and alternatives

  • Regular Paper
  • Published in: Multimedia Systems

Abstract

Mean opinion score (MOS) has become a very popular indicator of perceived media quality. While there is a clear benefit to such a “reference quality indicator” and its widespread acceptance, MOS is often applied without sufficient consideration of its scope or limitations. In this paper, we critically examine MOS and the various ways it is being used today. We highlight common issues with both subjective and objective MOS and discuss a variety of alternative approaches that have been proposed for media quality measurement.
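For context, a subjective MOS is conventionally obtained by averaging the category ratings a panel of subjects assigns to a stimulus, most often on the five-point absolute category rating (ACR) scale (1 = Bad … 5 = Excellent). The sketch below is a generic illustration of that arithmetic, not a method prescribed by the paper; the ratings list and function name are hypothetical.

```python
import math
import statistics

# Five-point ACR scale (ITU-T style): 1=Bad, 2=Poor, 3=Fair, 4=Good, 5=Excellent.
# Hypothetical ratings from 12 subjects for one processed sequence.
ratings = [4, 3, 4, 5, 3, 4, 4, 2, 4, 3, 5, 4]

def mos_with_ci(scores, z=1.96):
    """Return the mean opinion score and a ~95% confidence-interval half-width.

    Treats the ratings as interval data and uses a normal approximation,
    which is the common (and often criticized) practice.
    """
    n = len(scores)
    mos = statistics.mean(scores)
    sd = statistics.stdev(scores)          # sample standard deviation
    ci = z * sd / math.sqrt(n)             # half-width of the confidence interval
    return mos, ci

mos, ci = mos_with_ci(ratings)
print(f"MOS = {mos:.2f} +/- {ci:.2f} (95% CI, n={len(ratings)})")
```

Treating ordinal category labels as equidistant interval values, as this simple average does, is one of the commonly cited limitations of MOS of the kind the paper examines.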



Acknowledgments

S. Winkler is supported by the research grant for ADSC’s Human Sixth Sense Programme from Singapore’s Agency for Science, Technology and Research (A*STAR).

Author information


Corresponding author

Correspondence to Stefan Winkler.

Additional information

Communicated by B. Prabhakaran.


About this article


Cite this article

Streijl, R.C., Winkler, S. & Hands, D.S. Mean opinion score (MOS) revisited: methods and applications, limitations and alternatives. Multimedia Systems 22, 213–227 (2016). https://doi.org/10.1007/s00530-014-0446-1
