Abstract
Mean opinion score (MOS) has become a very popular indicator of perceived media quality. While there is a clear benefit to such a “reference quality indicator” and its widespread acceptance, MOS is often applied without sufficient consideration of its scope or limitations. In this paper, we critically examine MOS and the various ways it is being used today. We highlight common issues with both subjective and objective MOS and discuss a variety of alternative approaches that have been proposed for media quality measurement.
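The abstract's concern with using MOS without regard to its limitations can be made concrete with a small sketch. It assumes ratings on the five-point absolute category rating (ACR) scale (5 = Excellent, 4 = Good, 3 = Fair, 2 = Poor, 1 = Bad) and a normal-approximation 95% confidence interval. The helper names `mos_summary`, `percent_good_or_better` and `percent_poor_or_worse` are hypothetical, and the two rating panels are invented for illustration; they are not data or methods from the paper. Both panels produce a MOS of 3.0, but their spread and vote shares are very different.

```python
import math
import statistics

# Hypothetical helpers; ratings are assumed to use the 5-point ACR scale
# (5 = Excellent, 4 = Good, 3 = Fair, 2 = Poor, 1 = Bad).

def mos_summary(ratings):
    """Return (MOS, sample standard deviation, approx. 95% CI half-width)."""
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    mos = statistics.mean(ratings)
    sd = statistics.stdev(ratings)
    # Normal approximation (z = 1.96); a Student-t quantile suits small panels better.
    half_width = 1.96 * sd / math.sqrt(len(ratings))
    return mos, sd, half_width

def percent_good_or_better(ratings):
    """Share of ratings that are Good (4) or Excellent (5), in percent."""
    return 100.0 * sum(r >= 4 for r in ratings) / len(ratings)

def percent_poor_or_worse(ratings):
    """Share of ratings that are Poor (2) or Bad (1), in percent."""
    return 100.0 * sum(r <= 2 for r in ratings) / len(ratings)

panel_a = [2, 3, 3, 3, 3, 3, 3, 3, 3, 4]  # hypothetical: raters largely agree
panel_b = [1, 1, 1, 1, 1, 5, 5, 5, 5, 5]  # hypothetical: raters split into two camps

for name, panel in (("A", panel_a), ("B", panel_b)):
    mos, sd, hw = mos_summary(panel)
    print(f"{name}: MOS={mos:.2f} +/-{hw:.2f} (SD={sd:.2f}), "
          f"GoB={percent_good_or_better(panel):.0f}%, "
          f"PoW={percent_poor_or_worse(panel):.0f}%")
```

Reporting the standard deviation, confidence interval and vote shares next to the mean is one way to expose what a single MOS value hides; the choice of these particular statistics here is an illustrative assumption.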
Acknowledgments
S. Winkler is supported by the research grant for ADSC’s Human Sixth Sense Programme from Singapore’s Agency for Science, Technology and Research (A*STAR).
Additional information
Communicated by B. Prabhakaran.
About this article
Cite this article
Streijl, R.C., Winkler, S. & Hands, D.S. Mean opinion score (MOS) revisited: methods and applications, limitations and alternatives. Multimedia Systems 22, 213–227 (2016). https://doi.org/10.1007/s00530-014-0446-1
DOI: https://doi.org/10.1007/s00530-014-0446-1