DOI: 10.1145/1631272.1631339

A crowdsourceable QoE evaluation framework for multimedia content

Published: 19 October 2009

Abstract

Until recently, QoE (Quality of Experience) experiments had to be conducted in academic laboratories; however, with the advent of ubiquitous Internet access, it is now possible to ask an Internet crowd to conduct experiments on their personal computers. Since such a crowd can be quite large, crowdsourcing enables researchers to conduct experiments with a more diverse set of participants at a lower economic cost than would be possible under laboratory conditions. However, because participants carry out experiments without supervision, they may give erroneous feedback perfunctorily, carelessly, or dishonestly, even if they receive a reward for each experiment.
In this paper, we propose a crowdsourceable framework to quantify the QoE of multimedia content. The advantages of our framework over traditional MOS ratings are: 1) it enables crowdsourcing because it supports systematic verification of participants' inputs; 2) the rating procedure is simpler than that of MOS, so there is less burden on participants; and 3) it derives interval-scale scores that enable subsequent quantitative analysis and QoE provisioning. We conducted four case studies, which demonstrated that, with our framework, researchers can outsource their QoE evaluation experiments to an Internet crowd without risking the quality of the results; and at the same time, obtain a higher level of participant diversity at a lower monetary cost.
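
Illustrative Example

The framework sketched in the abstract rests on two well-documented ingredients: paired comparison, in which a participant simply picks the better of two stimuli, and the Bradley-Terry-Luce (BTL) probabilistic choice model, which converts the resulting win counts into interval-scale quality scores (see the author tags below). The following Python sketch illustrates both ingredients under stated assumptions; it is not the authors' implementation. fit_btl applies the standard minorization-maximization updates for the BTL likelihood, and circular_triads is a hypothetical stand-in for the paper's input-verification step: it counts intransitive preference cycles, whose excess over chance levels suggests perfunctory or dishonest responses.

    import numpy as np

    def fit_btl(wins, n_iter=500, tol=1e-10):
        """Fit Bradley-Terry-Luce worth parameters from a win-count matrix.

        wins[i, j] = number of times stimulus i was preferred over j.
        Returns p normalized to sum to 1; log(p) lies on an interval scale.
        Assumes every stimulus wins at least once and the comparison graph
        is connected; otherwise the maximum-likelihood fit is degenerate.
        """
        wins = np.asarray(wins, dtype=float)
        m = wins.shape[0]
        n_pair = wins + wins.T              # comparisons per pair
        total_wins = wins.sum(axis=1)
        p = np.full(m, 1.0 / m)             # uniform starting point
        for _ in range(n_iter):
            # Minorization-maximization update (Hunter, 2004):
            #   p_i <- W_i / sum_{j != i} n_ij / (p_i + p_j)
            pair_sums = p[:, None] + p[None, :]
            denom = np.where(n_pair > 0, n_pair / pair_sums, 0.0).sum(axis=1)
            p_new = total_wins / denom
            p_new /= p_new.sum()
            if np.abs(p_new - p).max() < tol:
                return p_new
            p = p_new
        return p

    def circular_triads(pref):
        """Count intransitive triples (i > j, j > k, k > i) for one participant.

        pref[i, j] == 1 iff the participant chose i over j in a complete
        round of comparisons. This is an illustrative consistency check,
        not the paper's exact verification rule.
        """
        pref = np.asarray(pref, dtype=int)
        m = pref.shape[0]
        cycles = 0
        for i in range(m):
            for j in range(i + 1, m):
                for k in range(j + 1, m):
                    s = pref[i, j] + pref[j, k] + pref[k, i]
                    if s in (0, 3):         # a 3-cycle in either direction
                        cycles += 1
        return cycles

    # Toy data: three hypothetical encodings A, B, C, 20 comparisons per pair.
    wins = np.array([[0, 15, 18],
                     [5,  0, 12],
                     [2,  8,  0]])
    print(np.log(fit_btl(wins)))            # interval-scale scores, A > B > C

In a real crowdsourced run, one would first screen each participant with a consistency check of this kind, discard or down-weight implausibly inconsistent responses, and only then pool the surviving comparisons into the BTL fit; that two-stage use is the spirit of the verification property claimed in the abstract.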




Published In

cover image ACM Conferences
MM '09: Proceedings of the 17th ACM international conference on Multimedia
October 2009
1202 pages
ISBN:9781605586083
DOI:10.1145/1631272
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


Publisher

Association for Computing Machinery, New York, NY, United States



Author Tags

  1. Bradley-Terry-Luce Model
  2. crowdsourcing
  3. mean opinion score (MOS)
  4. paired comparison
  5. probabilistic choice model
  6. quality of experience (QoE)

Qualifiers

  • Research-article

Conference

MM09: ACM Multimedia Conference
October 19-24, 2009
Beijing, China

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%


Cited By

  • (2024) Review of Image Quality Assessment Methods for Compressed Images. Journal of Imaging, 10(5):113. DOI: 10.3390/jimaging10050113. Online publication date: 8-May-2024.
  • (2024) A Theoretical Framework for Provider's QoE Assessment using Individual and Objective QoE Monitoring. 2024 16th International Conference on Quality of Multimedia Experience (QoMEX), pages 235-241. DOI: 10.1109/QoMEX61742.2024.10598265. Online publication date: 18-Jun-2024.
  • (2024) Annotation-Free Human Sketch Quality Assessment. International Journal of Computer Vision, 132(8):2743-2764. DOI: 10.1007/s11263-024-02001-1. Online publication date: 17-Feb-2024.
  • (2023) Subjective Assessment of Objective Image Quality Metrics Range Guaranteeing Visually Lossless Compression. Sensors, 23(3):1297. DOI: 10.3390/s23031297. Online publication date: 23-Jan-2023.
  • (2023) Txt2Vid: Ultra-Low Bitrate Compression of Talking-Head Videos via Text. IEEE Journal on Selected Areas in Communications, 41(1):107-118. DOI: 10.1109/JSAC.2022.3221953. Online publication date: Jan-2023.
  • (2023) A review of QoE research progress in metaverse. Displays, 77:102389. DOI: 10.1016/j.displa.2023.102389. Online publication date: Apr-2023.
  • (2022) Break, Repair, Learn, Break Less: Investigating User Preferences for Assignment of Divergent Phrasing Learning Burden in Human-Agent Interaction to Minimize Conversational Breakdowns. Proceedings of the 21st International Conference on Mobile and Ubiquitous Multimedia, pages 151-158. DOI: 10.1145/3568444.3568454. Online publication date: 27-Nov-2022.
  • (2022) Not All Samples are Trustworthy: Towards Deep Robust SVP Prediction. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(6):3154-3169. DOI: 10.1109/TPAMI.2020.3047817. Online publication date: 1-Jun-2022.
  • (2022) A Survey on Multimedia Services QoE Assessment and Machine Learning-Based Prediction. IEEE Access, 10:19507-19538. DOI: 10.1109/ACCESS.2022.3149592. Online publication date: 2022.
  • (2021) Subjective and Objective Quality Assessments of Display Products. Entropy, 23(7):814. DOI: 10.3390/e23070814. Online publication date: 26-Jun-2021.
