Who moderates the moderators?: crowdsourcing abuse detection in user-generated content

Authors:

Preston McAfee

EC '11: Proceedings of the 12th ACM conference on Electronic commerce

Pages 167 - 176

https://doi.org/10.1145/1993574.1993599

Published: 05 June 2011

Abstract

A large fraction of user-generated content on the Web, such as posts or comments on popular online forums, consists of abuse or spam. Due to the volume of contributions on popular sites, a few trusted moderators cannot identify all such abusive content, so viewer ratings of contributions must be used for moderation. But not all viewers who rate content are trustworthy and accurate. What is a principled approach to assigning trust and aggregating user ratings in order to accurately identify abusive content? In this paper, we introduce a framework to address the problem of moderating online content using crowdsourced ratings. Our framework encompasses users who are untrustworthy or inaccurate to an unknown extent; that is, both the content and the raters are of unknown quality. With no knowledge whatsoever about the raters, it is impossible to do better than a random estimate. We present efficient algorithms to accurately detect abuse that only require knowledge about the identity of a single 'good' agent, who rates contributions accurately more than half the time. We prove that our algorithm can infer the quality of contributions with error that rapidly converges to zero as the number of observations increases; we also demonstrate numerically that the algorithm achieves very high accuracy with far fewer observations. Finally, we analyze the robustness of our algorithms to manipulation by adversarial or strategic raters, an important issue in moderating online content, and quantify how the performance of the algorithm degrades with the number of manipulating agents.
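The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of using one known 'good' rater to calibrate everyone else. It is not the paper's method. It assumes ratings are encoded as +1 (acceptable) and -1 (abusive), weights each rater by how often they agree with the good rater, and takes a weighted vote. The function name `weighted_abuse_votes`, the encoding, and the toy data are all hypothetical.

```python
from typing import List


def weighted_abuse_votes(ratings: List[List[int]], good: int) -> List[int]:
    """Label each item +1 (acceptable) or -1 (abusive) by a weighted vote.

    ratings[r][i] is rater r's rating of item i, encoded as +1 or -1.
    good is the index of the single rater assumed (hypothetically) to be
    right more than half the time.
    """
    n_items = len(ratings[0])
    weights = []
    for row in ratings:
        # Mean agreement with the good rater lies in [-1, 1]. A rater who
        # mostly disagrees gets a negative weight, so their votes are flipped.
        agreement = sum(a * b for a, b in zip(row, ratings[good])) / n_items
        weights.append(agreement)

    labels = []
    for i in range(n_items):
        score = sum(w * row[i] for w, row in zip(weights, ratings))
        labels.append(1 if score >= 0 else -1)
    return labels


# Toy data: the true labels are [1, -1, 1, 1, -1, -1].
ratings = [
    [1, -1, 1, 1, -1, 1],    # rater 0: the known good rater, wrong on the last item
    [1, -1, 1, 1, -1, -1],   # rater 1: always right
    [-1, -1, 1, 1, -1, -1],  # rater 2: wrong on the first item
    [-1, 1, -1, -1, 1, 1],   # rater 3: adversarial, always flips the truth
]
labels = weighted_abuse_votes(ratings, good=0)
print(labels)  # [1, -1, 1, 1, -1, -1]
```

In this toy example an unweighted majority vote ties on the first and last items, while the weighted vote recovers all six labels, including the one the good rater got wrong. The adversarial rater ends up with a negative weight, so the consistently inverted votes become useful evidence.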

Index Terms

1. Human-centered computing
  1. Human computer interaction (HCI)

Published In

EC '11: Proceedings of the 12th ACM conference on Electronic commerce

June 2011

384 pages

ISBN:9781450302616

DOI:10.1145/1993574

General Chair: Yoav Shoham (Stanford University, USA)

Program Chairs: Yan Chen (University of Michigan, USA), Tim Roughgarden (Stanford University, USA)

Copyright © 2011 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGecom: Special Interest Group on Economics and Computation

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

EC '11

Sponsor:

SIGecom

EC '11: ACM Conference on Electronic Commerce

June 5 - 9, 2011

San Jose, California, USA

Acceptance Rates

Overall Acceptance Rate 664 of 2,389 submissions, 28%

Bibliometrics & Citations

Article Metrics

Total Citations: 92
Total Downloads: 1,072
Downloads (Last 12 months): 66
Downloads (Last 6 weeks): 8

Reflects downloads up to 12 Feb 2025

Cited By

Gunturi U, Kumar A, Ding X, Rho E (2024). Linguistically Differentiating Acts and Recalls of Racial Microaggressions on Social Media. Proceedings of the ACM on Human-Computer Interaction 8:CSCW1 (1-36). Online publication date: 26-Apr-2024.
https://dl.acm.org/doi/10.1145/3637366

Kim D, Lee J, Chung H (2024). A Worker-Task Specialization Model for Crowdsourcing: Efficient Inference and Fundamental Limits. IEEE Transactions on Information Theory 70:3 (2076-2117). Online publication date: Mar-2024.
https://doi.org/10.1109/TIT.2023.3315720

Marrinan T, Ibrahim S, Fu X (2024). Labeling Sequential Data From Noisy Annotations. 2024 IEEE 13rd Sensor Array and Multichannel Signal Processing Workshop (SAM) (1-5). Online publication date: 8-Jul-2024.
https://doi.org/10.1109/SAM60225.2024.10636383

Wu J, Fang H, Zhu J, Zhang Y, Li X, Liu Y, Liu H, Jin Y, Huang W, Liu Q, Chen C, Liu Y, Duan L, Xu Y, Xiao L, Yang W, Liu Y (2024). Multi-rater Prism: Learning self-calibrated medical image segmentation from multiple raters. Science Bulletin 69:18 (2906-2919). Online publication date: Sep-2024.
https://doi.org/10.1016/j.scib.2024.06.037

Nguyen T, Ibrahim S, Fu X (2023). Deep clustering with incomplete noisy pairwise annotations. Proceedings of the 40th International Conference on Machine Learning (Krause A, Brunskill E, Cho K, Engelhardt B, Sabato S, Scarlett J, eds.) (25980-26007). Online publication date: 23-Jul-2023.
https://dl.acm.org/doi/10.5555/3618408.3619489

Jeong H, Chung H (2023). Recovering top-two answers and confusion probability in multi-choice crowdsourcing. Proceedings of the 40th International Conference on Machine Learning (Krause A, Brunskill E, Cho K, Engelhardt B, Sabato S, Scarlett J, eds.) (14836-14868). Online publication date: 23-Jul-2023.
https://dl.acm.org/doi/10.5555/3618408.3619013

Xu Q, Yuan Y, Wang J, Qu A (2023). Crowdsourcing Utilizing Subgroup Structure of Latent Factor Modeling. Journal of the American Statistical Association 119:546 (1192-1204). Online publication date: 16-Mar-2023.
https://doi.org/10.1080/01621459.2023.2178925

Baer M, Purves R (2023). Identifying Landscape Relevant Natural Language using Actively Crowdsourced Landscape Descriptions and Sentence-Transformers. KI - Künstliche Intelligenz 37:1 (55-67). Online publication date: 20-Jan-2023.
https://doi.org/10.1007/s13218-022-00793-3

Frey S, Zhong Q, Bulat B, Weisman W, Liu C, Fujimoto S, Wang H, Schweik C (2022). Governing Online Goods: Maturity and Formalization in Minecraft, Reddit, and World of Warcraft Communities. Proceedings of the ACM on Human-Computer Interaction 6:CSCW2 (1-23). Online publication date: 11-Nov-2022.
https://dl.acm.org/doi/10.1145/3555191

Yu H, Zhang C, Li J, Zhang S (2022). Robust Sparse Weighted Classification For Crowdsourcing. IEEE Transactions on Knowledge and Data Engineering (1-13). Online publication date: 2022.
https://doi.org/10.1109/TKDE.2022.3201955
