DOI: 10.1145/1993574.1993599
research-article

Who moderates the moderators?: crowdsourcing abuse detection in user-generated content

Published: 05 June 2011

Abstract

A large fraction of user-generated content on the Web, such as posts or comments on popular online forums, consists of abuse or spam. Due to the volume of contributions on popular sites, a few trusted moderators cannot identify all such abusive content, so viewer ratings of contributions must be used for moderation. But not all viewers who rate content are trustworthy and accurate. What is a principled approach to assigning trust and aggregating user ratings in order to accurately identify abusive content? In this paper, we introduce a framework to address the problem of moderating online content using crowdsourced ratings. Our framework encompasses users who are untrustworthy or inaccurate to an unknown extent; that is, both the content and the raters are of unknown quality. With no knowledge whatsoever about the raters, it is impossible to do better than a random estimate. We present efficient algorithms that accurately detect abuse and require only the identity of a single 'good' agent, who rates contributions accurately more than half the time. We prove that our algorithm can infer the quality of contributions with error that rapidly converges to zero as the number of observations increases; we also numerically demonstrate that the algorithm has very high accuracy with far fewer observations. Finally, we analyze the robustness of our algorithms to manipulation by adversarial or strategic raters, an important issue in moderating online content, and quantify how the performance of the algorithm degrades with the number of manipulating agents.
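
To make the role of the single 'good' agent concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a dense matrix of +1/-1 ratings (every rater rates every item), estimates a per-rater weight vector by power iteration on the rating matrix, and uses the known good rater only to fix the overall sign of that vector. Without that rater the weights are determined only up to a sign flip, which illustrates why no estimate is better than random when nothing is known about the raters. All names here (`simulate`, `top_rater_vector`, `aggregate`, the accuracy ranges) are hypothetical choices made for illustration.

```python
import random

def simulate(n_raters=50, n_items=400, seed=0):
    """Synthetic data (assumed model): rater 0 is the known 'good' agent (80% accurate);
    the others have unknown accuracy between 10% and 90%, so many are worse than chance."""
    rng = random.Random(seed)
    truth = [rng.choice((-1, 1)) for _ in range(n_items)]
    acc = [0.8] + [rng.uniform(0.1, 0.9) for _ in range(n_raters - 1)]
    ratings = [[t if rng.random() < a else -t for t in truth] for a in acc]
    return truth, ratings

def top_rater_vector(ratings, iters=30):
    """Power iteration for the leading eigenvector of R R^T, where R is the rating matrix."""
    n, m = len(ratings), len(ratings[0])
    w = [1.0] * n  # arbitrary non-zero starting vector
    for _ in range(iters):
        # s = R^T w (one score per item), then w <- R s (one weight per rater)
        s = [sum(w[i] * ratings[i][j] for i in range(n)) for j in range(m)]
        w = [sum(ratings[i][j] * s[j] for j in range(m)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        w = [x / norm for x in w]
    return w

def aggregate(ratings, trusted):
    """Weighted vote over raters; the trusted rater resolves the sign ambiguity."""
    w = top_rater_vector(ratings)
    if w[trusted] < 0:  # the good agent must end up with positive weight
        w = [-x for x in w]
    n, m = len(ratings), len(ratings[0])
    scores = [sum(w[i] * ratings[i][j] for i in range(n)) for j in range(m)]
    return [1 if s >= 0 else -1 for s in scores], w

if __name__ == "__main__":
    truth, ratings = simulate()
    labels, weights = aggregate(ratings, trusted=0)
    accuracy = sum(a == b for a, b in zip(labels, truth)) / len(truth)
    print(f"fraction of items labeled correctly: {accuracy:.3f}")
```

In this sketch, raters who are wrong more often than not receive negative weight, so their ratings are effectively inverted rather than discarded. Handling missing ratings and strategic manipulation, which the abstract discusses, is outside the scope of the example.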

    Published In

    EC '11: Proceedings of the 12th ACM conference on Electronic commerce
    June 2011
    384 pages
    ISBN: 9781450302616
    DOI: 10.1145/1993574

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. crowdsourcing
    2. moderation
    3. user-generated content

    Qualifiers

    • Research-article

    Conference

    EC '11: ACM Conference on Electronic Commerce
    June 5 - 9, 2011
    San Jose, California, USA

    Acceptance Rates

    Overall Acceptance Rate 664 of 2,389 submissions, 28%

    Cited By

    • (2024) Linguistically Differentiating Acts and Recalls of Racial Microaggressions on Social Media. Proceedings of the ACM on Human-Computer Interaction 8:CSCW1 (1-36). DOI: 10.1145/3637366. Online publication date: 26-Apr-2024
    • (2024) A Worker-Task Specialization Model for Crowdsourcing: Efficient Inference and Fundamental Limits. IEEE Transactions on Information Theory 70:3 (2076-2117). DOI: 10.1109/TIT.2023.3315720. Online publication date: Mar-2024
    • (2024) Labeling Sequential Data From Noisy Annotations. 2024 IEEE 13rd Sensor Array and Multichannel Signal Processing Workshop (SAM) (1-5). DOI: 10.1109/SAM60225.2024.10636383. Online publication date: 8-Jul-2024
    • (2024) Multi-rater Prism: Learning self-calibrated medical image segmentation from multiple raters. Science Bulletin 69:18 (2906-2919). DOI: 10.1016/j.scib.2024.06.037. Online publication date: Sep-2024
    • (2023) Deep clustering with incomplete noisy pairwise annotations. Proceedings of the 40th International Conference on Machine Learning (25980-26007). DOI: 10.5555/3618408.3619489. Online publication date: 23-Jul-2023
    • (2023) Recovering top-two answers and confusion probability in multi-choice crowdsourcing. Proceedings of the 40th International Conference on Machine Learning (14836-14868). DOI: 10.5555/3618408.3619013. Online publication date: 23-Jul-2023
    • (2023) Crowdsourcing Utilizing Subgroup Structure of Latent Factor Modeling. Journal of the American Statistical Association 119:546 (1192-1204). DOI: 10.1080/01621459.2023.2178925. Online publication date: 16-Mar-2023
    • (2023) Identifying Landscape Relevant Natural Language using Actively Crowdsourced Landscape Descriptions and Sentence-Transformers. KI - Künstliche Intelligenz 37:1 (55-67). DOI: 10.1007/s13218-022-00793-3. Online publication date: 20-Jan-2023
    • (2022) Governing Online Goods: Maturity and Formalization in Minecraft, Reddit, and World of Warcraft Communities. Proceedings of the ACM on Human-Computer Interaction 6:CSCW2 (1-23). DOI: 10.1145/3555191. Online publication date: 11-Nov-2022
    • (2022) Robust Sparse Weighted Classification For Crowdsourcing. IEEE Transactions on Knowledge and Data Engineering (1-13). DOI: 10.1109/TKDE.2022.3201955. Online publication date: 2022
