Abstract
More and more online communities classify contributions based on collaborative ratings of these contributions. A popular method for such rating-based classification is the Dawid–Skene algorithm (DSA). However, despite its popularity, DSA has two major shortcomings: (1) It is vulnerable to raters with low competence, i.e., a low probability of rating correctly. (2) It is defenseless against collusion attacks. In a collusion attack, raters coordinate to rate the same data objects with the same value to artificially increase their remuneration. In this paper, to cope with these issues, we propose gold strategies based on the level of agreement between raters. Gold strategies adopt the notion of gold objects, i.e., contributions whose true value is known. We show that selecting gold objects at random, as is common in the literature, does not increase the accuracy of DSA in a low-competence setting to a satisfactory degree. Instead, our gold strategies select contributions based on the level of agreement between community members, i.e., the extent to which their ratings agree on the class of a given contribution. To maximize the net benefit of gold objects, i.e., their benefit minus their costs, we propose an adaptive algorithm. It determines the number of gold objects based on runtime information. We extensively evaluate the effectiveness of gold strategies in low-competence settings and against collusion attacks by means of simulation. We find that gold strategies based on a high level of agreement between raters improve the accuracy of DSA in low-competence settings considerably. Further, the gold strategies are highly effective against collusion attacks. Finally, the adaptive algorithm determines the optimal gold ratio for each strategy and each setting with high accuracy.
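To make the aggregation step concrete, the following is a minimal EM sketch of the Dawid–Skene algorithm (Dawid and Skene 1979), not the implementation evaluated in this paper. The `-1` marker for missing ratings, the majority-vote initialization, and the fixed iteration count are our own assumptions for this sketch; gold objects would additionally pin the posterior of known contributions to their true class.

```python
import numpy as np

def dawid_skene(ratings, n_types, n_iter=50):
    """EM sketch of Dawid-Skene: estimate per-object class posteriors.

    ratings: array of shape (n_raters, n_objects); entry r_{i,k} is the
             type in {0, ..., n_types-1} that rater i assigned to object k,
             or -1 if rater i did not rate object k (our convention here).
    Returns an (n_objects, n_types) array of posterior class probabilities.
    """
    n_raters, n_objects = ratings.shape

    # Initialization: soft labels from rating counts (majority vote).
    post = np.zeros((n_objects, n_types))
    for k in range(n_objects):
        for i in range(n_raters):
            if ratings[i, k] >= 0:
                post[k, ratings[i, k]] += 1
    post /= post.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # M-step: class priors and one confusion matrix per rater.
        prior = post.mean(axis=0)
        conf = np.full((n_raters, n_types, n_types), 1e-9)  # smoothing
        for i in range(n_raters):
            for k in range(n_objects):
                if ratings[i, k] >= 0:
                    conf[i, :, ratings[i, k]] += post[k]
        conf /= conf.sum(axis=2, keepdims=True)

        # E-step: recompute posterior class probabilities per object.
        log_post = np.tile(np.log(prior), (n_objects, 1))
        for i in range(n_raters):
            for k in range(n_objects):
                if ratings[i, k] >= 0:
                    log_post[k] += np.log(conf[i, :, ratings[i, k]])
        post = np.exp(log_post - log_post.max(axis=1, keepdims=True))
        post /= post.sum(axis=1, keepdims=True)
    return post

# Example: three raters, four objects; rater 2 errs on object 2.
ratings = np.array([[0, 1, 0, 1],
                    [0, 1, 0, 1],
                    [0, 1, 1, 1]])
print(dawid_skene(ratings, 2).argmax(axis=1))  # estimated type per object
```

Because the confusion matrices down-weight the less consistent rater, the estimate for object 2 follows the two agreeing raters rather than a plain vote count.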













Notes
The rule that decides in favor of the type t that receives the most ratings, i.e., \(\hat{o}_k = \mathop{\mathrm{arg\,max}}_{t\in T} \sum_{r_{i,k}} \mathbb{1}(r_{i,k}=t)\), is called plurality vote. In the literature, simple majority vote and plurality vote are often both called majority vote, according to Kuncheva (2004). Plurality vote and majority vote are equivalent in settings where the number of types is two and the number of ratings is odd.
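For illustration, the plurality-vote rule above can be sketched as follows; the list-based input and the tie-breaking by first-encountered maximum are our own assumptions, as the note does not specify how ties are resolved.

```python
from collections import Counter

def plurality_vote(ratings):
    """Return the type t that receives the most ratings for one object k.

    ratings: the list of ratings r_{i,k} given for object k.
    Ties are broken in favor of the value encountered first (an
    assumption of this sketch, not specified in the text).
    """
    return Counter(ratings).most_common(1)[0][0]

print(plurality_vote(["A", "B", "A", "C", "A"]))  # → A
```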
We chose \(n=100\), since we deem smaller, more unstable communities the more interesting case. Further, our results (not shown) indicate that larger values of n do not change the simulation results to a significant degree.
For domains where we do not trust experts to be completely accurate, we could combine the ratings of several experts, for example by means of DSA, to achieve higher accuracy.
References
Aban IB, Meerschaert MM, Panorska AK (2006) Parameter estimation for the truncated Pareto distribution. J Am Stat Assoc 101(473):270–277. doi:10.1198/016214505000000411. http://amstat.tandfonline.com/doi/abs/10.1198/016214505000000411
Clauset A, Shalizi CR, Newman MEJ (2009) Power-law distributions in empirical data. SIAM Rev 51(4):661–703. ISSN 0036-1445. doi:10.1137/070710111. http://dx.doi.org/10.1137/070710111
de Condorcet MJAN (1785) Essai sur l’application de l’analyse à la probabilité des décisions rendues à la pluralité des voix
Dawid AP, Skene AM (1979) Maximum likelihood estimation of observer error-rates using the EM algorithm. J R Stat Soc Series C (Appl Stat) 28(1):20–28
Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Series B 39(1):1–38
Duda RO, Hart PE (1973) Pattern Classification and Scene Analysis. Wiley, New York
Grofman B, Owen G, Feld SL (1983) Thirteen theorems in search of the truth. Theory Decision 15(3):261–278. ISSN 0040-5833. doi:10.1007/BF00125672. http://dx.doi.org/10.1007/BF00125672
Ipeirotis PG, Provost F, Wang J (2010) Quality management on Amazon Mechanical Turk. In: Proceedings of the ACM SIGKDD Workshop on Human Computation, HCOMP ’10, New York, pp 64–67. ACM. http://doi.acm.org/10.1145/1837885.1837906
Kazai G, Kamps J, Milic-Frayling N (2013) An analysis of human factors and label accuracy in crowdsourcing relevance judgments. Inf Retr 16(2):138–178
Kuncheva LI (2004) Combining Pattern Classifiers: Methods and Algorithms. Wiley-Interscience. ISBN 0471210781
Kuncheva LI, Whitaker CJ, Shipp CA (2003) Limits on the majority vote accuracy in classifier fusion. Pattern Anal Appl 6(1):22–31
Lam L, Suen CY (1997) Application of majority voting to pattern recognition: an analysis of its behavior and performance. IEEE Trans Syst Man Cybern Part A Syst Humans 27(5):553–568. ISSN 1083-4427. doi:10.1109/3468.618255
Li H, Yu B, Zhou D (2013) Error rate analysis of labeling by crowdsourcing. In: ICML Workshop: Machine Learning Meets Crowdsourcing, USA
Mamykina L, Manoim B, Mittal M, Hripcsak G, Hartmann B (2011) Design lessons from the fastest q&a site in the west. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’11, New York. ACM, pp 2857–2866
Meka R, Jain P, Dhillon IS (2009) Matrix completion from power-law distributed samples. In: Advances in Neural Information Processing Systems, pp 1258–1266
Minsky M, Papert S (1969) Perceptrons: An Introduction to Computational Geometry. The MIT Press
Nitzan S, Paroush J (1982) Optimal decision rules in uncertain dichotomous choice situations. Int Econ Rev 23:289–297. http://www.jstor.org/stable/2526438
Raykar VC, Yu S (2012) Eliminating spammers and ranking annotators for crowdsourced labeling tasks. J Mach Learn Res 13(2):491–518
Snow R, O’Connor B, Jurafsky D, Ng AY (2008) Cheap and fast—but is it good?: Evaluating non-expert annotations for natural language tasks. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP ’08, Stroudsburg. Association for Computational Linguistics. pp 254–263. http://dl.acm.org/citation.cfm?id=1613715.1613751
Wang J, Ipeirotis PG, Provost F (2011) Managing crowdsourcing workers. In: The 2011 Winter Conference on Business Intelligence, pp 10–12
Wang J, Ipeirotis PG, Provost F (2013) Quality-based pricing for crowdsourced workers. NYU-CBA Working Paper CBA-13-06. http://hdl.handle.net/2451/31833
Whitehill J, Ruvolo P, Wu T, Bergsma J, Movellan JR (2009) Whose vote should count more: optimal integration of labels from labelers of unknown expertise. In: Advances in Neural Information Processing Systems, pp 2035–2043
Cite this article
Kühne, C., Böhm, K. Protecting the Dawid–Skene algorithm against low-competence raters and collusion attacks with gold-selection strategies. Soc. Netw. Anal. Min. 5, 67 (2015). https://doi.org/10.1007/s13278-015-0306-9