ABSTRACT
Factors such as instructions, payment schemes, and platform demographics, along with the strategies used to map studies onto crowdsourcing environments, play an important role in the reproducibility of results. However, inferring these details from scientific articles is often challenging, calling for proper reporting guidelines. This paper takes the first steps towards this goal by describing an initial taxonomy of relevant attributes for crowdsourcing experiments and by providing a glimpse into the state of reporting through an analysis of a sample of CSCW papers.
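To illustrate what a datasheet built on such a taxonomy could capture, the sketch below models a single experiment as a structured record whose fields correspond to the attributes mentioned above (platform, instructions, payment scheme, demographics, and the mapping of the study design onto crowd tasks). The field names, types, and example values are illustrative assumptions for this sketch, not the taxonomy proposed in the paper.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class CrowdExperimentDatasheet:
    """Hypothetical record of attributes a crowdsourcing experiment
    report could make explicit (illustrative field names only)."""
    platform: str                        # e.g., "Amazon Mechanical Turk"
    task_instructions: str               # instructions shown to workers, verbatim
    payment_per_task_usd: float          # base reward per assignment
    bonus_scheme: Optional[str] = None   # e.g., performance-based bonus rules
    worker_filters: List[str] = field(default_factory=list)        # qualifications, approval rate, locale
    demographics_collected: List[str] = field(default_factory=list)  # e.g., age, country, language
    design_mapping_notes: str = ""       # how the experimental design was ported to crowd tasks

# Example usage with made-up values for a hypothetical classification study
sheet = CrowdExperimentDatasheet(
    platform="Amazon Mechanical Turk",
    task_instructions="Classify each sentence as relevant or not relevant.",
    payment_per_task_usd=0.10,
    worker_filters=["approval_rate >= 95%", "completed_tasks >= 1000"],
    demographics_collected=["age", "country"],
    design_mapping_notes="Each experimental condition was run as a separate batch.",
)
```

Making these attributes explicit in a machine-readable form is one way a reporting checklist could help readers reproduce the setup rather than infer it from prose.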