ABSTRACT
Evaluating the quality of workers is essential in crowdsourcing systems, and effective methods are needed to estimate that quality accurately. Previous work has introduced confidence intervals to estimate worker quality. However, our analysis of experimental results shows that these confidence intervals are often wide, which leads to inaccurate estimates of worker error rates. In this paper, we propose an optimized confidence-interval algorithm that makes the interval as narrow as possible and estimates worker quality more precisely. We verify our algorithm on simulated data from our own crowdsourcing platform under realistic settings.
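For context, a worker's error rate is typically estimated as the fraction of the worker's answers that disagree with the gold or aggregated labels, and a confidence interval quantifies the uncertainty of that estimate. The sketch below is a minimal illustration, not the paper's algorithm: it compares the standard normal-approximation (Wald) interval with the Wilson score interval for a hypothetical worker who answers n tasks and gets k wrong, since the Wilson interval is generally tighter and better behaved for small n or extreme rates.

```python
import math

def wald_interval(k, n, z=1.96):
    """Normal-approximation (Wald) interval for an error rate k/n."""
    p = k / n
    half = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half), min(1.0, p + half)

def wilson_interval(k, n, z=1.96):
    """Wilson score interval; usually narrower than Wald for small n."""
    p = k / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return max(0.0, center - half), min(1.0, center + half)

if __name__ == "__main__":
    # Hypothetical worker: 4 wrong answers out of 20 tasks.
    k, n = 4, 20
    print("Wald:  ", wald_interval(k, n))
    print("Wilson:", wilson_interval(k, n))
```

Comparing the widths of the two intervals makes the paper's concern concrete: a wide interval means the worker's true error rate is poorly pinned down, so any decision that weights or filters workers by that estimate becomes less reliable.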