Find truth in the hands of the few: acquiring specific knowledge with crowdsourcing

Han, Tao; Sun, Hailong; Song, Yangqiu; Fang, Yili; Liu, Xudong

doi:10.1007/s11704-020-9364-x

Research Article
Published: 02 October 2020

Volume 15, article number 154315 (2021)

Frontiers of Computer Science

Tao Han^1,2,
Hailong Sun^1,2,
Yangqiu Song^3,
Yili Fang^4 &
Xudong Liu^1,2

Abstract

Crowdsourcing has been a helpful mechanism for leveraging human intelligence to acquire useful knowledge. However, when crowd knowledge is aggregated with currently developed voting algorithms, the result is often common knowledge rather than the more specific knowledge that may be expected. In this paper, we consider the problem of collecting specific knowledge via crowdsourcing. With the help of an external knowledge base such as WordNet, we incorporate the semantic relations between alternative answers into a probabilistic model to determine which answer is more specific. We formulate the probabilistic model from basic assumptions, taking both worker ability and task difficulty into account, and solve it with the expectation-maximization (EM) algorithm. To increase the compatibility of the algorithm, we also refine our method into a semi-supervised one. Experimental results show that our approach is robust to hyper-parameters and outperforms majority voting and other algorithms when more specific answers are expected, especially on sparse data.
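The contrast the abstract draws, majority voting versus a probabilistic model that exploits semantic relations between answers and is fit with EM, can be illustrated with a small sketch. The model below is an assumption made for illustration, not the authors' formulation: it omits task difficulty and the semi-supervised refinement, replaces WordNet with a hand-written hypernym map (`PARENT`), and assumes that a worker with ability `a` reports the exact true answer with probability `a`, and otherwise reports a uniformly chosen still-correct answer (the truth or one of its hypernyms among the offered candidates), with a small noise rate `eps` mixed in. All names (`aggregate`, `majority_vote`, `branch_probs`, `LABELS`) and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy taxonomy (child -> parent) standing in for WordNet hypernym links.
PARENT = {"husky": "dog", "dog": "animal", "tabby": "cat", "cat": "animal"}

# Hypothetical crowd labels: task -> {worker: answer}.
LABELS = {
    "img1": {"w1": "dog", "w2": "dog", "w3": "dog", "w4": "husky", "w5": "husky"},
    "img2": {"w1": "cat", "w2": "cat", "w3": "tabby", "w4": "tabby", "w5": "cat"},
}


def ancestors(term):
    """Return all more general terms above `term`, nearest first."""
    chain = []
    while term in PARENT:
        term = PARENT[term]
        chain.append(term)
    return chain


def majority_vote(labels):
    """Baseline: the most frequent answer per task."""
    return {i: Counter(ans.values()).most_common(1)[0][0] for i, ans in labels.items()}


def branch_probs(answer, truth, cands, a, eps):
    """Assumed likelihood of `answer` given `truth`, split into (exact, coarse, noise)."""
    # Still-correct answers: the truth itself plus those of its ancestors that were offered.
    correct = [truth] + [c for c in ancestors(truth) if c in cands]
    p_exact = (1 - eps) * a * (1.0 if answer == truth else 0.0)
    p_coarse = (1 - eps) * (1 - a) * (1.0 if answer in correct else 0.0) / len(correct)
    p_noise = eps / len(cands)  # keeps every likelihood strictly positive
    return p_exact, p_coarse, p_noise


def aggregate(labels, eps=0.05, iters=20, init_ability=0.7):
    """EM over latent true answers and per-worker abilities (simplified, hypothetical)."""
    workers = {w for ans in labels.values() for w in ans}
    ability = {w: init_ability for w in workers}
    cands = {i: sorted(set(ans.values())) for i, ans in labels.items()}
    posterior = {}
    for _ in range(iters):
        # E-step: posterior over each task's true answer (uniform prior), in log space.
        for i, ans in labels.items():
            logs = {
                t: sum(math.log(sum(branch_probs(answer, t, cands[i], ability[w], eps)))
                       for w, answer in ans.items())
                for t in cands[i]
            }
            top = max(logs.values())
            weights = {t: math.exp(v - top) for t, v in logs.items()}
            z = sum(weights.values())
            posterior[i] = {t: v / z for t, v in weights.items()}

        # M-step: ability = expected exact reports / expected (exact + coarse) reports.
        exact, coarse = defaultdict(float), defaultdict(float)
        for i, ans in labels.items():
            for w, answer in ans.items():
                for t, q in posterior[i].items():
                    pe, pc, pn = branch_probs(answer, t, cands[i], ability[w], eps)
                    total = pe + pc + pn
                    exact[w] += q * pe / total
                    coarse[w] += q * pc / total
        for w in workers:
            if exact[w] + coarse[w] > 0:
                # Clamp away from 0 and 1 to keep the likelihood well behaved.
                ability[w] = min(max(exact[w] / (exact[w] + coarse[w]), 0.01), 0.99)

    truth = {i: max(p, key=p.get) for i, p in posterior.items()}
    return truth, ability


if __name__ == "__main__":
    print(majority_vote(LABELS))
    print(aggregate(LABELS)[0])
```

On this toy data, majority voting returns the generic answers `dog` and `cat`, while the EM sketch settles on `husky` and `tabby`: under the assumed likelihood, a `dog` vote remains plausible when the truth is `husky`, but a `husky` vote is unlikely when the truth is merely `dog`. How strongly the specific answer is preferred depends entirely on that assumed likelihood and on `eps`, so the sketch conveys the intuition only.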

Acknowledgements

This work was supported in part by the National Key Research and Development Program of China (2019YFB1705902) and in part by the National Natural Science Foundation of China (Grant Nos. 61932007, 61972013, 61976187, 61421003). We thank Prof. Jinpeng Huai for his valuable support and contributions to this work. The authors also thank the anonymous reviewers for their helpful comments and suggestions for improving this paper.

Author information

Authors and Affiliations

SKLSDE Lab, School of Computer Science and Engineering, Beihang University, Beijing, 100191, China
Tao Han, Hailong Sun & Xudong Liu
Beijing Advanced Innovation Center for Big Data and Brain Computing, Beihang University, Beijing, 100191, China
Tao Han, Hailong Sun & Xudong Liu
Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Clearwater Bay, Hong Kong, 999077, China
Yangqiu Song
School of Computer and Information Engineering, Zhejiang Gongshang University, Hangzhou, 310018, China
Yili Fang

Corresponding author

Correspondence to Hailong Sun.

Additional information

Tao Han received the BS degree from the School of Mathematics and System Science, Beihang University, China, in 2014. He is currently a PhD candidate in the School of Computer Science and Engineering, Beihang University, China. His research interests mainly include machine learning and human computation/crowdsourcing.

Hailong Sun received the BS degree in computer science from Beijing Jiaotong University, China, in 2001, and the PhD degree in computer software and theory from Beihang University, China, in 2008. He is an associate professor in the School of Computer Science and Engineering, Beihang University, China. His research interests include crowdsourcing, software analytics, and distributed systems. He is a member of the

Yangqiu Song received the BE and PhD degrees from Tsinghua University, China, in July 2003 and January 2009, respectively. He is now an assistant professor in the Department of CSE, with a joint appointment in the Department of Mathematics, at HKUST, China. He is also associate director of the WeChat-HKUST Joint Lab on Artificial Intelligence Technology (WHATLab) and the HKUST-WeBank Joint Lab. His research interests mainly include machine learning, data mining, natural language processing, knowledge graphs, and information networks.

Yili Fang is currently an assistant professor in the School of Computer and Information Engineering at Zhejiang Gongshang University, China. He completed his PhD at Beihang University, China. His research interests mainly include crowd computing/crowdsourcing, social computing, and decision science.

Xudong Liu received the PhD degree in computer application technology from Beihang University, China. He is a professor and doctoral supervisor at Beihang University, China. His research interests mainly include middleware technology and applications, service-oriented computing, trusted network computing, and network software development.
