ABSTRACT
Crowdsourcing provides a practical approach for obtaining annotated data to train supervised learning models. However, because crowd annotators differ in their domains of expertise and cannot always guarantee high-quality annotations, learning from crowds generally suffers from unreliable annotations that introduce noise, making it hard to achieve satisfactory performance. In this work, we investigate the reliability of annotations to improve learning from crowds. Specifically, we first project each annotator and data instance to factor vectors and model the complex interaction between annotator expertise and instance difficulty to predict annotation reliability. The learned reliability can be used directly to evaluate the quality of crowdsourced data. Then, we construct a new annotation, namely the soft annotation, which serves as the gold label during training. To recognize the different strengths of annotators, we model each annotator's confusion in an end-to-end manner. Extensive experimental results on three real-world datasets demonstrate the effectiveness of our method.
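The reliability modeling described above can be illustrated with a minimal sketch. This is not the paper's actual implementation: it assumes reliability is scored by an interaction (here, a simple dot product passed through a sigmoid) between a learned annotator-expertise vector and an instance-difficulty vector, and that the soft annotation is a reliability-weighted, normalized vote over classes. All function names and parameters are hypothetical.

```python
import math

def reliability(annotator_vec, instance_vec, bias=0.0):
    """Score annotation reliability via the interaction between an
    annotator-expertise factor vector and an instance-difficulty
    factor vector. A dot product plus sigmoid stands in for the
    learned interaction model (hypothetical simplification)."""
    score = sum(a * q for a, q in zip(annotator_vec, instance_vec)) + bias
    return 1.0 / (1.0 + math.exp(-score))

def soft_annotation(labels, reliabilities, num_classes):
    """Aggregate crowd labels into a soft label: each annotator's
    vote is weighted by its predicted reliability, then the class
    weights are normalized into a distribution."""
    weights = [0.0] * num_classes
    for y, r in zip(labels, reliabilities):
        weights[y] += r
    total = sum(weights)
    return [w / total for w in weights]
```

For example, three annotators labeling a binary instance as `[0, 0, 1]` with reliabilities `[0.9, 0.8, 0.3]` yield a soft label close to `[0.85, 0.15]`, which can then serve as the training target instead of a hard majority vote.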
Index Terms
- Learning from Crowds with Annotation Reliability