ABSTRACT
Crowdsourcing provides a practical approach for obtaining annotated data to train supervised learning models. However, because crowd annotators differ in their domains of expertise and cannot always guarantee high-quality annotations, learning from crowds generally suffers from unreliable annotations that introduce noise, making it hard to achieve satisfactory performance. In this work, we investigate the reliability of annotations to improve learning from crowds. Specifically, we first project each annotator and data instance to factor vectors and model the complex interaction between annotator expertise and instance difficulty to predict annotation reliability. The learned reliability can be used directly to evaluate the quality of crowdsourced data. Then, we construct a new annotation, namely the soft annotation, which serves as the gold label during training. To recognize the different strengths of annotators, we model each annotator's confusion in an end-to-end manner. Extensive experimental results on three real-world datasets demonstrate the effectiveness of our method.
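The reliability modeling described above can be illustrated with a minimal sketch. This is not the paper's actual implementation: it assumes reliability is scored by an interaction (here, a simple dot product passed through a sigmoid) between a learned annotator-expertise vector and an instance-difficulty vector, and that the soft annotation is a reliability-weighted, normalized vote over classes. All function names and parameters are hypothetical.

```python
import math

def reliability(annotator_vec, instance_vec, bias=0.0):
    """Score annotation reliability via the interaction between an
    annotator-expertise factor vector and an instance-difficulty
    factor vector. A dot product plus sigmoid stands in for the
    learned interaction model (hypothetical simplification)."""
    score = sum(a * q for a, q in zip(annotator_vec, instance_vec)) + bias
    return 1.0 / (1.0 + math.exp(-score))

def soft_annotation(labels, reliabilities, num_classes):
    """Aggregate crowd labels into a soft label: each annotator's
    vote is weighted by its predicted reliability, then the class
    weights are normalized into a distribution."""
    weights = [0.0] * num_classes
    for y, r in zip(labels, reliabilities):
        weights[y] += r
    total = sum(weights)
    return [w / total for w in weights]
```

For example, three annotators labeling a binary instance as `[0, 0, 1]` with reliabilities `[0.9, 0.8, 0.3]` yield a soft label close to `[0.85, 0.15]`, which can then serve as the training target instead of a hard majority vote.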
Index Terms
- Learning from Crowds with Annotation Reliability