DOI: 10.1145/3539618.3592007

Learning from Crowds with Annotation Reliability

Published: 18 July 2023

ABSTRACT

Crowdsourcing provides a practical approach for obtaining annotated data to train supervised learning models. However, because crowd annotators differ in their domains of expertise and cannot always guarantee high-quality annotations, learning from crowds generally suffers from unreliable, noisy labels, which makes it hard to achieve satisfactory performance. In this work, we investigate annotation reliability to improve learning from crowds. Specifically, we first project each annotator and each data instance to factor vectors and model the complex interaction between annotator expertise and instance difficulty to predict annotation reliability. The learned reliability can be used directly to evaluate the quality of crowdsourced data. We then construct a new annotation, namely the soft annotation, which serves as the gold label during training. To recognize the different strengths of annotators, we model each annotator's confusion in an end-to-end manner. Extensive experimental results on three real-world datasets demonstrate the effectiveness of our method.
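The paper's end-to-end model is not reproduced here, but the two ideas the abstract describes can be illustrated with a rough, hypothetical sketch: a reliability score derived from the interaction between an annotator's factor vector and an instance's factor vector (assumed here to be a simple dot product passed through a sigmoid), and a "soft annotation" formed by a reliability-weighted aggregation of the annotators' labels. The function names, vector dimensions, and interaction form are all illustrative assumptions, not the authors' architecture; the confusion-matrix modeling described in the abstract is omitted for brevity.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def annotation_reliability(annotator_vec, instance_vec):
    # Hypothetical interaction between annotator-expertise and
    # instance-difficulty factor vectors; the paper models a more
    # complex interaction, learned end-to-end.
    return sigmoid(np.dot(annotator_vec, instance_vec))

def soft_annotation(one_hot_labels, reliabilities):
    # Reliability-weighted sum of each annotator's one-hot label,
    # normalized to a probability distribution ("soft annotation")
    # that can stand in for the gold label during training.
    weighted = (reliabilities[:, None] * one_hot_labels).sum(axis=0)
    return weighted / weighted.sum()

# Toy example: 3 annotators labeling one instance over 3 classes.
labels = np.eye(3)[[0, 0, 1]]       # two annotators vote class 0, one votes class 1
rel = np.array([0.9, 0.8, 0.2])     # reliabilities (here fixed, normally predicted)
p = soft_annotation(labels, rel)    # most probability mass lands on class 0
```

Under this sketch, a low-reliability dissenting annotator contributes little to the soft label, which is the intuition behind using learned reliability both to evaluate crowdsourced data quality and to construct the training target.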

Supplemental Material

SIGIR23-srp4189.mp4 (mp4, 16.8 MB)


Published in

SIGIR '23: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval
July 2023, 3567 pages
ISBN: 9781450394086
DOI: 10.1145/3539618
Copyright © 2023 ACM

Publisher: Association for Computing Machinery, New York, NY, United States


Qualifiers: short paper

Overall Acceptance Rate: 792 of 3,983 submissions, 20%