DOI: 10.1145/3539618.3591685
research-article

Extending Label Aggregation Models with a Gaussian Process to Denoise Crowdsourcing Labels

Published: 18 July 2023

ABSTRACT

Label aggregation (LA) is the task of inferring a high-quality label for an example from multiple noisy labels generated by either human annotators or model predictions. Existing work on LA assumes a label generation process and designs a probabilistic graphical model (PGM) to learn latent true labels from observed crowd labels. However, the performance of PGM-based LA models is sensitive to noise in the crowd labels. As a consequence, the performance of LA models varies across datasets, and no single LA model outperforms the rest on all datasets.
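To make the PGM setting concrete, the following is a minimal sketch of one classic PGM-based aggregator, the Dawid-Skene model fitted with EM. It is not taken from this paper: the function name `dawid_skene`, the smoothing constant, the iteration count, and the toy label matrix are all illustrative assumptions, and the sketch assumes every example has at least one crowd label.

```python
import numpy as np

def dawid_skene(labels, n_classes, n_iter=20, smoothing=1e-2):
    """EM for a Dawid-Skene style model (illustrative sketch).

    labels: (n_items, n_workers) int array, -1 marks a missing label.
    Returns an (n_items, n_classes) array of posteriors over true labels.
    """
    n_items, n_workers = labels.shape

    # Initialise the posterior with per-item vote fractions.
    post = np.zeros((n_items, n_classes))
    for k in range(n_classes):
        post[:, k] = (labels == k).sum(axis=1)
    post /= post.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # M-step: class prior and one confusion matrix per worker,
        # indexed as conf[worker, true_class, observed_label].
        prior = post.mean(axis=0)
        conf = np.full((n_workers, n_classes, n_classes), smoothing)
        for w in range(n_workers):
            for l in range(n_classes):
                conf[w, :, l] += post[labels[:, w] == l].sum(axis=0)
        conf /= conf.sum(axis=2, keepdims=True)

        # E-step: posterior over the latent true label of each item.
        log_post = np.tile(np.log(prior + 1e-12), (n_items, 1))
        for w in range(n_workers):
            seen = labels[:, w] >= 0
            log_post[seen] += np.log(conf[w][:, labels[seen, w]]).T
        log_post -= log_post.max(axis=1, keepdims=True)
        post = np.exp(log_post)
        post /= post.sum(axis=1, keepdims=True)
    return post

# Toy data (assumed): workers 0 and 1 are reliable, worker 2 is noisy.
crowd = np.array([
    [0, 0, 0],
    [0, 0, 1],
    [0, 0, 0],
    [1, 1, 1],
    [1, 1, 0],
    [1, 1, 1],
])
posterior = dawid_skene(crowd, n_classes=2)
aggregated = posterior.argmax(axis=1)
```

Note that the model sees only the crowd label matrix; example features play no role in the inference.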

We extend PGM-based LA models by integrating a Gaussian process (GP) prior on the true labels. The advantage of LA models extended with a GP prior is that they can take as input crowd labels, example features, and existing pre-trained label prediction models to infer the true labels, while the original LA models can only leverage crowd labels. Experimental results on both synthetic and real datasets show that any LA model extended with a GP prior and a suitable mean function achieves better performance than the underlying LA model, demonstrating the effectiveness of using a GP prior.
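As a rough illustration of what a GP prior with a mean function contributes, the sketch below computes a standard GP regression posterior mean, m(x*) + K(x*, X)(K(X, X) + noise·I)^(-1)(y − m(X)), with a non-zero mean function. It does not reproduce the paper's model or its inference procedure, which the abstract does not specify. The RBF kernel, the noise variance, the hypothetical `pretrained_score` function standing in for a pre-trained label predictor, and the toy soft labels are all assumptions.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between the rows of a and b."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)

def gp_posterior_mean(x_train, y_train, x_test, mean_fn, noise=0.1):
    """Posterior mean of a GP with prior mean mean_fn and Gaussian noise.

    noise is the observation noise variance added to the kernel diagonal.
    """
    k_tt = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_st = rbf_kernel(x_test, x_train)
    residual = y_train - mean_fn(x_train)
    return mean_fn(x_test) + k_st @ np.linalg.solve(k_tt, residual)

def pretrained_score(x):
    """Hypothetical stand-in for a pre-trained label prediction model."""
    return 1.0 / (1.0 + np.exp(-x[:, 0]))

# Toy inputs (assumed): one feature per example, soft labels in [0, 1].
x_train = np.array([[-1.0], [0.0], [1.0]])
y_train = np.array([0.1, 0.9, 0.8])
x_near = np.array([[0.5]])
x_far = np.array([[100.0]])

near_estimate = gp_posterior_mean(x_train, y_train, x_near, pretrained_score)
far_estimate = gp_posterior_mean(x_train, y_train, x_far, pretrained_score)
```

Near the labelled examples, the estimate is pulled toward the observed labels through the kernel over features; far from them, it falls back to the mean function, which is where a pre-trained predictor can supply information that crowd labels alone cannot.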


      • Published in

        SIGIR '23: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval
        July 2023
        3567 pages
        ISBN: 9781450394086
        DOI: 10.1145/3539618

        Copyright © 2023 ACM


        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Acceptance Rates

        Overall Acceptance Rate: 792 of 3,983 submissions, 20%