
Extending Label Aggregation Models with a Gaussian Process to Denoise Crowdsourcing Labels

Authors:
Dan Li, University of Amsterdam & Elsevier, Amsterdam, Netherlands (ORCID: 0000-0001-6381-1087)
Maarten de Rijke, University of Amsterdam, Amsterdam, Netherlands (ORCID: 0000-0002-1086-0202)

SIGIR '23: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, July 2023, Pages 729–738. https://doi.org/10.1145/3539618.3591685

Published: 18 July 2023

ABSTRACT

Label aggregation (LA) is the task of inferring a high-quality label for an example from multiple noisy labels generated by either human annotators or model predictions. Existing work on LA assumes a label generation process and designs a probabilistic graphical model (PGM) to learn latent true labels from observed crowd labels. However, the performance of PGM-based LA models is easily affected by noise in the crowd labels. As a consequence, the performance of LA models varies across datasets, and no single LA model outperforms the rest on all datasets.
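To make the task concrete, the following is a minimal sketch of two generic aggregators for binary crowd labels: a majority vote and a simple "one-coin" EM model in which each worker has a single latent accuracy. It is an illustration only, not one of the models evaluated in the paper; the function names, the triple format `(example_id, worker_id, label)`, and the toy data are assumptions made for this example.

```python
from collections import Counter, defaultdict


def majority_vote(crowd_labels):
    """Return the most frequent label per example.

    crowd_labels: list of (example_id, worker_id, label) triples (assumed format).
    """
    votes = defaultdict(Counter)
    for example, _worker, label in crowd_labels:
        votes[example][label] += 1
    return {ex: counts.most_common(1)[0][0] for ex, counts in votes.items()}


def one_coin_em(crowd_labels, n_iter=20, eps=1e-3):
    """EM for binary labels in {0, 1} with one accuracy parameter per worker.

    Returns (posterior P(true label = 1) per example, accuracy per worker).
    """
    def clip(p):
        # Keep probabilities away from 0 and 1 so products stay well defined.
        return min(max(p, eps), 1.0 - eps)

    # Initialise the posterior with the fraction of positive votes.
    ones, total = defaultdict(int), defaultdict(int)
    for example, _worker, label in crowd_labels:
        ones[example] += label
        total[example] += 1
    post = {ex: ones[ex] / total[ex] for ex in total}

    acc = {}
    for _ in range(n_iter):
        # M-step: worker accuracy = expected share of labels matching the truth.
        agree, count = defaultdict(float), defaultdict(int)
        for example, worker, label in crowd_labels:
            agree[worker] += post[example] if label == 1 else 1.0 - post[example]
            count[worker] += 1
        acc = {w: clip(agree[w] / count[w]) for w in count}
        prior = clip(sum(post.values()) / len(post))

        # E-step: posterior over the true label given the worker accuracies.
        like1 = {ex: prior for ex in post}
        like0 = {ex: 1.0 - prior for ex in post}
        for example, worker, label in crowd_labels:
            like1[example] *= acc[worker] if label == 1 else 1.0 - acc[worker]
            like0[example] *= acc[worker] if label == 0 else 1.0 - acc[worker]
        post = {ex: like1[ex] / (like1[ex] + like0[ex]) for ex in post}
    return post, acc


# Toy data: workers "a" and "b" are reliable, worker "c" always flips the label.
toy_truth = {"e0": 1, "e1": 1, "e2": 0, "e3": 0, "e4": 1}
toy_labels = []
for ex, y in toy_truth.items():
    toy_labels += [(ex, "a", y), (ex, "b", y), (ex, "c", 1 - y)]

posterior, accuracy = one_coin_em(toy_labels)
```

Both aggregators see only the crowd labels, which is the limitation the abstract attributes to the original LA models: when the labels themselves are very noisy, there is no other signal to fall back on.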

We extend PGM-based LA models by integrating a Gaussian process (GP) prior on the true labels. The advantage of LA models extended with a GP prior is that they can take crowd labels, example features, and existing pre-trained label prediction models as input to infer the true labels, whereas the original LA models can only leverage crowd labels. Experimental results on both synthetic and real datasets show that any LA model extended with a GP prior and a suitable mean function achieves better performance than the underlying LA model, demonstrating the effectiveness of using a GP prior.
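As a loose illustration of how example features and a pre-trained model can enter through a GP prior, the sketch below computes the posterior mean of a latent score under a GP with a non-zero mean function and an RBF kernel. It is plain GP regression on noisy per-example scores, not the paper's inference procedure; the kernel choice, the Gaussian noise model, and the names `rbf_kernel`, `gp_denoise`, and `pretrained_score` are all assumptions made for this example.

```python
import numpy as np


def rbf_kernel(A, B, length_scale=1.0):
    """Squared-exponential kernel between the rows of A and the rows of B."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / length_scale ** 2)


def gp_denoise(features, noisy_scores, mean_fn, noise_var=0.5, length_scale=1.0):
    """Posterior mean of f at the inputs, assuming f ~ GP(mean_fn, RBF) and
    noisy_scores = f(features) + Gaussian noise with variance noise_var."""
    prior_mean = mean_fn(features)
    K = rbf_kernel(features, features, length_scale)
    residual = noisy_scores - prior_mean
    # Standard GP regression: m + K (K + noise_var * I)^{-1} (y - m).
    alpha = np.linalg.solve(K + noise_var * np.eye(len(features)), residual)
    return prior_mean + K @ alpha


def pretrained_score(features):
    """Hypothetical stand-in for a pre-trained label prediction model."""
    return np.tanh(features[:, 0])


# Toy usage: one feature per example; scores could come from any aggregator,
# for instance crowd-vote fractions rescaled to [-1, 1].
X = np.array([[-2.0], [-1.0], [0.0], [1.0], [2.0]])
crowd_scores = np.array([-1.0, 0.3, -0.2, 1.0, 0.6])
denoised = gp_denoise(X, crowd_scores, pretrained_score)
```

In this sketch the mean function pulls the estimate toward the pre-trained model's prediction, and the kernel shares evidence between examples with similar features. When the crowd scores already equal the prior mean, the posterior mean leaves them unchanged.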


Index Terms

1. Computing methodologies
  1. Machine learning
    1. Machine learning approaches
      1. Learning in probabilistic graphical models
        Bayesian network models
2. Information systems
  1. Information retrieval

Published in
SIGIR '23: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval
July 2023
3567 pages
ISBN:9781450394086
DOI:10.1145/3539618
General Chairs: Hsin-Hsi Chen (National Taiwan University), Wei-Jou (Edward) Duh (National Taiwan University), Hen-Hsen Huang (Academia Sinica)
Program Chairs: Makoto P. Kato (Spotify), Josiane Mothe (Universite de Toulouse), Barbara Poblete (University of Chile and Amazon Visiting Academic)
Copyright © 2023 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
crowdsourcing
label aggregation
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 792 of 3,983 submissions, 20%