ABSTRACT
Label aggregation (LA) is the task of inferring a high-quality label for an example from multiple noisy labels generated by either human annotators or model predictions. Existing work on LA assumes a label generation process and designs a probabilistic graphical model (PGM) to learn latent true labels from observed crowd labels. However, the performance of PGM-based LA models is easily degraded by noise in the crowd labels. As a consequence, the performance of LA models varies across datasets, and no single LA model outperforms the rest on all of them.
We extend PGM-based LA models by integrating a Gaussian process (GP) prior on the true labels. LA models extended with a GP prior can take crowd labels, example features, and existing pre-trained label prediction models as input to infer the true labels, whereas the original LA models can only leverage crowd labels. Experimental results on both synthetic and real datasets show that any LA model extended with a GP prior and a suitable mean function achieves better performance than the underlying LA model, demonstrating the effectiveness of using a GP prior.
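To make the idea of combining crowd labels, example features, and a prior mean function concrete, the following is a minimal illustrative sketch in Python with NumPy. It is not the model proposed in the paper: it replaces the PGM with simple vote averaging and uses plain Gaussian process (GP) regression on the averaged votes. The function names (`rbf_kernel`, `majority_vote_scores`, `gp_denoise`), the RBF kernel, the noise setting, and the constant `constant_prior` standing in for a pre-trained label prediction model are all assumptions made for illustration.

```python
import numpy as np


def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between the rows of A and the rows of B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return variance * np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)


def majority_vote_scores(crowd):
    """Average the binary votes per example.

    crowd: (n_examples, n_workers) array with entries in {0, 1},
           or -1 where a worker did not label the example.
    Returns (mean vote per example, number of votes per example).
    """
    observed = crowd >= 0
    counts = observed.sum(axis=1)
    votes = np.where(observed, crowd, 0).sum(axis=1)
    return votes / np.maximum(counts, 1), counts


def gp_denoise(X, crowd, mean_fn, noise=0.25, lengthscale=1.0):
    """GP-regression posterior mean of the label score at each example.

    mean_fn plays the role of the GP mean function; here it is a
    hypothetical stand-in for a pre-trained label prediction model.
    """
    scores, counts = majority_vote_scores(crowd)
    m = mean_fn(X)
    K = rbf_kernel(X, X, lengthscale)
    # Assumption: observation noise shrinks as more workers vote;
    # examples without votes get a very large noise (effectively unobserved).
    noise_var = np.where(counts > 0, noise / np.maximum(counts, 1), 1e6)
    resid = np.where(counts > 0, scores - m, 0.0)
    # Standard GP posterior mean: m + K (K + Sigma)^{-1} (y - m)
    return m + K @ np.linalg.solve(K + np.diag(noise_var), resid)


def constant_prior(X):
    """Hypothetical uninformative mean function: probability 0.5 everywhere."""
    return np.full(len(X), 0.5)


# Three examples with one feature each; the second has no crowd labels.
X = np.array([[0.0], [0.1], [5.0]])
crowd = np.array([[1, 1, 1], [-1, -1, -1], [0, 0, 0]])
post = gp_denoise(X, crowd, constant_prior)
print(post)
```

In this toy setting the unlabeled second example borrows evidence from its close neighbour in feature space, which is the kind of information that an aggregation method using crowd labels alone cannot exploit. A more informative mean function would shift the result for examples that have few or no votes.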
Index Terms
- Extending Label Aggregation Models with a Gaussian Process to Denoise Crowdsourcing Labels