Crowdsourcing Truth Inference via Reliability-Driven Multi-View Graph Embedding

Published: 27 February 2023

Abstract

Crowdsourcing truth inference aims to assign a correct answer to each task from candidate answers that are provided by crowdsourced workers. A common approach is to generate workers’ reliabilities to represent the quality of answers. Although crowdsourced triples can be converted into various crowdsourced relationships, the available related methods are not effective in capturing these relationships to alleviate the harm to inference that is caused by conflicting answers. In this research, we propose a Reliability-driven Multi-view Graph Embedding framework for Truth inference (TiReMGE), which explores multiple crowdsourced relationships by organically integrating worker reliabilities into a graph space that is constructed from crowdsourced triples. Specifically, to create an interactive environment, we propose a reliability-driven initialization criterion for initializing vectors of tasks and workers as interactive carriers of reliabilities. From the perspective of multiple crowdsourced relationships, a multi-view graph embedding framework is proposed for reliability information interaction on a task-worker graph, which encodes latent crowdsourced relationships into vectors of workers and tasks for reliability update and truth inference. A heritable reliability updating method based on the Lagrange multiplier method is proposed to obtain reliabilities that match the quality of workers for interaction by a novel constraint law. Our ultimate goal is to minimize the Euclidean distance between the encoded task vector and the answer that is provided by a worker with high reliability. Extensive experimental results on nine real-world datasets demonstrate that TiReMGE significantly outperforms the nine state-of-the-art baselines.
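The interplay between worker reliabilities and inferred answers can be illustrated with a much simpler scheme than the one the abstract describes. The sketch below is not TiReMGE: it has no graph embedding and no multi-view encoding. It only alternates between (a) building each task vector as a reliability-weighted average of one-hot worker answers and (b) updating reliabilities from each worker's mean squared Euclidean distance to those task vectors. The closed-form update follows from applying a Lagrange multiplier to the assumed constraint sum(exp(-r)) = 1, a choice borrowed from earlier truth-discovery work rather than taken from this article. The function name `infer_truth` and all of its parameters are hypothetical.

```python
import numpy as np


def infer_truth(triples, n_tasks, n_workers, n_classes, n_iters=10, eps=1e-8):
    """Illustrative reliability-weighted truth inference (not TiReMGE).

    triples: iterable of (task_id, worker_id, answer_id) integer tuples.
    Returns (inferred label per task, reliability per worker).
    """
    # One-hot answers with shape (tasks, workers, classes);
    # mask marks which (task, worker) pairs actually have an answer.
    answers = np.zeros((n_tasks, n_workers, n_classes))
    mask = np.zeros((n_tasks, n_workers))
    for t, w, a in triples:
        answers[t, w, a] = 1.0
        mask[t, w] = 1.0

    # Uniform reliabilities make the first pass equivalent to majority voting.
    reliability = np.ones(n_workers)
    for _ in range(n_iters):
        # Task vector: reliability-weighted average of the answers it received.
        weighted = (answers * (mask * reliability)[:, :, None]).sum(axis=1)
        totals = np.maximum(weighted.sum(axis=1, keepdims=True), eps)
        task_vec = weighted / totals

        # Worker loss: mean squared Euclidean distance between the worker's
        # answers and the current task vectors, over answered tasks only.
        dist = ((answers - task_vec[:, None, :]) ** 2).sum(axis=2) * mask
        loss = dist.sum(axis=0) / np.maximum(mask.sum(axis=0), 1.0)

        # Minimizing sum_w r_w * loss_w subject to sum_w exp(-r_w) = 1
        # gives r_w = -log(loss_w / sum_v loss_v); eps avoids log(0).
        reliability = -np.log((loss + eps) / (loss.sum() + n_workers * eps))

    return task_vec.argmax(axis=1), reliability
```

Under these assumptions, a worker whose answers sit far from the weighted consensus receives a reliability near zero, so conflicting answers from that worker have little influence on later iterations.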

Cited By

  • (2025) Robust annotation aggregation in crowdsourcing via enhanced worker ability modeling. Information Processing and Management: an International Journal 62:1. DOI: 10.1016/j.ipm.2024.103914. Online publication date: 1-Jan-2025
  • (2024) Graph Contrastive Learning for Truth Inference. 2024 IEEE 40th International Conference on Data Engineering (ICDE), 263-275. DOI: 10.1109/ICDE60146.2024.00027. Online publication date: 13-May-2024

Published In

ACM Transactions on Knowledge Discovery from Data, Volume 17, Issue 5
June 2023
386 pages
ISSN: 1556-4681
EISSN: 1556-472X
DOI: 10.1145/3583066

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 27 February 2023
Online AM: 04 October 2022
Accepted: 21 September 2022
Revised: 15 July 2022
Received: 20 January 2022
Published in TKDD Volume 17, Issue 5

Author Tags

  1. Crowdsourcing truth inference
  2. graph embedding
  3. reliability interaction
  4. worker quality match

Qualifiers

  • Research-article

Funding Sources

  • National Key Research and Development Program of China
  • Program for Innovative Research Team in University of the Ministry of Education
  • National Natural Science Foundation of China

Article Metrics

  • Downloads (last 12 months): 174
  • Downloads (last 6 weeks): 11

Reflects downloads up to 20 Feb 2025
