research-article

Crowdsourcing Truth Inference via Reliability-Driven Multi-View Graph Embedding

Authors:

Xindong Wu

ACM Transactions on Knowledge Discovery from Data, Volume 17, Issue 5

Article No.: 65, Pages 1 - 26

https://doi.org/10.1145/3565576

Published: 27 February 2023

Abstract

Crowdsourcing truth inference aims to assign a correct answer to each task from candidate answers that are provided by crowdsourced workers. A common approach is to generate workers’ reliabilities to represent the quality of answers. Although crowdsourced triples can be converted into various crowdsourced relationships, the available related methods are not effective in capturing these relationships to alleviate the harm to inference that is caused by conflicting answers. In this research, we propose a Reliability-driven Multi-view Graph Embedding framework for Truth inference (TiReMGE), which explores multiple crowdsourced relationships by organically integrating worker reliabilities into a graph space that is constructed from crowdsourced triples. Specifically, to create an interactive environment, we propose a reliability-driven initialization criterion for initializing vectors of tasks and workers as interactive carriers of reliabilities. From the perspective of multiple crowdsourced relationships, a multi-view graph embedding framework is proposed for reliability information interaction on a task-worker graph, which encodes latent crowdsourced relationships into vectors of workers and tasks for reliability update and truth inference. A heritable reliability updating method based on the Lagrange multiplier method is proposed to obtain reliabilities that match the quality of workers for interaction by a novel constraint law. Our ultimate goal is to minimize the Euclidean distance between the encoded task vector and the answer that is provided by a worker with high reliability. Extensive experimental results on nine real-world datasets demonstrate that TiReMGE significantly outperforms the nine state-of-the-art baselines.
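
The general idea behind reliability-weighted truth inference can be illustrated with a small sketch. The code below is not TiReMGE: it has no graph embedding or multi-view component, and its reliability update (a negative-log ratio of squared Euclidean distances, a common choice in iterative truth discovery) is an assumption chosen for brevity. The function name `infer_truths`, its parameters, and the toy data are all hypothetical. It only shows the loop the abstract alludes to: encode each task from reliability-weighted answers, measure how far each worker's answers lie from the encoded task vectors, and raise the reliability of workers whose answers lie close.

```python
import math
from collections import defaultdict


def infer_truths(triples, num_classes, iterations=10, eps=1e-9):
    """Iterative reliability-weighted aggregation (illustrative, not TiReMGE).

    triples: list of (task, worker, answer), with answer in range(num_classes).
    Returns (truths, reliability).
    """
    tasks = sorted({t for t, _, _ in triples})
    workers = sorted({w for _, w, _ in triples})
    reliability = {w: 1.0 for w in workers}  # assumption: uniform start
    task_vec = {}

    for _ in range(iterations):
        # Encode each task as the reliability-weighted mean of one-hot answers.
        sums = {t: [0.0] * num_classes for t in tasks}
        weight = defaultdict(float)
        for t, w, a in triples:
            sums[t][a] += reliability[w]
            weight[t] += reliability[w]
        task_vec = {t: [v / (weight[t] + eps) for v in sums[t]] for t in tasks}

        # Squared Euclidean distance between each worker's one-hot answers
        # and the encoded task vectors.
        dist = defaultdict(float)
        for t, w, a in triples:
            for c in range(num_classes):
                target = 1.0 if c == a else 0.0
                dist[w] += (target - task_vec[t][c]) ** 2

        # Assumed update: workers with a smaller share of the total distance
        # receive a larger reliability; eps keeps every value positive.
        total = sum(dist.values()) + eps * len(workers)
        reliability = {
            w: max(-math.log((dist[w] + eps) / total), eps) for w in workers
        }

    truths = {
        t: max(range(num_classes), key=task_vec[t].__getitem__) for t in tasks
    }
    return truths, reliability


# Toy data: workers A and B agree on every task, worker C always disagrees.
triples = [
    ("t1", "A", 0), ("t1", "B", 0), ("t1", "C", 1),
    ("t2", "A", 1), ("t2", "B", 1), ("t2", "C", 0),
    ("t3", "A", 0), ("t3", "B", 0), ("t3", "C", 1),
    ("t4", "A", 1), ("t4", "B", 1), ("t4", "C", 0),
]
truths, reliability = infer_truths(triples, num_classes=2)
```

On this toy input the dissenting worker ends up with the lowest reliability, so the inferred answers follow the two agreeing workers. A method like the one described in the abstract would replace the simple weighted mean with learned task and worker embeddings.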

Cited By

Chen J, Feng J, Zhang S, Li X, Djigal H. (2025). Robust annotation aggregation in crowdsourcing via enhanced worker ability modeling. Information Processing and Management: an International Journal 62:1. Online publication date: 1-Jan-2025.
https://dl.acm.org/doi/10.1016/j.ipm.2024.103914

Liu H, Liu J, Tang F, Li P, Chen L, Yu J, Zhu Y, Gao M, Yang Y, Hou X. (2024). Graph Contrastive Learning for Truth Inference. 2024 IEEE 40th International Conference on Data Engineering (ICDE), 263-275. Online publication date: 13-May-2024.
https://doi.org/10.1109/ICDE60146.2024.00027

Index Terms

Crowdsourcing Truth Inference via Reliability-Driven Multi-View Graph Embedding
1. Computing methodologies
  1. Machine learning
    1. Learning paradigms
      1. Unsupervised learning
    2. Machine learning approaches
      1. Neural networks
2. Information systems
  1. World Wide Web
    1. Web applications
      1. Crowdsourcing

Information & Contributors

Information

Published In

ACM Transactions on Knowledge Discovery from Data Volume 17, Issue 5

June 2023

386 pages

ISSN:1556-4681

EISSN:1556-472X

DOI:10.1145/3583066

Editor:
Charu Aggarwal
IBM T. J. Watson Research, USA

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 27 February 2023

Online AM: 04 October 2022

Accepted: 21 September 2022

Revised: 15 July 2022

Received: 20 January 2022

Published in TKDD Volume 17, Issue 5

Qualifiers

Research-article

Funding Sources

National Key Research and Development Program of China
Program for Innovative Research Team in University of the Ministry of Education
National Natural Science Foundation of China

Bibliometrics

Total Citations: 2
Total Downloads: 589
Downloads (Last 12 months): 174
Downloads (Last 6 weeks): 11

Reflects downloads up to 20 Feb 2025
