ABSTRACT
This paper introduces an active-learning-based truth estimator for social networks, such as Twitter, that significantly improves estimation accuracy by requesting labels for a small, well-selected fraction of the data. Assessing data and discovering truth from arbitrary open online sources is a hard problem because source reliability is uncertain. Multiple truth-finding systems have been developed to solve this problem, but their accuracy is limited by the noisy nature of the data, which is subject to distortion, fabrication, omission, and duplication. The proposed estimator is semi-supervised: a portion of the inputs is carefully selected to be reliably verified. The challenge is to find the subset of observations whose verification would maximally enhance overall fact-finding accuracy. This work extends previous passive approaches to recursive truth estimation, as well as semi-supervised approaches in which the estimator has no control over the choice of data to be labeled. Results show that, by optimally selecting the claims to be verified, estimation accuracy improves by 12% over an unsupervised baseline and by 5% over previous semi-supervised approaches.
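To make the idea of choosing which claims to verify more concrete, the following is a minimal sketch of one common active-learning heuristic, uncertainty sampling. It is an illustrative stand-in only: the abstract does not state the selection criterion actually used, and the function names, claim identifiers, and posterior values below are all hypothetical. The sketch assumes that some truth estimator has already produced, for each claim, a probability that the claim is true.

```python
import math
from typing import Dict, List


def binary_entropy(p: float) -> float:
    """Entropy (in bits) of a Bernoulli variable with P(true) = p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def select_claims_to_verify(posteriors: Dict[str, float], budget: int) -> List[str]:
    """Return up to `budget` claim ids whose truth estimate is most uncertain.

    Assumption: uncertainty sampling is used here purely for illustration;
    it is not claimed to be the selection rule of the paper.
    """
    ranked = sorted(
        posteriors,
        key=lambda claim: binary_entropy(posteriors[claim]),
        reverse=True,
    )
    return ranked[:budget]


# Hypothetical posteriors from an unsupervised truth estimator.
posteriors = {"claim_a": 0.97, "claim_b": 0.52, "claim_c": 0.10, "claim_d": 0.35}

# With a labeling budget of two, the claims closest to P(true) = 0.5 are chosen.
to_verify = select_claims_to_verify(posteriors, budget=2)
print(to_verify)
```

In a semi-supervised loop of the kind the abstract describes, the verified labels for the selected claims would then be fed back to the estimator before the next round of estimation.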