short-paper

TATHYA: A Multi-Classifier System for Detecting Check-Worthy Statements in Political Debates

Authors:

Dan Goldwasser,

Saurabh Bagchi

CIKM '17: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management

Pages 2259 - 2262

https://doi.org/10.1145/3132847.3133150

Published: 06 November 2017

Abstract

Fact-checking political discussions has become an essential cog in computational journalism. This task encompasses an important sub-task: identifying the set of statements with 'check-worthy' claims. Previous work has treated this as a simple text classification problem, discounting the nuances involved in determining what makes statements check-worthy. We introduce a dataset of political debates from the 2016 US Presidential election campaign, annotated using all major fact-checking media outlets, and show that there is a need to model conversation context, debate dynamics and implicit world knowledge. We design a multi-classifier system, TATHYA, that models latent groupings in data and improves on state-of-the-art systems in detecting check-worthy statements by 19.5% in F1-score on a held-out test set, with the gains coming primarily from recall.
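
The F1-score reported in the abstract is the harmonic mean of precision and recall, so a system can raise F1 largely by recovering more of the check-worthy statements. The following is a minimal sketch of that relationship. The function name and the precision and recall values are hypothetical placeholders chosen for illustration; they are not results from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical numbers, NOT taken from the paper: precision is held fixed
# while recall improves, to show how a recall gain alone moves F1.
baseline = f1_score(precision=0.5, recall=0.25)
improved = f1_score(precision=0.5, recall=0.5)

print(f"baseline F1 = {baseline:.3f}")
print(f"improved F1 = {improved:.3f}")
```

Because the harmonic mean is pulled toward the smaller of its two inputs, raising a low recall tends to move F1 more than an equal increase in an already higher precision.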

Index Terms

TATHYA: A Multi-Classifier System for Detecting Check-Worthy Statements in Political Debates
1. Human-centered computing
  1. Collaborative and social computing
    1. Collaborative and social computing systems and tools
      1. Social networking sites
2. Information systems
  1. World Wide Web
    1. Web mining
      1. Web log analysis
    2. Web searching and information discovery
      1. Content ranking

Published In

CIKM '17: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management

November 2017

2604 pages

ISBN:9781450349185

DOI:10.1145/3132847

General Chairs:
Ee-Peng Lim (Singapore Management University, Singapore)
Marianne Winslett (University of Illinois at Urbana-Champaign, USA, and Advanced Digital Sciences Center, Singapore)

Program Chairs:
Mark Sanderson (RMIT, Australia)
Ada Fu (Chinese University of Hong Kong, Hong Kong)
Jimeng Sun (Georgia Tech, USA)
Shane Culpepper (RMIT, Australia)
Eric Lo (Chinese University of Hong Kong, Hong Kong)
Joyce Ho (Emory University, USA)
Debora Donato (Mix Tech, Inc., USA)
Rakesh Agrawal (Data Insights Laboratories, USA)
Yu Zheng (Microsoft Research Asia, China)
Carlos Castillo (Qatar Computing Research Institute, Qatar)
Aixin Sun (Nanyang Technological University, Singapore)
Vincent S. Tseng (National Cheng Kung University, Taiwan)
Chenliang Li (Wuhan University, China)

Copyright © 2017 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

CIKM '17

CIKM '17: ACM Conference on Information and Knowledge Management

November 6 - 10, 2017

Singapore, Singapore

Acceptance Rates

CIKM '17 Paper Acceptance Rate 171 of 855 submissions, 20%;

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Cited By

Rahman M, Karim R, Arefin M, Dhar P, Hossain G, and Shimamura T. 2025. Facilitating automated fact-checking: a machine learning based weighted ensemble technique for claim detection. Discover Applied Sciences 7:1. Online publication date: 11-Jan-2025.
https://doi.org/10.1007/s42452-024-06444-6

Ruiz-Dolz R, Heras S, and García-Fornes A. 2025. An introduction to computational argumentation research from a human argumentation perspective. Autonomous Agents and Multi-Agent Systems 39:1. Online publication date: 1-Jun-2025.
https://dl.acm.org/doi/10.1007/s10458-025-09692-x

Martinez-Rico J, Araujo L, and Martinez-Romo J. 2024. Building a framework for fake news detection in the health domain. PLOS ONE 19:7 (e0305362). Online publication date: 8-Jul-2024.
https://doi.org/10.1371/journal.pone.0305362

Meng K, Jimenez D, Devasier J, Naraparaju S, Arslan F, Obembe D, and Li C. 2024. Gradient-Based Adversarial Training on Transformer Networks for Detecting Check-Worthy Factual Claims. ACM Transactions on Intelligent Systems and Technology. Online publication date: 20-Aug-2024.
https://doi.org/10.1145/3689212

Nenno S. 2024. Propositional claim detection: a task and dataset for the classification of claims to truth. Journal of Computational Social Science 7:2 (1727-1752). Online publication date: 9-May-2024.
https://doi.org/10.1007/s42001-024-00289-0

Nenno S. 2024. Is checkworthiness generalizable? Evaluating task and domain generalization of datasets for claim detection. Neural Computing and Applications 36:24 (15165-15176). Online publication date: 1-Aug-2024.
https://dl.acm.org/doi/10.1007/s00521-024-09896-4

Bai Y, Colas A, Wang D, Chen H, Duh W, Huang H, Kato M, Mothe J, and Poblete B. 2023. MythQA: Query-Based Large-Scale Check-Worthy Claim Detection through Multi-Answer Open-Domain Question Answering. Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (3017-3026). Online publication date: 19-Jul-2023.
https://dl.acm.org/doi/10.1145/3539618.3591907

Kartal Y and Kutlu M. 2023. Re-Think Before You Share: A Comprehensive Study on Prioritizing Check-Worthy Claims. IEEE Transactions on Computational Social Systems 10:1 (362-375). Online publication date: Feb-2023.
https://doi.org/10.1109/TCSS.2021.3138642

Saralegui J and Tommasel A. 2023. Towards Automated Fact-Checking: An Exploratory Study on the Detection of Checkable Statements in Spanish. 2023 42nd IEEE International Conference of the Chilean Computer Science Society (SCCC) (1-8). Online publication date: 23-Oct-2023.
https://doi.org/10.1109/SCCC59417.2023.10315728

Peñas A, Deriu J, Sharma R, Valentin G, and Reyes-Montesinos J. 2023. Holistic Analysis of Organised Misinformation Activity in Social Networks. Disinformation in Open Online Media (132-143). Online publication date: 14-Nov-2023.
https://doi.org/10.1007/978-3-031-47896-3_10
