To Check or Not to Check: Syntax, Semantics, and Context in the Language of Check-Worthy Claims

Zuo, Chaoyuan; Karakas, Ayla Ida; Banerjee, Ritwik

doi:10.1007/978-3-030-28577-7_23

Chaoyuan Zuo¹⁷,
Ayla Ida Karakas¹⁸ &
Ritwik Banerjee¹⁷

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 11696))

Included in the following conference series:

International Conference of the Cross-Language Evaluation Forum for European Languages

1159 Accesses
2 Citations

Abstract

As the spread of information has received a compelling boost due to pervasive use of social media, so has the spread of misinformation. The sheer volume of data has rendered the traditional methods of expert-driven manual fact-checking largely infeasible. As a result, computational linguistics and data-driven algorithms have been explored in recent years. Despite this progress, identifying and prioritizing what needs to be checked has received little attention. Given that expert-driven manual intervention is likely to remain an important component of fact-checking, especially in specific domains (e.g., politics, environmental science), this identification and prioritization is critical. A successful algorithmic ranking of “check-worthy” claims can help an expert-in-the-loop fact-checking system, thereby reducing the expert’s workload while still tackling the most salient bits of misinformation. In this work, we explore how linguistic syntax, semantics, and the contextual meaning of words play a role in determining the check-worthiness of claims. Our preliminary experiments used explicit stylometric features and simple word embeddings on the English language dataset in the Check-worthiness task of the CLEF-2018 Fact-Checking Lab, where our primary solution outperformed the other systems in terms of the mean average precision, R-precision, reciprocal rank, and precision at k for multiple values k. Here, we present an extension of this approach with more sophisticated word embeddings and report further improvements in this task.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 59.99; Price excludes VAT (USA)

Softcover Book: USD 79.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

1.
The dataset does not provide this categorization, but we treat them differently since a debate, unlike a speech, has interactive discourse between multiple speakers.

References

Atanasova, P., et al.: Overview of the CLEF-2018 CheckThat! Lab on automatic identification and verification of political claims, task 1: check-worthiness. In: Cappellato, L., Ferro, N., Nie, J.Y., Soulier, L. (eds.) CLEF 2018 Working Notes (2018)
Google Scholar
Bruns, A., Highfield, T.: Blogs, Twitter, and breaking news: the produsage of citizen journalism. In: Produsing Theory in a Digital World: The Intersection of Audiences and Production in Contemporary Theory, vol. 80, pp. 15–32. Peter Lang (2012)
Google Scholar
Cao, T.D., Manolescu, I., Tannier, X.: Extracting statistical mentions from textual claims to provide trusted content. In: Métais, E., Meziane, F., Vadera, S., Sugumaran, V., Saraee, M. (eds.) NLDB 2019. LNCS, vol. 11608, pp. 402–408. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-23281-8_36
Chapter Google Scholar
Cazalens, S., Lamarre, P., Leblay, J., Manolescu, I., Tannier, X.: A content management perspective on fact-checking. In: Journalism, Misinformation and Fact Checking Alternate Paper Track of The Web Conference (2018)
Google Scholar
Cohen, S., Li, C., Yang, J., Yu, C.: Computational journalism: a call to arms to database researchers. In: Conference on Innovative Data Systems Research, CIDR 2011, ACM, Asilomar (2011)
Google Scholar
Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. CoRR abs/1810.04805 (2018)
Google Scholar
Diakopoulos, N.: A functional roadmap for innovation in computational journalism. Rutgers University, Technical report (2011)
Google Scholar
Feng, S., Banerjee, R., Choi, Y.: Syntactic stylometry for deception detection. In: ACL, no. 2, pp. 171–175 (2012)
Google Scholar
Finkel, J.R., Grenager, T., Manning, C.: Incorporating non-local information into information extraction systems by Gibbs sampling. In: ACL, pp. 363–370 (2005)
Google Scholar
Flew, T., Spurgeon, C., Daniel, A., Swift, A.: The promise of computational journalism. Journal. Pract. 6(2), 157–171 (2012)
Google Scholar
Gencheva, P., Nakov, P., Màrquez, L., Barrón-Cedeño, A., Koychev, I.: A context-aware approach for detecting worth-checking claims in political debates. In: RANLP 2017, pp. 267–276 (2017)
Google Scholar
Ghanem, B., Montes-y Gómez, M., Rangel, F., Rosso, P.: UPV-INAOE-Autoritas-Check that: preliminary approach for checking worthiness of claims. In: CLEF Working Notes (2018)
Google Scholar
Goode, L.: Social news, citizen journalism and democracy. New Media Soc. 11(8), 1287–1305 (2009)
Article Google Scholar
Hansen, C., Hansen, C., Simonsen, J.G., Lioma, C.: The Copenhagen team participation in the check-worthiness task of the competition of automatic identification and verification of claims in political debates of the CLEF-2018 CheckThat! Lab. In: CLEF Working Notes (2018)
Google Scholar
Harris, Z.S.: Distributional Structure. Word 10(2–3), 146–162 (1954)
Article Google Scholar
Hassan, N., Li, C., Tremayne, M.: Detecting check-worthy factual claims in presidential debates. In: CIKM, pp. 1835–1838. CIKM (2015)
Google Scholar
Hassan, N., et al.: ClaimBuster: the first-ever end-to-end fact-checking system. Proc. VLDB Endow. 10(12), 1945–1948 (2017)
Article Google Scholar
He, H., Bai, Y., Garcia, E.A., Li, S.: ADASYN: adaptive synthetic sampling approach for imbalanced learning. In: Proceedings of the IEEE Joint Conference on Neural Networks (IJCNN), pp. 1322–1328. IEEE (2008)
Google Scholar
Hu, M., Liu, B.: Mining and summarizing customer reviews. In: ACM SIGKDD, pp. 168–177. ACM (2004)
Google Scholar
Kang, J.S., Feng, S., Akoglu, L., Choi, Y.: ConnotationWordNet: learning connotation over the word+sense network. In: ACL, pp. 1544–1554. Association for Computational Linguistics, June 2014
Google Scholar
Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
Klayman, J.: Varieties of confirmation bias. In: Psychology of Learning and Motivation, vol. 32, pp. 385–418. Elsevier (1995)
Google Scholar
Kumar, S., West, R., Leskovec, J.: Disinformation on the web: impact, characteristics, and detection of wikipedia hoaxes. In: Proceedings of 25th International Conference on World Wide Web, pp. 591–602. International WWWW Conference Committee (IW3C2) (2016)
Google Scholar
Le, D.T., Vu, N.T., Blessing, A.: Towards a text analysis system for political debates. In: Proceedings of the 10th SIGHUM Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities, pp. 134–139 (2016)
Google Scholar
Loria, S.: TextBlob: simplified text processing (2014). http://textblob.readthedocs.org/en/dev/
Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013)
Nakov, P., et al.: Overview of the CLEF-2018 lab on automatic identification and verification of claims in political debates. In: Working Notes of CLEF 2018 - Conference and Labs of the Evaluation Forum, CLEF 2018, Avignon, France, September 2018
Google Scholar
Pang, B., Lee, L., Vaithyanathan, S.: Thumbs up?: sentiment classification using machine learning techniques. In: EMNLP, pp. 79–86 (2002)
Google Scholar
Patwari, A., Goldwasser, D., Bagchi, S.: TATHYA: a multi-classifier system for detecting check-worthy statements in political debates. In: CIKM, pp. 1–4 (2017)
Google Scholar
Porter, M.F.: Snowball: a language for stemming algorithms (2001). http://snowball.tartarus.org/texts/introduction.html
Qazvinian, V., Rosengren, E., Radev, D., Mei, Q.: Rumor has it: identifying misinformation in microblogs. In: EMNLP, pp. 1589–1599. ACL (2011)
Google Scholar
Recasens, M., Danescu-Niculescu-Mizil, C., Jurafsky, D.: Linguistic models for analyzing and detecting biased language. In: ACL, vol. 1, pp. 1650–1659 (2013)
Google Scholar
Rodriguez, M.G., Gummadi, K., Schoelkopf, B.: Quantifying information overload in social media and its impact on social contagions. In: ICWSM (2014)
Google Scholar
Stanovsky, G., Michael, J., Zettlemoyer, L., Dagan, I.: Supervised open information extraction. In: NAACL-HLT, vol. 1 (Long Papers), pp. 885–895 (2018)
Google Scholar
Trunk, G.V.: A problem of dimensionality: a simple example. IEEE Trans. Pattern Anal. Mach. Intell. 1(3), 306–307 (1979)
Article Google Scholar
Vlachos, A., Riedel, S.: Fact checking: task definition and dataset construction. In: Proceedings of the ACL 2014 Workshop on Language Technologies and Computational Social Science, pp. 18–22 (2014)
Google Scholar
Wilson, T., Wiebe, J., Hoffmann, P.: Recognizing contextual polarity in phrase-level sentiment analysis. In: EMNLP, pp. 347–354 (2005)
Google Scholar
Wu, Y., Agarwal, P.K., Li, C., Yang, J., Yu, C.: Toward computational fact-checking. Proc. VLDB Endow. 7(7), 589–600 (2014)
Article Google Scholar
Xiao, H.: bert-as-service (2018). https://github.com/hanxiao/bert-as-service
Zuo, C., Karakas, A., Banerjee, R.: A hybrid recognition system for check-worthy claims using heuristics and supervised learning. In: Cappellato, L., Ferro, N., Nie, J.Y., Soulier, L. (eds.) CLEF 2018 Working Notes (2018)
Google Scholar

Download references

Acknowledgment

This work was supported in part by the U.S. National Science Foundation (NSF) under the award SES-1834597.

Author information

Authors and Affiliations

Department of Computer Science, Stony Brook University, Stony Brook, New York, 11794, USA
Chaoyuan Zuo & Ritwik Banerjee
Department of Linguistics, Stony Brook University, Stony Brook, New York, 11794, USA
Ayla Ida Karakas

Authors

Chaoyuan Zuo
View author publications
You can also search for this author in PubMed Google Scholar
Ayla Ida Karakas
View author publications
You can also search for this author in PubMed Google Scholar
Ritwik Banerjee
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Chaoyuan Zuo .

Editor information

Editors and Affiliations

Universita della Svizzera Italiana, Lugano, Switzerland
Fabio Crestani
Zurich University of Applied Sciences, Winterthur, Switzerland
Martin Braschler
University of Neuchâtel, Neuchâtel, Switzerland
Jacques Savoy
Technische Universität Wien, Vienna, Austria
Andreas Rauber
HES-SO Valais-Wallis, Sierre, Switzerland
Henning Müller
University of Santiago de Compostela, Santiago de Compostela, Spain
David E. Losada
Swiss Alliance for Data-Intensive Services, Thun, Switzerland
Gundula Heinatz Bürki
University of Padua, Padua, Italy
Linda Cappellato
University of Padua, Padua, Italy
Nicola Ferro

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Zuo, C., Karakas, A.I., Banerjee, R. (2019). To Check or Not to Check: Syntax, Semantics, and Context in the Language of Check-Worthy Claims. In: Crestani, F., et al. Experimental IR Meets Multilinguality, Multimodality, and Interaction. CLEF 2019. Lecture Notes in Computer Science(), vol 11696. Springer, Cham. https://doi.org/10.1007/978-3-030-28577-7_23

Download citation

DOI: https://doi.org/10.1007/978-3-030-28577-7_23
Published: 03 August 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-28576-0
Online ISBN: 978-3-030-28577-7
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics