To Check or Not to Check: Syntax, Semantics, and Context in the Language of Check-Worthy Claims

  • Conference paper
Experimental IR Meets Multilinguality, Multimodality, and Interaction (CLEF 2019)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11696)

Abstract

As the spread of information has received a compelling boost from the pervasive use of social media, so has the spread of misinformation. The sheer volume of data has rendered traditional expert-driven manual fact-checking largely infeasible. As a result, computational linguistics and data-driven algorithms have been explored for fact-checking in recent years. Despite this progress, identifying and prioritizing what needs to be checked has received little attention. Given that expert-driven manual intervention is likely to remain an important component of fact-checking, especially in specific domains (e.g., politics, environmental science), this identification and prioritization is critical. A successful algorithmic ranking of “check-worthy” claims can help an expert-in-the-loop fact-checking system, reducing the expert’s workload while still tackling the most salient bits of misinformation. In this work, we explore how linguistic syntax, semantics, and the contextual meaning of words play a role in determining the check-worthiness of claims. Our preliminary experiments used explicit stylometric features and simple word embeddings on the English-language dataset of the Check-worthiness task in the CLEF-2018 Fact-Checking Lab, where our primary solution outperformed the other systems in terms of mean average precision, R-precision, reciprocal rank, and precision at k for multiple values of k. Here, we present an extension of this approach with more sophisticated word embeddings and report further improvements on this task.
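The abstract compares systems by mean average precision, R-precision, reciprocal rank, and precision at k. As a reading aid, the sketch below gives the standard textbook definitions of these ranking metrics for a list of binary labels (1 = check-worthy, 0 = not) sorted by descending predicted score. It is not the official CLEF-2018 scorer, and the function names are hypothetical. It assumes every sentence of a transcript is ranked (so the check-worthy sentences in the list are all of that transcript's check-worthy sentences) and that MAP averages over one ranked list per transcript.

```python
from typing import Sequence


def precision_at_k(labels: Sequence[int], k: int) -> float:
    """Fraction of the top-k ranked sentences that are check-worthy."""
    if k <= 0:
        raise ValueError("k must be positive")
    # Positions beyond the end of the list count as non-check-worthy.
    return sum(labels[:k]) / k


def average_precision(labels: Sequence[int]) -> float:
    """Mean of P@i over every rank i that holds a check-worthy sentence."""
    hits, total = 0, 0.0
    for i, rel in enumerate(labels, start=1):
        if rel:
            hits += 1
            total += hits / i  # precision at this rank
    # Assumes the whole transcript is ranked, so hits == all relevant items.
    return total / hits if hits else 0.0


def r_precision(labels: Sequence[int]) -> float:
    """P@R, where R is the number of check-worthy sentences in the list."""
    r = sum(labels)
    return precision_at_k(labels, r) if r else 0.0


def reciprocal_rank(labels: Sequence[int]) -> float:
    """1 / rank of the first check-worthy sentence (0 if there is none)."""
    for i, rel in enumerate(labels, start=1):
        if rel:
            return 1.0 / i
    return 0.0


def mean_average_precision(rankings: Sequence[Sequence[int]]) -> float:
    """Average of AP over several ranked lists (assumed: one per transcript)."""
    return sum(average_precision(r) for r in rankings) / len(rankings)


if __name__ == "__main__":
    # Labels of six sentences, already sorted by descending model score.
    ranked = [1, 0, 1, 0, 0, 1]
    print(round(average_precision(ranked), 4))  # 0.7222
    print(round(r_precision(ranked), 4))        # 0.6667
    print(reciprocal_rank(ranked))              # 1.0
    print(precision_at_k(ranked, 5))            # 0.4
```

All four metrics reward placing check-worthy sentences near the top of the list, which matches the expert-in-the-loop use case: a fact-checker reading from the top should encounter the salient claims first. They differ in emphasis; reciprocal rank looks only at the first hit, while average precision accounts for every check-worthy sentence.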


Notes

  1. The dataset does not provide this categorization (debate vs. speech), but we treat the two differently since a debate, unlike a speech, has interactive discourse between multiple speakers.


Acknowledgment

This work was supported in part by the U.S. National Science Foundation (NSF) under the award SES-1834597.

Author information

Correspondence to Chaoyuan Zuo.


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Zuo, C., Karakas, A.I., Banerjee, R. (2019). To Check or Not to Check: Syntax, Semantics, and Context in the Language of Check-Worthy Claims. In: Crestani, F., et al. Experimental IR Meets Multilinguality, Multimodality, and Interaction. CLEF 2019. Lecture Notes in Computer Science, vol 11696. Springer, Cham. https://doi.org/10.1007/978-3-030-28577-7_23

  • DOI: https://doi.org/10.1007/978-3-030-28577-7_23

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-28576-0

  • Online ISBN: 978-3-030-28577-7

  • eBook Packages: Computer Science, Computer Science (R0)