Abstract
As the spread of information has received a compelling boost due to pervasive use of social media, so has the spread of misinformation. The sheer volume of data has rendered the traditional methods of expert-driven manual fact-checking largely infeasible. As a result, computational linguistics and data-driven algorithms have been explored in recent years. Despite this progress, identifying and prioritizing what needs to be checked has received little attention. Given that expert-driven manual intervention is likely to remain an important component of fact-checking, especially in specific domains (e.g., politics, environmental science), this identification and prioritization is critical. A successful algorithmic ranking of “check-worthy” claims can help an expert-in-the-loop fact-checking system, thereby reducing the expert’s workload while still tackling the most salient bits of misinformation. In this work, we explore how linguistic syntax, semantics, and the contextual meaning of words play a role in determining the check-worthiness of claims. Our preliminary experiments used explicit stylometric features and simple word embeddings on the English language dataset in the Check-worthiness task of the CLEF-2018 Fact-Checking Lab, where our primary solution outperformed the other systems in terms of the mean average precision, R-precision, reciprocal rank, and precision at k for multiple values k. Here, we present an extension of this approach with more sophisticated word embeddings and report further improvements in this task.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Notes
- 1.
The dataset does not provide this categorization, but we treat them differently since a debate, unlike a speech, has interactive discourse between multiple speakers.
References
Atanasova, P., et al.: Overview of the CLEF-2018 CheckThat! Lab on automatic identification and verification of political claims, task 1: check-worthiness. In: Cappellato, L., Ferro, N., Nie, J.Y., Soulier, L. (eds.) CLEF 2018 Working Notes (2018)
Bruns, A., Highfield, T.: Blogs, Twitter, and breaking news: the produsage of citizen journalism. In: Produsing Theory in a Digital World: The Intersection of Audiences and Production in Contemporary Theory, vol. 80, pp. 15–32. Peter Lang (2012)
Cao, T.D., Manolescu, I., Tannier, X.: Extracting statistical mentions from textual claims to provide trusted content. In: Métais, E., Meziane, F., Vadera, S., Sugumaran, V., Saraee, M. (eds.) NLDB 2019. LNCS, vol. 11608, pp. 402–408. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-23281-8_36
Cazalens, S., Lamarre, P., Leblay, J., Manolescu, I., Tannier, X.: A content management perspective on fact-checking. In: Journalism, Misinformation and Fact Checking Alternate Paper Track of The Web Conference (2018)
Cohen, S., Li, C., Yang, J., Yu, C.: Computational journalism: a call to arms to database researchers. In: Conference on Innovative Data Systems Research, CIDR 2011, ACM, Asilomar (2011)
Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. CoRR abs/1810.04805 (2018)
Diakopoulos, N.: A functional roadmap for innovation in computational journalism. Rutgers University, Technical report (2011)
Feng, S., Banerjee, R., Choi, Y.: Syntactic stylometry for deception detection. In: ACL, no. 2, pp. 171–175 (2012)
Finkel, J.R., Grenager, T., Manning, C.: Incorporating non-local information into information extraction systems by Gibbs sampling. In: ACL, pp. 363–370 (2005)
Flew, T., Spurgeon, C., Daniel, A., Swift, A.: The promise of computational journalism. Journal. Pract. 6(2), 157–171 (2012)
Gencheva, P., Nakov, P., Màrquez, L., Barrón-Cedeño, A., Koychev, I.: A context-aware approach for detecting worth-checking claims in political debates. In: RANLP 2017, pp. 267–276 (2017)
Ghanem, B., Montes-y Gómez, M., Rangel, F., Rosso, P.: UPV-INAOE-Autoritas-Check that: preliminary approach for checking worthiness of claims. In: CLEF Working Notes (2018)
Goode, L.: Social news, citizen journalism and democracy. New Media Soc. 11(8), 1287–1305 (2009)
Hansen, C., Hansen, C., Simonsen, J.G., Lioma, C.: The Copenhagen team participation in the check-worthiness task of the competition of automatic identification and verification of claims in political debates of the CLEF-2018 CheckThat! Lab. In: CLEF Working Notes (2018)
Harris, Z.S.: Distributional Structure. Word 10(2–3), 146–162 (1954)
Hassan, N., Li, C., Tremayne, M.: Detecting check-worthy factual claims in presidential debates. In: CIKM, pp. 1835–1838. CIKM (2015)
Hassan, N., et al.: ClaimBuster: the first-ever end-to-end fact-checking system. Proc. VLDB Endow. 10(12), 1945–1948 (2017)
He, H., Bai, Y., Garcia, E.A., Li, S.: ADASYN: adaptive synthetic sampling approach for imbalanced learning. In: Proceedings of the IEEE Joint Conference on Neural Networks (IJCNN), pp. 1322–1328. IEEE (2008)
Hu, M., Liu, B.: Mining and summarizing customer reviews. In: ACM SIGKDD, pp. 168–177. ACM (2004)
Kang, J.S., Feng, S., Akoglu, L., Choi, Y.: ConnotationWordNet: learning connotation over the word+sense network. In: ACL, pp. 1544–1554. Association for Computational Linguistics, June 2014
Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
Klayman, J.: Varieties of confirmation bias. In: Psychology of Learning and Motivation, vol. 32, pp. 385–418. Elsevier (1995)
Kumar, S., West, R., Leskovec, J.: Disinformation on the web: impact, characteristics, and detection of wikipedia hoaxes. In: Proceedings of 25th International Conference on World Wide Web, pp. 591–602. International WWWW Conference Committee (IW3C2) (2016)
Le, D.T., Vu, N.T., Blessing, A.: Towards a text analysis system for political debates. In: Proceedings of the 10th SIGHUM Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities, pp. 134–139 (2016)
Loria, S.: TextBlob: simplified text processing (2014). http://textblob.readthedocs.org/en/dev/
Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013)
Nakov, P., et al.: Overview of the CLEF-2018 lab on automatic identification and verification of claims in political debates. In: Working Notes of CLEF 2018 - Conference and Labs of the Evaluation Forum, CLEF 2018, Avignon, France, September 2018
Pang, B., Lee, L., Vaithyanathan, S.: Thumbs up?: sentiment classification using machine learning techniques. In: EMNLP, pp. 79–86 (2002)
Patwari, A., Goldwasser, D., Bagchi, S.: TATHYA: a multi-classifier system for detecting check-worthy statements in political debates. In: CIKM, pp. 1–4 (2017)
Porter, M.F.: Snowball: a language for stemming algorithms (2001). http://snowball.tartarus.org/texts/introduction.html
Qazvinian, V., Rosengren, E., Radev, D., Mei, Q.: Rumor has it: identifying misinformation in microblogs. In: EMNLP, pp. 1589–1599. ACL (2011)
Recasens, M., Danescu-Niculescu-Mizil, C., Jurafsky, D.: Linguistic models for analyzing and detecting biased language. In: ACL, vol. 1, pp. 1650–1659 (2013)
Rodriguez, M.G., Gummadi, K., Schoelkopf, B.: Quantifying information overload in social media and its impact on social contagions. In: ICWSM (2014)
Stanovsky, G., Michael, J., Zettlemoyer, L., Dagan, I.: Supervised open information extraction. In: NAACL-HLT, vol. 1 (Long Papers), pp. 885–895 (2018)
Trunk, G.V.: A problem of dimensionality: a simple example. IEEE Trans. Pattern Anal. Mach. Intell. 1(3), 306–307 (1979)
Vlachos, A., Riedel, S.: Fact checking: task definition and dataset construction. In: Proceedings of the ACL 2014 Workshop on Language Technologies and Computational Social Science, pp. 18–22 (2014)
Wilson, T., Wiebe, J., Hoffmann, P.: Recognizing contextual polarity in phrase-level sentiment analysis. In: EMNLP, pp. 347–354 (2005)
Wu, Y., Agarwal, P.K., Li, C., Yang, J., Yu, C.: Toward computational fact-checking. Proc. VLDB Endow. 7(7), 589–600 (2014)
Xiao, H.: bert-as-service (2018). https://github.com/hanxiao/bert-as-service
Zuo, C., Karakas, A., Banerjee, R.: A hybrid recognition system for check-worthy claims using heuristics and supervised learning. In: Cappellato, L., Ferro, N., Nie, J.Y., Soulier, L. (eds.) CLEF 2018 Working Notes (2018)
Acknowledgment
This work was supported in part by the U.S. National Science Foundation (NSF) under the award SES-1834597.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Zuo, C., Karakas, A.I., Banerjee, R. (2019). To Check or Not to Check: Syntax, Semantics, and Context in the Language of Check-Worthy Claims. In: Crestani, F., et al. Experimental IR Meets Multilinguality, Multimodality, and Interaction. CLEF 2019. Lecture Notes in Computer Science(), vol 11696. Springer, Cham. https://doi.org/10.1007/978-3-030-28577-7_23
Download citation
DOI: https://doi.org/10.1007/978-3-030-28577-7_23
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-28576-0
Online ISBN: 978-3-030-28577-7
eBook Packages: Computer ScienceComputer Science (R0)