ABSTRACT
The formulation of a claim rests at the core of argument mining. Demarcating a claim from a non-claim is arduous for both humans and machines, owing to the latent linguistic variance between the two and the inadequacy of extensive definition-based formalization. Furthermore, the growing use of online social media has resulted in an explosion of unsolicited information on the web, presented as informal text. To this end, we propose DESYR, a framework that aims to mitigate these issues for informal web-based text by combining hierarchical representation learning (dependency-inspired Poincaré embedding), definition-based alignment, and feature projection. We forgo fine-tuning compute-heavy language models in favor of a lighter, more domain-centric approach. Experimental results indicate that DESYR improves upon the state-of-the-art system across four benchmark claim datasets, most of which were constructed from informal texts. We observe an increase of 3 claim-F1 points on the LESA-Twitter dataset; 1 claim-F1 point and 9 macro-F1 points on the Online Comments (OC) dataset; 24 claim-F1 points and 17 macro-F1 points on the Web Discourse (WD) dataset; and 8 claim-F1 points and 5 macro-F1 points on the Micro Texts (MT) dataset. We also perform an extensive analysis of the results. We release a 100-D pre-trained version of our Poincaré variant along with the source code.
Index Terms
- DESYR: Definition and Syntactic Representation Based Claim Detection on the Web