DOI: 10.1145/3459637.3482423
research-article

DESYR: Definition and Syntactic Representation Based Claim Detection on the Web

Published: 30 October 2021

ABSTRACT

The formulation of a claim rests at the core of argument mining. Demarcating a claim from a non-claim is arduous for both humans and machines, owing to the latent linguistic variance between the two and the inadequacy of extensive definition-based formalization. Furthermore, the growing use of online social media has resulted in an explosion of unsolicited information on the web, presented as informal text. To address these issues, we propose DESYR, a framework for informal web-based text that combines hierarchical representation learning (dependency-inspired Poincaré embedding), definition-based alignment, and feature projection. Rather than fine-tuning compute-heavy language models, we build a lighter, more domain-centric approach. Experimental results indicate that DESYR improves on the state-of-the-art system across four benchmark claim datasets, most of which were constructed from informal texts. We see an increase of 3 claim-F1 points on the LESA-Twitter dataset, an increase of 1 claim-F1 point and 9 macro-F1 points on the Online Comments (OC) dataset, an increase of 24 claim-F1 points and 17 macro-F1 points on the Web Discourse (WD) dataset, and an increase of 8 claim-F1 points and 5 macro-F1 points on the Micro Texts (MT) dataset. We also perform an extensive analysis of the results. We release a 100-D pre-trained version of our Poincaré variant along with the source code.
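For readers unfamiliar with Poincaré embeddings, the following minimal sketch computes the standard Poincaré-ball distance on which such embeddings are built. It is an illustration only, not the DESYR implementation: the function name `poincare_distance` and the two-dimensional example points are assumptions made here, and the abstract does not specify how DESYR's dependency-inspired variant is trained.

```python
import math


def poincare_distance(u, v):
    """Distance between two points strictly inside the unit ball (Poincare ball model).

    Hypothetical helper for illustration; not taken from the DESYR source code.
    """
    sq_u = sum(x * x for x in u)
    sq_v = sum(x * x for x in v)
    if sq_u >= 1.0 or sq_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    # d(u, v) = arcosh(1 + 2 * ||u - v||^2 / ((1 - ||u||^2) * (1 - ||v||^2)))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v)))


# The same Euclidean gap costs more hyperbolic distance near the boundary,
# which is what lets tree-like hierarchies spread out with low distortion.
near_origin = poincare_distance([0.0, 0.0], [0.1, 0.0])
near_boundary = poincare_distance([0.8, 0.0], [0.9, 0.0])
print(near_origin < near_boundary)
```

In this geometry, points near the origin tend to act as general, root-like items and points near the boundary as specific, leaf-like ones, which is why the model suits hierarchical structure such as dependency trees.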


Published in

CIKM '21: Proceedings of the 30th ACM International Conference on Information & Knowledge Management
October 2021
4966 pages
ISBN: 9781450384469
DOI: 10.1145/3459637

Copyright © 2021 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Overall acceptance rate: 1,861 of 8,427 submissions (22%)
