DOI: 10.1145/3366423.3380158
research-article

Conquering Cross-source Failure for News Credibility: Learning Generalizable Representations beyond Content Embedding

Published: 20 April 2020

ABSTRACT

False information on the Internet has caused severe damage to society. Researchers have proposed methods to determine the credibility of news and have obtained good results. Because different media sources (publishers) have different content generators (writers) and may focus on different topics or aspects, the word/topic distribution of each media source diverges from that of the others. We expose a generalizability challenge for existing content-based methods, which fail to perform consistently on news from media sources absent from the training set; we call this cross-source failure. A cross-source setting can cause a decrease in accuracy for current methods, and content-sensitive features are considered one of the major causes of cross-source failure for a content-based approach. To overcome this challenge, we propose a syntactic network for news credibility (SYNC), which focuses on function words and syntactic structure to learn generalizable representations for news credibility and to reinforce cross-source robustness across different media. Experiments with cross-validation on 194 real-world media sources showed that the proposed method could learn generalizable features and outperformed state-of-the-art methods on unseen media sources. Extensive analysis of the embedding feature representations demonstrates the strength of the proposed method compared with current content-embedding approaches. We envision that SYNC, on account of its good generalizability, is more robust for real-life application.
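The cross-source setting described in the abstract can be illustrated with a small sketch: articles are split so that every media source in the test fold is absent from the training fold. This is only an illustration under stated assumptions, not the authors' evaluation code. The function name `cross_source_folds`, the round-robin assignment of sources to folds, and the toy source labels are all hypothetical.

```python
from collections import defaultdict


def cross_source_folds(sources, n_folds):
    """Yield (train_idx, test_idx) pairs in which no media source
    appears on both sides of the split.

    `sources` holds one source label per article (hypothetical input format).
    """
    by_source = defaultdict(list)
    for idx, src in enumerate(sources):
        by_source[src].append(idx)

    if n_folds < 2 or n_folds > len(by_source):
        raise ValueError("n_folds must be between 2 and the number of sources")

    # Assign whole sources to folds round-robin (an assumed, simple policy);
    # sorting the labels keeps the split deterministic.
    fold_sources = [[] for _ in range(n_folds)]
    for i, src in enumerate(sorted(by_source)):
        fold_sources[i % n_folds].append(src)

    for held_out in fold_sources:
        held = set(held_out)
        test_idx = sorted(i for s in held for i in by_source[s])
        train_idx = sorted(
            i for s, idxs in by_source.items() if s not in held for i in idxs
        )
        yield train_idx, test_idx


# Toy example: six articles from four made-up sources.
article_sources = ["siteA", "siteB", "siteA", "siteC", "siteB", "siteD"]
folds = list(cross_source_folds(article_sources, n_folds=2))

for train_idx, test_idx in folds:
    train_sources = {article_sources[i] for i in train_idx}
    test_sources = {article_sources[i] for i in test_idx}
    # Every test source is unseen during training.
    assert train_sources.isdisjoint(test_sources)
```

Splitting by source rather than by article is what separates this setting from ordinary cross-validation, where articles from one publisher can land in both the training and the test folds.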


Published in

WWW '20: Proceedings of The Web Conference 2020
April 2020
3143 pages
ISBN: 9781450370233
DOI: 10.1145/3366423

Copyright © 2020 ACM


Publisher

Association for Computing Machinery
New York, NY, United States


Qualifiers

• research-article
• Research
• Refereed limited

Acceptance Rates

Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%
