ABSTRACT
The social confusion caused by the recent COVID-19 pandemic has been further aggravated by fake news spread through social media on the Internet. For this reason, many studies have proposed methods to detect fake news as early as possible. Content-based detection methods exploit the differences between the contents of true and fake news articles. However, they suffer from two serious limitations: (1) a publisher can easily manipulate the content of a news article, and (2) the content depends on the language in which the article is written. To overcome these limitations, diffusion-based fake news detection methods have been proposed; they exploit the differences between the diffusion patterns of true and fake news articles on social media. Despite their success, however, the lack of diffusion information about COVID-19-related fake news has hindered the study of diffusion-based detection methods for such news. To overcome this limitation, we propose a diffusion-based fake news detection framework (D-FEND), which consists of four components: (C1) diffusion data collection, (C2) data analysis and feature extraction, (C3) model training, and (C4) inference. Our work contributes to the effort to mitigate the risk of infodemics during a pandemic by (1) building a new diffusion dataset, named CoAID+, (2) identifying and addressing the class imbalance problem of CoAID+, and (3) demonstrating that D-FEND detects fake news articles with an average model accuracy of 88.89%.
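As a rough illustration of components (C2) and (C3), the following sketch extracts simple structural features from a diffusion cascade and then rebalances the classes before training. It is a minimal sketch under stated assumptions, not D-FEND's implementation. The abstract does not specify which diffusion features or which imbalance-handling method D-FEND uses. The cascade representation (a dict mapping each post to the post it was shared from, with None for the source post), the three features (cascade size, depth, and maximum breadth), the function names, and the choice of SMOTE (synthetic minority oversampling) are all hypothetical choices made for illustration.

```python
import math
import random
from collections import defaultdict


def diffusion_features(parents):
    """Return [size, depth, max_breadth] for one diffusion cascade.

    Hypothetical representation: `parents` maps every post id to the id of the
    post it was shared from, or None for the single source post (a tree).
    """
    roots = [node for node, parent in parents.items() if parent is None]
    if len(roots) != 1:
        raise ValueError("cascade must have exactly one source post")
    children = defaultdict(list)
    for node, parent in parents.items():
        if parent is not None:
            children[parent].append(node)

    # Breadth-first traversal, one level of the cascade at a time.
    depth, max_breadth = 0, 0
    frontier = roots
    while frontier:
        max_breadth = max(max_breadth, len(frontier))
        frontier = [child for node in frontier for child in children[node]]
        if frontier:
            depth += 1  # another level of resharing exists
    return [float(len(parents)), float(depth), float(max_breadth)]


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate n_synthetic SMOTE samples from minority-class feature vectors."""
    if n_synthetic > 0 and len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        x = rng.choice(minority)
        # k nearest minority-class neighbours of x (Euclidean distance).
        others = [m for m in minority if m is not x]
        neighbours = sorted(others, key=lambda m: math.dist(x, m))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        # New sample on the line segment between x and the chosen neighbour.
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nn)])
    return synthetic


def balance(features, labels, minority_label, k=5, seed=0):
    """Oversample the minority class until both classes are equal in size.

    Assumes binary labels: every label other than minority_label is majority.
    """
    minority = [f for f, y in zip(features, labels) if y == minority_label]
    n_majority = len(labels) - len(minority)
    synthetic = smote(minority, max(n_majority - len(minority), 0), k, seed)
    return features + synthetic, labels + [minority_label] * len(synthetic)


if __name__ == "__main__":
    # Toy cascades with illustrative labels: 0 = true, 1 = fake.
    cascades = [
        ({"a": None, "b": "a", "c": "a", "d": "b"}, 0),
        ({"r": None, "s": "r"}, 0),
        ({"x": None, "y": "x", "z": "y"}, 0),
        ({"p": None, "q": "p", "t": "q", "u": "t"}, 1),
        ({"m": None, "n": "m", "o": "n", "w": "o", "v": "w"}, 1),
    ]
    X = [diffusion_features(tree) for tree, _ in cascades]
    y = [label for _, label in cascades]
    X_bal, y_bal = balance(X, y, minority_label=1)
    print(X[0])                            # [4.0, 2.0, 2.0]
    print(y_bal.count(0), y_bal.count(1))  # 3 3
```

With these toy cascades, the two fake examples are outnumbered three to two, so `balance` adds one synthetic fake sample on the segment between the two fake feature vectors, leaving three samples per class. The balanced features would then be passed to a classifier for training (C3). In practice, the features would usually be standardized before computing nearest-neighbour distances, since otherwise cascade size tends to dominate the Euclidean metric.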
Index Terms
- D-FEND: a diffusion-based fake news detection framework for news articles related to COVID-19