FakeNewsSetGen: a Process to Build Datasets that Support Comparison Among Fake News Detection Methods

Authors:

Flávio Roberto Matias da Silva,

Paulo Márcio Souza Freire,

Marcelo Pereira de Souza,

Gustavo de A. B. Plenamente,

Ronaldo Ribeiro Goldschmidt

WebMedia '20: Proceedings of the Brazilian Symposium on Multimedia and the Web

Pages 241 - 248

https://doi.org/10.1145/3428658.3430965

Published: 30 November 2020

Abstract

Due to easy access and low cost, online news consumption through social media has increased significantly over the last decade. Despite their benefits, some social media platforms allow anyone to post news with intense spreading power, which amplifies an old problem: the dissemination of Fake News. In response, several machine learning-based methods to automatically detect Fake News (MLFN) have been proposed. All of them require datasets to train and evaluate their detection models. Although recent MLFN were designed to consider data on how news propagates on social media, most of the few available datasets do not contain this kind of data. Hence, comparing the performance of those recent MLFN with that of the others is restricted to a very limited number of datasets. Moreover, none of the existing datasets with propagation data contains news in Portuguese, which impairs the evaluation of MLFN in this language. Thus, this work proposes FakeNewsSetGen, a process that builds Fake News datasets that contain news propagation data and support comparison among state-of-the-art MLFN. FakeNewsSetGen's software engineering process was guided to include all kinds of data required by the existing MLFN. To illustrate FakeNewsSetGen's viability and adequacy, a case study was carried out. It encompassed the implementation of a FakeNewsSetGen prototype and the application of this prototype to create a dataset called FakeNewsSet, with news in Portuguese. Five MLFN with different kinds of data requirements (two of them demanding news propagation data) were applied to FakeNewsSet and compared, demonstrating the potential use of both the proposed process and the created dataset.

Published In

WebMedia '20: Proceedings of the Brazilian Symposium on Multimedia and the Web

November 2020

364 pages

ISBN: 9781450381963

DOI: 10.1145/3428658

General Chair: Carlos de Salles Soares Neto (UFMA)

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

Sponsors

In-Cooperation

SBC: Brazilian Computer Society
CNPq: Conselho Nacional de Desenvolvimento Cientifico e Tecn
CGIBR: Comite Gestor da Internet no Brazil
CAPES: Brazilian Higher Education Funding Council

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

WebMedia '20

WebMedia '20: Brazilian Symposium on Multimedia and the Web

November 30 - December 4, 2020

São Luís, Brazil

Acceptance Rates

WebMedia '20 Paper Acceptance Rate: 34 of 87 submissions, 39%

Overall Acceptance Rate: 270 of 873 submissions, 31%

Cited By

Liu Z, Zhang T, Yang K, Thompson P, Yu Z, Ananiadou S (2024). Emotion detection for misinformation: A review. Information Fusion 107 (102300). Online publication date: Jul-2024.
https://doi.org/10.1016/j.inffus.2024.102300

Oliveira Moreira T, Passos C, Matias da Silva F, Souza Freire P, Fernandes de Souza I, Bosaipo Sales da Silva C, Goldschmidt R (2024). JEDi - a digital educational game to support student training in identifying Portuguese-written fake news: Case studies in high school, undergraduate and graduate scenarios. Education and Information Technologies 29:10 (11815-11845). Online publication date: 1-Jul-2024.
https://dl.acm.org/doi/10.1007/s10639-023-12309-z

Souza Freire P, Matias da Silva F, Goldschmidt R (2022). Fake news detection based on explicit and implicit signals of a hybrid crowd. Expert Systems with Applications: An International Journal 183:C. Online publication date: 3-Jan-2022.
https://dl.acm.org/doi/10.1016/j.eswa.2021.115414

de Souza J, Assis E, Mendonça F, de Souza J, Pereira A, da Rocha L (2021). ReVera Framework. Proceedings of the Brazilian Symposium on Multimedia and the Web (137-140). Online publication date: 5-Nov-2021.
https://dl.acm.org/doi/10.1145/3470482.3479626
